CN109582933A - Method and related apparatus for determining text novelty - Google Patents


Info

Publication number
CN109582933A
Authority
CN
China
Prior art keywords
candidate
entity
target
text
relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811348626.6A
Other languages
Chinese (zh)
Other versions
CN109582933B (en)
Inventor
陈伟然
姜庭欣
杨冠梅
段博超
郭永红
何佳
王志强
王希桢
李静毅
刘乾楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Enjoy Wisdom Technology Co Ltd
Original Assignee
Beijing Enjoy Wisdom Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Enjoy Wisdom Technology Co Ltd
Priority to CN201811348626.6A
Publication of CN109582933A
Application granted
Publication of CN109582933B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a method and related apparatus for determining text novelty. The method includes: determining a target text; extracting multiple target entities from the target text to obtain a target entity set; obtaining the candidate entity set of each candidate text in a candidate text collection; determining a first entity intersection of the target entity set and the candidate entity set, the first entity intersection consisting of the entities that match between the target entity set and the candidate entity set; and determining the novelty of the target text relative to the candidate text according to a difference parameter between the first entity intersection and the target entity set. The embodiments of the present application improve the accuracy of the novelty calculation.

Description

Method and related apparatus for determining text novelty
Technical field
The present invention relates to the field of data processing, and in particular to a method and related apparatus for determining text novelty.
Background
With the arrival of an era of rapid technological growth, information is becoming ever more important and data volumes keep increasing, which makes information retrieval especially important.
Users usually search a database with a target text to find candidate texts similar to it, but most current search methods are based on text retrieval, which focuses on matching text characters. For example, the user determines keywords in the target text and enters them; the retrieval system then matches the keywords against the candidate texts in the database, and the more keywords a candidate text matches, the lower the novelty of the target text relative to that candidate text.
In this approach, the user has to determine the keywords. The choice of keywords strongly affects the search results, is subjective, and does not necessarily reflect an understanding of the actual content of the target text, so the novelty of the target text relative to a candidate text is determined with low accuracy.
Summary of the invention
In view of this, embodiments of the present invention provide a method and related apparatus for determining text novelty. All target entities in the target text and all candidate entities in each candidate text are determined, and the novelty of the target text relative to the candidate text is determined according to a difference parameter between the first entity intersection and the target entity set. In the prior art, novelty is determined by matching keywords chosen subjectively by the user, so the result depends on the user's subjective understanding. The method provided by the embodiments of the present application is more objective because it reflects the actual content of the target text and the candidate text, so the novelty calculation is more accurate.
In a first aspect, an embodiment of the present application provides a method for determining text novelty, including:
determining a target text;
extracting multiple target entities from the target text to obtain a target entity set;
obtaining the candidate entity set of each candidate text in a candidate text collection;
determining a first entity intersection of the target entity set and the candidate entity set, the first entity intersection consisting of the entities that match between the target entity set and the candidate entity set; and
determining the novelty of the target text relative to the candidate text according to a difference parameter between the first entity intersection and the target entity set.
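The five steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: entities are already extracted as strings, two entities "match" only when their strings are equal, and the difference parameter is taken to be the share of target entities that fall outside the first entity intersection. The source does not fix this formula, and the helper names `entity_novelty` and `rank_candidates` are hypothetical.

```python
def entity_novelty(target_entities: set[str], candidate_entities: set[str]) -> float:
    """Novelty of a target text relative to one candidate text.

    Assumed difference parameter: the share of target entities that are
    missing from the first entity intersection.
    """
    if not target_entities:
        raise ValueError("the target entity set must not be empty")
    # First entity intersection: entities present in both sets (exact match assumed).
    intersection = target_entities & candidate_entities
    # Target entities the candidate text does not contain.
    unmatched = target_entities - intersection
    return len(unmatched) / len(target_entities)


def rank_candidates(target_entities, candidates):
    """Score every candidate text; lower novelty means closer prior art."""
    scores = {
        name: entity_novelty(target_entities, entities)
        for name, entities in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


target = {"battery pack", "monitoring module", "CPU processor", "display"}
candidates = {
    "doc_a": {"battery pack", "display", "charging pile"},
    "doc_b": {"battery pack", "monitoring module", "CPU processor"},
}
print(rank_candidates(target, candidates))
# [('doc_b', 0.25), ('doc_a', 0.5)]
```

Sorting by novelty puts the closest candidate text first; a score of 0.0 would mean the candidate contains every target entity.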
In one possible implementation, the method further includes:
extracting multiple binary relations from the target text to obtain a target binary relation set, where a binary relation consists of two entities and the relationship between them;
obtaining a candidate binary relation set that includes multiple binary relations in the candidate text; and
determining a first binary relation intersection of the target binary relation set and the candidate binary relation set, the first binary relation intersection including the binary relations that match between the target binary relation set and the candidate binary relation set.
Determining the novelty of the target text relative to the candidate text according to the difference parameter between the first entity intersection and the target entity set includes:
determining a first entity novelty according to the difference parameter between the first entity intersection and the target entity set;
determining a first binary relation novelty according to the difference parameter between the first binary relation intersection and the target binary relation set; and
determining the novelty of the target text relative to the candidate text according to the first entity novelty and the first binary relation novelty.
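Adding binary relations matters because two texts can share every entity yet connect them differently. A minimal sketch, assuming relations are (head, relationship, tail) triples matched by exact equality, the same share-of-unmatched difference parameter at both levels, and equal weights of 0.5 for the entity and relation novelties. The text does not state how the two novelties are combined, so the weighted sum and all names here are hypothetical.

```python
# A binary relation: (head entity, relationship, tail entity).
Relation = tuple[str, str, str]


def set_novelty(target: set, candidate: set) -> float:
    """Assumed difference parameter: share of target items absent from the intersection."""
    if not target:
        return 0.0  # nothing extracted, so nothing can be novel (a modelling choice)
    return len(target - (target & candidate)) / len(target)


def text_novelty(target_entities: set[str], candidate_entities: set[str],
                 target_relations: set[Relation], candidate_relations: set[Relation],
                 entity_weight: float = 0.5, relation_weight: float = 0.5) -> float:
    """Weighted blend of the first entity novelty and the first binary relation novelty."""
    entity_score = set_novelty(target_entities, candidate_entities)
    relation_score = set_novelty(target_relations, candidate_relations)
    return entity_weight * entity_score + relation_weight * relation_score


target_entities = {"charging pile", "control unit", "display"}
candidate_entities = {"charging pile", "control unit", "display"}
target_relations = {("charging pile", "includes", "control unit"),
                    ("control unit", "connects", "display")}
candidate_relations = {("charging pile", "includes", "display")}
print(text_novelty(target_entities, candidate_entities,
                   target_relations, candidate_relations))
# 0.5
```

Here the candidate contains all three entities, so the entity novelty is 0.0, but none of the target's relations, so the combined novelty is 0.5 instead of 0.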
In one possible implementation, the method further includes:
extracting a target ternary relation set from the target text, where the target ternary relation set includes multiple ternary relations, and each ternary relation consists of two binary relations that have an entity in common;
obtaining a candidate ternary relation set that includes multiple ternary relations in the candidate text; and
determining a first ternary relation intersection of the target ternary relation set and the candidate ternary relation set, the first ternary relation intersection including the ternary relations that match between the target ternary relation set and the candidate ternary relation set.
Determining the novelty of the target text relative to the candidate text according to the first entity novelty and the first binary relation novelty includes:
determining a first ternary relation novelty according to the difference parameter between the first ternary relation intersection and the target ternary relation set; and
determining the novelty of the target text relative to the candidate text according to the first entity novelty, the first binary relation novelty, and the first ternary relation novelty.
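A ternary relation, as defined above, is two binary relations that share an entity. One hypothetical way to build and compare such sets is sketched below. It assumes the two binary relations must share exactly one entity, so that each ternary relation spans three distinct entities, and it reuses an assumed share-of-unmatched difference parameter; neither rule is stated in the source, and the function names are illustrative.

```python
from itertools import combinations


def ternary_relations(binary_relations):
    """Pair binary relations that share exactly one entity into ternary relations.

    Each ternary relation is a frozenset of two (head, relation, tail) triples,
    so the order in which the pair was found does not affect set matching.
    """
    result = set()
    for first, second in combinations(binary_relations, 2):
        first_entities = {first[0], first[2]}
        second_entities = {second[0], second[2]}
        # One shared entity plus two others gives three distinct entities.
        if (len(first_entities & second_entities) == 1
                and len(first_entities | second_entities) == 3):
            result.add(frozenset((first, second)))
    return result


def set_novelty(target, candidate):
    """Assumed difference parameter: share of target items absent from the intersection."""
    if not target:
        return 0.0
    return len(target - (target & candidate)) / len(target)


target_rels = [
    ("charging pile", "includes", "control unit"),
    ("control unit", "connects", "display"),
    ("humidity control apparatus", "connects", "refrigerating fan"),
]
candidate_rels = [
    ("charging pile", "includes", "control unit"),
    ("control unit", "connects", "speaker"),
]
target_ternary = ternary_relations(target_rels)
candidate_ternary = ternary_relations(candidate_rels)
print(len(target_ternary), set_novelty(target_ternary, candidate_ternary))
# 1 1.0
```

The candidate shares one binary relation with the target but not the pair of relations around `control unit`, so its ternary novelty is 1.0.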
In one possible implementation, extracting multiple target entities from the target text includes:
inputting the target text into an entity extraction model, and identifying the multiple target entities in the target text by the entity extraction model.
In one possible implementation, extracting multiple binary relations from the target text includes:
inputting the target text in which the target entities have been identified into a relationship extraction model, and extracting the binary relations between the target entities by the relationship extraction model.
In one possible implementation, the method further includes:
performing structured representation of the target text according to the relationships between the target entities to generate a target structure.
In one possible implementation, the target structure includes nodes and edges, where the nodes represent the target entities and the edges represent the relationships between the target entities.
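A target structure made of nodes and edges can be held as an adjacency mapping. The sketch below is one possible representation, not the one the application prescribes; the function name `build_structure` is hypothetical, and the example triples are taken from the relationship types defined later in this description.

```python
def build_structure(relations):
    """Build a target structure: nodes are entities, labelled edges are relationships.

    relations: iterable of (head entity, relationship, tail entity) triples.
    Returns (nodes, edges), where edges maps each head to its (relationship, tail) pairs.
    """
    nodes = set()
    edges = {}
    for head, relationship, tail in relations:
        nodes.update((head, tail))
        edges.setdefault(head, []).append((relationship, tail))
    return nodes, edges


nodes, edges = build_structure([
    ("charging pile", "includes", "control unit"),  # component to component
    ("charging device", "has", "voltage"),          # component to attribute
    ("voltage", "is", "240 V"),                     # attribute to attribute value
])
print(sorted(nodes))
# ['240 V', 'charging device', 'charging pile', 'control unit', 'voltage']
print(edges["charging pile"])
# [('includes', 'control unit')]
```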
In one possible implementation, each candidate text in the candidate text collection is a structured candidate structure and the target text is a target structure, and the method further includes:
extracting the candidate entity set of a candidate map, where the candidate map includes at least one candidate structure.
Determining the entity intersection of the target entity set and the candidate entity set includes:
determining a second entity intersection of the target entity set and the candidate entity set of the candidate map.
The method further includes:
determining the novelty of the target text relative to the candidate map according to the difference parameter between the second entity intersection and the target entity set.
In one possible implementation, when the candidate map includes at least two candidate structures, namely a first candidate structure and a second candidate structure, the candidate map is obtained as follows:
determining an associated entity of the first candidate structure and the second candidate structure; and
associating the first candidate structure with the second candidate structure through the associated entity to obtain the candidate map.
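Merging candidate structures through an associated entity can be sketched as follows. The sketch assumes structures are sets of (head, relationship, tail) triples and that an associated entity is simply an entity with the same name in both structures; the source does not say how associated entities are identified, and both function names are hypothetical.

```python
def entities_of(structure):
    """All entity names appearing in a structure of (head, relationship, tail) triples."""
    return {entity for head, _, tail in structure for entity in (head, tail)}


def merge_structures(first, second):
    """Join two candidate structures into one candidate map via their associated entities.

    Associated entities are assumed to be entities with identical names. Because
    nodes are identified by name, the union of the two edge sets turns each
    associated entity into a single shared node.
    """
    associated = entities_of(first) & entities_of(second)
    candidate_map = set(first) | set(second)
    return candidate_map, associated


first = {("charging pile", "includes", "control unit")}
second = {("control unit", "connects", "display"), ("display", "has", "brightness")}
candidate_map, associated = merge_structures(first, second)
print(sorted(associated), len(candidate_map))
# ['control unit'] 3
```

If the two structures share no entity, the union still returns both edge sets, but they remain disconnected parts of the candidate map.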
In one possible implementation, the method further includes:
extracting multiple binary relations from the target structure to obtain a target binary relation set;
locating the two target entities included in each target binary relation of the target binary relation set at the corresponding two entity positions in the candidate map;
calculating the distance between the two entity positions corresponding to each target binary relation; and
determining, according to the distance, a second binary relation novelty of each target binary relation relative to the candidate map.
Determining the novelty of the target text relative to the candidate text according to the difference parameter between the second entity intersection and the target entity set includes:
determining a second entity novelty according to the difference parameter between the second entity intersection and the target entity set; and
determining the novelty of the target structure relative to the candidate map according to the second entity novelty and the second binary relation novelty.
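The distance step can be illustrated with a breadth-first search over the candidate map. This sketch assumes the candidate map is treated as an undirected graph, that the distance is the hop count between the two entity positions, and that the distance d is turned into a novelty score as 1 - 1/d, with 1.0 when an entity is missing or unreachable. The mapping from distance to novelty is not specified in the text and is a hypothetical choice, as are the function names.

```python
from collections import deque


def undirected_adjacency(edges):
    """Adjacency sets for a candidate map given as (head, relationship, tail) triples."""
    adjacency = {}
    for head, _, tail in edges:
        adjacency.setdefault(head, set()).add(tail)
        adjacency.setdefault(tail, set()).add(head)
    return adjacency


def hop_distance(adjacency, source, target):
    """Breadth-first search hop count between two entity positions; None if unreachable."""
    if source not in adjacency or target not in adjacency:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def binary_distance_novelty(adjacency, relation):
    """Assumed mapping: adjacent entities give 0.0, farther apart approaches 1.0."""
    head, _, tail = relation
    distance = hop_distance(adjacency, head, tail)
    if distance is None:
        return 1.0  # an entity is missing, or the two entities are not connected
    if distance == 0:
        return 0.0  # head and tail are the same entity
    return 1.0 - 1.0 / distance


candidate_map = undirected_adjacency([
    ("charging pile", "includes", "control unit"),
    ("control unit", "connects", "display"),
])
relations = [
    ("charging pile", "includes", "control unit"),
    ("charging pile", "connects", "display"),
    ("charging pile", "connects", "fan"),
]
for relation in relations:
    print(relation[2], binary_distance_novelty(candidate_map, relation))
# control unit 0.0
# display 0.5
# fan 1.0
```

This score ignores the relationship label: two directly linked entities count as non-novel even if the link type differs, so a label-aware, set-based score is a natural complement.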
In one possible implementation, the method further includes:
obtaining a candidate binary relation set that includes multiple binary relations in the candidate map;
determining a second binary relation intersection of the target binary relation set and the candidate binary relation set;
determining a first binary relation novelty according to the difference parameter between the second binary relation intersection and the target binary relation set; and
determining a binary relation novelty according to the first binary relation novelty, the second binary relation novelty, and their respective weights.
Determining the novelty of the target structure relative to the candidate structure according to the second entity novelty and the second binary relation novelty includes:
determining the novelty of the target structure relative to the candidate map according to the second entity novelty and the binary relation novelty.
In one possible implementation, the method further includes:
extracting multiple ternary relations from the target structure to obtain a target ternary relation set;
locating the target entities included in each target ternary relation of the target ternary relation set at the corresponding three entity positions in the candidate map;
calculating the distance between every two of the three entity positions; and
determining, according to the distances, a second ternary relation novelty of each target ternary relation relative to the candidate map.
Determining the novelty of the target structure relative to the candidate structure according to the second entity novelty and the second binary relation novelty includes:
determining the novelty of the target structure relative to the candidate map according to the second entity novelty, the second binary relation novelty, and the second ternary relation novelty.
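For ternary relations, the three entity positions give three pairwise distances. A hypothetical way to turn them into one score is to average them and apply a 1 - 1/d mapping, returning 1.0 when any entity is missing from, or disconnected in, the candidate map. The averaging rule, the undirected hop-count distance, and the function names are all assumptions, not taken from the source.

```python
from collections import deque
from itertools import combinations


def hop_distances(adjacency, source):
    """Hop counts from source to every reachable entity (breadth-first search)."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def ternary_distance_novelty(adjacency, entities):
    """Novelty of a three-entity relation from its mean pairwise distance (assumed formula)."""
    if any(entity not in adjacency for entity in entities):
        return 1.0  # at least one entity is absent from the candidate map
    pair_distances = []
    for a, b in combinations(entities, 2):
        distance = hop_distances(adjacency, a).get(b)
        if distance is None:
            return 1.0  # two of the entities are not connected
        pair_distances.append(distance)
    mean = sum(pair_distances) / len(pair_distances)
    return 1.0 - 1.0 / mean if mean > 0 else 0.0


candidate_map = {
    "charging pile": {"control unit"},
    "control unit": {"charging pile", "display"},
    "display": {"control unit"},
}
# Pairwise distances 1, 2 and 1 give a mean of 4/3, so the score is about 0.25.
print(ternary_distance_novelty(candidate_map, ("charging pile", "control unit", "display")))
print(ternary_distance_novelty(candidate_map, ("charging pile", "control unit", "fan")))
```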
In one possible implementation, the method further includes:
obtaining a candidate ternary relation set that includes multiple ternary relations in the candidate map;
determining a second ternary relation intersection of the target ternary relation set and the candidate ternary relation set;
determining a first ternary relation novelty according to the difference parameter between the second ternary relation intersection and the target ternary relation set; and
determining a ternary relation novelty according to the first ternary relation novelty, the second ternary relation novelty, and their respective weights.
Determining the novelty of the target structure relative to the candidate map according to the second entity novelty and the binary relation novelty includes:
determining the novelty of the target structure relative to the candidate map according to the second entity novelty, the binary relation novelty, and the ternary relation novelty.
In one possible implementation, extracting multiple binary relations from the target text includes:
obtaining an entity relationship data set, which is built from the entities in a text collection and the relationships between them; the entity relationship data set (an entity relationship matrix) contains N entities and the relationships among them, where N is greater than or equal to 2;
querying the entity relationship data set to obtain M second entities related to a first entity, where M is less than or equal to N;
searching for the second entities within a preset range in the target text; and
if at least one target second entity among the M second entities is found, establishing a relationship between the first entity and the target second entity.
In one possible implementation, before searching for the second entities within the preset range in the target text, the method further includes:
creating an entity matching window; and
determining the preset range in the target text according to the size of the entity matching window.
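The window-based relation extraction can be sketched as below. It assumes tokens are whitespace-separated words, that each entity is a single token, that the entity relationship data set is a nested dictionary from a first entity to its related second entities and their relationship labels, and that the window size is counted in tokens on each side of the first entity. All names and the example data are hypothetical.

```python
def extract_relations(tokens, entity_relations, window=3):
    """Window-based binary relation extraction against a known entity relationship data set.

    tokens: the target text split into tokens; entities are assumed to be single tokens.
    entity_relations: {first entity: {second entity: relationship}}, i.e. the M second
        entities related to each first entity, built from a text collection.
    window: half-width of the entity matching window, in tokens (an assumed unit).
    """
    found = set()
    for i, token in enumerate(tokens):
        related = entity_relations.get(token)
        if not related:
            continue  # not a known first entity
        # Preset range: the tokens inside the matching window around the first entity.
        start, stop = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i and tokens[j] in related:
                # A target second entity was found: establish the relationship.
                found.add((token, related[tokens[j]], tokens[j]))
    return found


dataset = {"charging_pile": {"control_unit": "includes", "display": "connects"}}
text = ("the charging_pile includes a control_unit and much later "
        "the text mentions a display").split()
print(extract_relations(text, dataset))
# {('charging_pile', 'includes', 'control_unit')}
```

`display` is related to `charging_pile` in the data set but lies outside the three-token window, so no relationship is established for it; widening the window trades precision for recall.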
In a second aspect, an embodiment of the present application provides an apparatus for determining text novelty, including:
a first determining module, configured to determine a target text;
an extraction module, configured to extract multiple target entities from the target text determined by the first determining module to obtain a target entity set;
an obtaining module, configured to obtain the candidate entity set of each candidate text in a candidate text collection;
a second determining module, configured to determine a first entity intersection of the target entity set extracted by the extraction module and the candidate entity set obtained by the obtaining module, the first entity intersection consisting of the entities that match between the target entity set and the candidate entity set; and
a novelty determining module, configured to determine the novelty of the target text relative to the candidate text according to a difference parameter between the first entity intersection determined by the second determining module and the target entity set extracted by the extraction module.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory and a processor;
where the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the computer instructions to perform the method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer storage medium storing computer instructions, where the computer instructions are used to cause a computer to perform the method according to the first aspect.
In this embodiment, the target text whose novelty is to be determined is first determined; the target text may be, for example, a patent. Multiple target entities are then extracted from the target text to obtain a target entity set, and the candidate entity set of each candidate text in the candidate text collection is obtained. Each candidate text is traversed to determine the first entity intersection of the target entity set and that candidate text's candidate entity set, that is, the entities that match between the two sets. Finally, the novelty of the target text relative to the candidate text is determined according to the difference parameter between the first entity intersection and the target entity set. This embodiment considers all target entities in the target text and all candidate entities in each candidate text. In the prior art, novelty is determined by matching keywords chosen subjectively by the user, so the result depends on the user's subjective understanding; the method provided by the embodiments of the present application is more objective because it reflects the actual content of the target text and the candidate text, so the novelty calculation is more accurate.
Brief description of the drawings
The features and advantages of the present invention will be more clearly understood with reference to the accompanying drawings, which are schematic and should not be construed as limiting the present invention in any way. In the drawings:
Fig. 1 is a schematic flowchart of the steps of an embodiment of a method for training structuring models according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of the steps of an embodiment of a text structuring method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a target structure in an embodiment of the present application;
Fig. 4 is a schematic diagram of a graph structure in an embodiment of the present application;
Fig. 5 is a schematic flowchart of the steps of an embodiment of a method for determining text similarity in an embodiment of the present application;
Fig. 6 is a schematic diagram of the Word2vec model training process in an embodiment of the present application;
Fig. 7 is a schematic flowchart of the steps of an embodiment of a method for determining text novelty in an embodiment of the present application;
Fig. 8 is a schematic diagram of a candidate map in an embodiment of the present application;
Fig. 9 is a schematic flowchart of the steps of an embodiment of a method for obtaining image information in an embodiment of the present application;
Fig. 10 is a schematic diagram of the description of drawings and the drawings in a candidate text in an embodiment of the present application;
Fig. 11 is a schematic topology diagram of a first candidate image and a second candidate image in an embodiment of the present application;
Fig. 12 is a schematic flowchart of the steps of an embodiment of a method for obtaining entity information in an embodiment of the present application;
Fig. 13 is a schematic structural diagram of an embodiment of an apparatus for determining text novelty in an embodiment of the present application;
Fig. 14 is a schematic structural diagram of another embodiment of an apparatus for determining text novelty in an embodiment of the present application;
Fig. 15 is a schematic structural diagram of an embodiment of an electronic device in an embodiment of the present application.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Clearly, the described embodiments are some, rather than all, of the embodiments of the present invention.
The embodiments of the present application provide a text structuring method. Texts in the embodiments include, but are not limited to, technical literature, patent documents, and academic papers. After a text is represented in structured form, the resulting structured information (such as a structure diagram) helps the user understand the text. The structured information can also serve as retrieval information. Taking patent documents as an example, most current patent search methods are based on text retrieval, which focuses on matching text characters; it lacks an understanding of user needs and of the patent content, and does not search on the basis of understanding that content. The method provided in the embodiments of the present application represents the patent text in structured form, so patent content can be retrieved on the basis of understanding it, which improves retrieval accuracy.
The text structuring method provided in the embodiments of the present application is applied to an electronic device, which may be a server or a terminal device; terminal devices include, but are not limited to, computers, mobile phones, and palmtop computers. The electronic device obtains the target text to be structured (for example, a patent) and inputs it into a trained entity extraction model, which identifies the entities in the target text. The target text with the identified entities is then input into a trained relationship extraction model, which extracts the relationships between the entities. Based on the entities and the relationships between them, the target text is represented in structured form to generate structured information (a structured text representation), such as a structure diagram or a flowchart. Because the entities are extracted by a trained entity extraction model and the relationships by a trained relationship extraction model, the structured representation is generated automatically, which aids understanding of the text content, is fast, and saves labor costs.
To make the text structuring method provided in the embodiments of the present application easier to understand, the terms used in the embodiments are first explained:
Entity: a word that represents a feature in a text (such as a patent or paper). In technical documents such as patents and papers, an entity is a word that represents a technical feature. Entities include components, attributes, and attribute values.
Component: a building block described in the text, such as a charging device or a memory.
Attribute: a property of a component, such as the "voltage" of a charging device.
Attribute value: the value of an attribute of a component; for example, the voltage of a charging device is "240 V".
Relationship between entities: a relationship between technical features, specifically a relationship between components, a relationship between a component and an attribute, or a relationship between an attribute and an attribute value.
1) Types of relationships between components include, but are not limited to:
Inclusion: for example, a charging pile includes a control unit.
Connection: for example, a humidity control apparatus is connected to a refrigerating fan.
2) Relationship between a component and an attribute:
A component has an attribute; for example, a charging device has a voltage attribute.
3) Relationship between an attribute of a component and an attribute value:
An attribute has a specific attribute value; for example, the voltage "is" 240 V.
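The three relationship families above constrain which kinds of entity may be linked. A small, hypothetical encoding of that constraint, which could be used to sanity-check extracted relations, is sketched below; the enum and function names are illustrative only and do not come from the source.

```python
from enum import Enum


class EntityKind(Enum):
    COMPONENT = "component"
    ATTRIBUTE = "attribute"
    ATTRIBUTE_VALUE = "attribute value"


# (head kind, tail kind) pairs permitted by the three relationship families.
ALLOWED_PAIRS = {
    (EntityKind.COMPONENT, EntityKind.COMPONENT),        # inclusion, connection
    (EntityKind.COMPONENT, EntityKind.ATTRIBUTE),        # component has attribute
    (EntityKind.ATTRIBUTE, EntityKind.ATTRIBUTE_VALUE),  # attribute has value
}


def is_valid_relationship(head: EntityKind, tail: EntityKind) -> bool:
    """Check that a relationship joins entity kinds in one of the permitted ways."""
    return (head, tail) in ALLOWED_PAIRS


print(is_valid_relationship(EntityKind.COMPONENT, EntityKind.ATTRIBUTE))        # True
print(is_valid_relationship(EntityKind.ATTRIBUTE_VALUE, EntityKind.COMPONENT))  # False
```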
Embodiment 1
The text structuring method provided in the embodiments of the present application is described in detail below with reference to Fig. 1. The method consists of two parts: the first part trains the structuring models, and the second part performs structured representation of the text.
First, train the structuring models.
The structuring models include an entity extraction model for extracting entities and a relationship extraction model for extracting the relationships between the entities. The training method includes the following steps:
Step 101: obtain an annotated first corpus set, which is produced by performing entity corpus annotation on every text in a first text collection according to a first preset rule.
The first text collection includes, but is not limited to, technical literature, patents, and academic papers; in this embodiment it is described using patents as an example. For example, the first text collection may include 10,000 patents. It should be noted that this number is only an example and is not limiting.
The first corpus set is obtained by performing entity corpus annotation on every text in the first text collection according to the first preset rule, which is: first words that represent entities and second words that represent non-entities are labeled with distinct marks.
Specifically, the following uses part of the content of a patent in the first text collection as an example:
One text reads: "A high-mounted automotive brake light, characterized by including a rectangular mounting seat plate (1), where the mounting seat plate (1) is provided with a matching housing frame (2), and multiple partitions (3) are provided in the housing frame (2)". For this text, the corpus is annotated in the following format:
"A high-mounted automotive brake light, including a rectangular/pre mounting/start seat/in plate/end (/after 1), the said/pre mounting/start seat/in plate/end (/after 1) is provided with a matching/pre outer/start shell/in frame/end (/after 2), the said/pre outer/start shell/in frame/end (/after 2) is provided inside with multiple/pre partition/start plate/end (3), each/pre partition/start plate/end having one/pre shaft/entity"
Specifically, the first preset rule is as follows: a first mark (e.g., /start) denotes the first character of an entity; a second mark (e.g., /end) denotes the last character of an entity; a third mark (e.g., /in) denotes the characters of a component located between the first mark (start) and the second mark (end); a fourth mark (e.g., /entity) denotes a component consisting of a single character; a fifth mark (e.g., /pre) denotes the character before the first mark (start); and a sixth mark (e.g., /after) denotes the character after the second mark (end). All other characters that are not part of an entity name are assigned a uniform seventh mark (e.g., /w).
For example: including/w a/w rectangular/pre mounting/start seat/in plate/end (/after 1/w )/w.
It should be noted that the marks used for corpus annotation in the embodiments of the present application are merely examples and do not limit the embodiments of the present application.
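Once entity spans are known, the first preset rule can be applied mechanically. The sketch below uses word-level English tokens rather than the Chinese characters the original corpus is annotated with, takes inclusive entity spans, and assumes /pre wins when a token sits between two entities, a case the rule does not cover; `tag_tokens` is a hypothetical name.

```python
def tag_tokens(tokens, spans):
    """Label tokens following the first preset rule (start/in/end/entity/pre/after/w).

    spans: inclusive (start, end) index pairs of the entities in the sequence.
    If a token is both after one entity and before the next, /pre wins (an assumption).
    """
    tags = ["w"] * len(tokens)
    # Entity tokens first, so neighbouring labels never overwrite them.
    for start, end in spans:
        if start == end:
            tags[start] = "entity"
        else:
            tags[start], tags[end] = "start", "end"
            for k in range(start + 1, end):
                tags[k] = "in"
    # Then the tokens immediately before and after each entity.
    for start, end in spans:
        if start > 0 and tags[start - 1] in ("w", "after"):
            tags[start - 1] = "pre"
        if end + 1 < len(tokens) and tags[end + 1] == "w":
            tags[end + 1] = "after"
    return " ".join(f"{token}/{tag}" for token, tag in zip(tokens, tags))


tokens = ["including", "a", "rectangular", "mounting", "seat", "plate", "(", "1", ")"]
print(tag_tokens(tokens, [(3, 5)]))
# including/w a/w rectangular/pre mounting/start seat/in plate/end (/after 1/w )/w
```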
Step 102: train on the first corpus set to obtain the entity extraction model.
A conditional random field (CRF) model is trained on the first corpus set to obtain model parameters, and the entity extraction model is constructed from these parameters.
A CRF labels Chinese text character by character, treating words as composed of characters. It considers not only how often characters occur but also their context, so it learns well and handles ambiguous words and out-of-vocabulary words effectively.
Step 103: use a second text collection as the input of the entity extraction model, and identify the entity information in the second text collection by the entity extraction model.
The second text collection is also a set of patents.
For example, part of a patent in the second text collection reads:
"A battery monitoring and management device, including a battery pack (1), a monitoring module (2), a CPU processor (3), and a display (4)". The entity extraction model parses this text as follows:
a/w battery/w monitoring/w and/w management/w device/w ,/w including/w battery/start pack/end (/w 1/w )/w ,/w monitoring/start module/end (/w 2/w )/w ,/w CPU/start processor/end (/w 3/w )/w and/w display/entity (/w 4/w )/w
Four component names are extracted from this example text: battery pack, monitoring module, CPU processor, and display.
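Turning the model's tagged output back into component names is a simple scan over the tags. A minimal sketch, assuming the output is a whitespace-separated string of token/tag pairs using the marks of the first preset rule; the function name is hypothetical, and `joiner` should be an empty string for character-level Chinese output.

```python
def extract_entities(tagged, joiner=" "):
    """Collect entity names from 'token/tag' output of the entity extraction model.

    joiner glues the tokens of one entity back together; use "" for
    character-level Chinese output and " " for word-level text.
    """
    entities, current = [], []
    for item in tagged.split():
        token, _, tag = item.rpartition("/")
        if tag == "entity":
            entities.append(token)  # single-token entity
            current = []
        elif tag == "start":
            current = [token]  # open a new entity
        elif tag in ("in", "end") and current:
            current.append(token)
            if tag == "end":
                entities.append(joiner.join(current))
                current = []
        else:
            current = []  # any other tag breaks an open span
    return entities


output = ("a/w device/w ,/w including/w battery/start pack/end (/w 1/w )/w ,/w "
          "monitoring/start module/end (/w 2/w )/w ,/w CPU/start processor/end "
          "(/w 3/w )/w and/w display/entity (/w 4/w )/w")
print(extract_entities(output))
# ['battery pack', 'monitoring module', 'CPU processor', 'display']
```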
Step 104: obtain an annotated second corpus set, which is produced by performing relationship corpus annotation and entity annotation on every text in the second text collection according to a second preset rule.
After the entity extraction model completes component extraction, the relationship corpus is annotated and converted into the CRF corpus format for training.
The second preset rule is: first words that represent entities, third words that represent relationships, and words that represent neither entities nor relationships are labeled with distinct marks.
For example, take the text: "the installation seat plate (1) is provided with a matching shell frame (2)".
Annotating the relation in this text gives:
"the/w installation seat plate/e (/w 1/w )/w is/w provided/r_start with/r_end a/w matching/w shell frame/e (/w 2/w )/w".
Here the seventh tag, /w, marks an ordinary unit; the eighth tag, /e, marks a component recognized by the entity extraction model; the ninth tag, /r_start, marks the first unit of a relation word; and the tenth tag, /r_end, marks the last unit of a relation word.
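One annotated sentence of this kind can be reduced to a (head, relation, tail) triple. The Python sketch below assumes the four tags above and that an entity token may itself contain spaces, as in the English rendering; it pairs the relation span with the nearest /e entity on each side. The regular expression and the name `decode_relation` are illustrative assumptions.

```python
import re
from typing import List, Optional, Tuple

# One 'token/tag' item. A token may contain spaces (a multi-word entity), so it
# extends lazily up to the next recognised tag that is followed by whitespace or the end.
ITEM = re.compile(r"(\S.*?)/(w|e|r_start|r_end)(?=\s|$)")


def decode_relation(tagged: str, sep: str = " ") -> Optional[Tuple[str, str, str]]:
    """Return (head entity, relation, tail entity) from one annotated sentence."""
    items: List[Tuple[str, str]] = [(m.group(1), m.group(2)) for m in ITEM.finditer(tagged)]
    tags = [tag for _, tag in items]
    try:
        start = tags.index("r_start")
        end = tags.index("r_end", start)
    except ValueError:
        return None                        # no complete relation span
    heads = [tok for tok, tag in items[:start] if tag == "e"]
    tails = [tok for tok, tag in items[end + 1:] if tag == "e"]
    if not heads or not tails:
        return None
    relation = sep.join(tok for tok, _ in items[start:end + 1])
    return heads[-1], relation, tails[0]   # nearest entity on each side


sample = ("the/w installation seat plate/e (/w 1/w )/w is/w provided/r_start "
          "with/r_end a/w matching/w shell frame/e (/w 2/w )/w")
print(decode_relation(sample))
# ('installation seat plate', 'provided with', 'shell frame')
```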
It should be noted that the embodiments of the present application identify entities and the relations between them. In the examples, the entity extraction model identifies components only for illustration; it can also identify attributes and attribute values, which are not enumerated one by one here. The examples shown therefore do not limit the present application.
Step 105: train on the second corpus set to obtain the relationship extraction model.
A CRF model is trained on the second corpus set to obtain model parameters, from which the relationship extraction model is built. The parameters include the regularization parameter a, set to L2, which gives a better fit than L1; the hyper-parameter c, set to 3, which makes the model fit the training data as closely as possible; and the training-feature threshold f, set to 3, so that a word occurring fewer than f times does not take part in training.
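These three settings appear to correspond to the `-a CRF-L2`, `-c 3` and `-f 3` options of the CRF++ `crf_learn` tool, although the source does not name a toolkit. The threshold f is easy to illustrate on its own: the Python sketch below counts the features produced by a hypothetical feature template (the current token and its left and right neighbours) and keeps only those seen at least f times. It applies the cut-off to features rather than raw words, which is how CRF++ applies `-f`; the template and function names are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set


def token_features(tokens: List[str], i: int) -> List[str]:
    """Hypothetical template: the current token plus its immediate neighbours."""
    left = tokens[i - 1] if i > 0 else "<BOS>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "<EOS>"
    return [f"U0={tokens[i]}", f"U-1={left}", f"U+1={right}"]


def frequent_features(sentences: Iterable[List[str]], f: int = 3) -> Set[str]:
    """Keep only the features that occur at least f times in the training data."""
    counts = Counter(
        feat
        for tokens in sentences
        for i in range(len(tokens))
        for feat in token_features(tokens, i)
    )
    return {feat for feat, n in counts.items() if n >= f}
```

Raising f shrinks the feature set and discards rare, noisy features, at the cost of ignoring contexts seen only a few times.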
For example, the relation extracted from the text above is: "installation seat plate" is provided with "shell frame".
In this embodiment, the annotated first corpus set is obtained by annotating entities in every text of the first text collection according to a first preset rule, and is trained to obtain the entity extraction model, which extracts entities from text. The second text collection is then input to the entity extraction model, which identifies the entity information in it. The annotated second corpus set is obtained and trained to obtain the relationship extraction model, which extracts the relations between entities. The entities and the relations between them are used to give the text a structured representation.
On the basis of the above embodiments, the entity extraction model in this embodiment includes at least two entity extraction submodels, namely a first entity extraction submodel and a second entity extraction submodel. Training on the first corpus set to obtain the entity extraction model may then specifically include:
training on the first corpus set to obtain the first entity extraction submodel;
using a third text collection as input to the first entity extraction submodel, which identifies a target entity set in the third text collection;
training on the target entity set to obtain the second entity extraction submodel.
In this embodiment no entity dictionary needs to be prepared in advance. Initially only a certain amount of corpus (such as the first corpus set) is annotated to train the first entity extraction submodel. That submodel then identifies the target entity set in the third text collection, and this set serves as newly annotated corpus. Training on the target entity set yields the second entity extraction submodel, which can cover more entities, and an entity dictionary is generated from the results. Through recognition by successive entity extraction submodels, the entity dictionary comes to contain more and more entities. For example, the entity words extracted from all patents are collected into an entity dictionary with two columns, entity and frequency, where the frequency is the number of patents that contain the component, e.g. "mounting seat, 3; shell frame, 4". By annotating only a certain amount of entity corpus and repeatedly training entity extraction submodels, which together cover more and more entities, this embodiment improves the accuracy of entity recognition in text.
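The iterative procedure can be sketched as a generic self-training loop. In the Python sketch below, `train` and `predict` are hypothetical callables standing in for CRF training and CRF tagging, and corpus items are assumed to be `(text, entities)` pairs. The dictionary counts, for each entity, the number of texts (patents) that contain it, matching the "entity, frequency" columns described above.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, List[str]]   # (text, entities found in it)


def bootstrap(seed_corpus: List[Example],
              unlabeled_batches: Iterable[List[str]],
              train: Callable[[List[Example]], object],
              predict: Callable[[object, str], List[str]]):
    """Self-training: each submodel labels the next batch of texts, and the newly
    labelled texts join the corpus used to train the next submodel."""
    corpus = list(seed_corpus)
    model = train(corpus)                      # first submodel, from the seed corpus
    for batch in unlabeled_batches:
        corpus.extend((text, predict(model, text)) for text in batch)
        model = train(corpus)                  # next submodel, trained on more data
    return model, corpus


def entity_dictionary(corpus: List[Example]) -> Counter:
    """Entity -> number of texts containing it (each text counted once)."""
    return Counter(e for _, entities in corpus for e in set(entities))
```

Because predictions are fed back as training data, errors can also accumulate; the sketch leaves any filtering of low-confidence labels to the caller.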
Similarly, the relationship extraction model in this embodiment includes at least two relationship extraction submodels, namely a first relationship extraction submodel and a second relationship extraction submodel. Training on the second corpus set to obtain the relationship extraction model may then specifically include:
training on the second corpus set to obtain the first relationship extraction submodel;
using a fourth text collection as input to the first relationship extraction submodel, which identifies a target relationship set in the fourth text collection;
training on the target relationship set to obtain the second relationship extraction submodel.
In this embodiment no entity relationship dictionary needs to be prepared in advance. Initially only a certain amount of relation corpus (such as the second corpus set) is annotated to train the first relationship extraction submodel. That submodel then identifies the target relationship set in the fourth text collection, and this set serves as newly annotated relation corpus. Training on the target relationship set yields the second relationship extraction submodel, which can cover more relations, and a relationship dictionary is generated from the results. Through recognition by successive relationship extraction submodels, the relationship dictionary comes to contain more and more relations. For example, the relation words extracted from all patents are collected into a relationship dictionary with two columns, relation and frequency, where the frequency is the number of patents that contain the relation, e.g. "includes, 10; is provided with, 20". By annotating a certain amount of relation corpus and iteratively training relationship extraction submodels, which together cover more relations, this embodiment improves the accuracy of relation recognition in text.
Structured representation of text is then carried out as follows.
After steps 101 to 105 of the above example have produced the entity extraction model and the relationship extraction model, the two models can be used to give a target text a structured representation. Referring to Fig. 2, an embodiment of the present application provides a text structuring method, which may include the following steps:
Step 201: obtain the target text to be structured.
The target text to be structured is obtained; for example, the target text may be a patent.
Step 202: input the target text into the entity extraction model, which identifies the target entity set in the target text. The entity extraction model is obtained by training on the first corpus set, which is obtained by annotating entities in every text of the first text collection.
First, the target text is input into the entity extraction model, which identifies the target entity set in the target text. For example, if the target text contains "A high-mounted automotive brake lamp, characterized by comprising a rectangular installation seat plate (1), the installation seat plate (1) being provided with a matching shell frame (2), and a plurality of partitions (3) being provided in the shell frame (2)", the entity extraction model outputs the target entity set "installation seat plate, shell frame, partition".
Step 203: input the target text in which the target entity set has been recognized into the relationship extraction model, which extracts the relations between the target entities.
The target text with the recognized target entities is input into the relationship extraction model, which outputs the relations between the target entities, for example: the installation seat plate is provided with the shell frame; the shell frame is provided with the partitions.
Step 204: according to the target entities and the relations between them, give the target text a structured representation to generate a target structure.
Fig. 3 is a schematic diagram of the target structure. The generated target structure includes nodes and edges. Nodes represent entities, which include components, attributes or attribute values; edges represent relations between entities, which include relations between components, relations between a component and an attribute, or relations between an attribute and an attribute value.
For example, the entities and relations extracted from one patent are:
Brake lamp includes installation seat plate
Brake lamp includes grating plate
Brake lamp includes LED light
Installation seat plate is equipped with shell frame
Shell frame is equipped with partition
Shell frame is equipped with installation cavity
Merging the entity extraction result for the target text with the relation extraction result gives the structure graph of the whole target text (the target structure shown in Fig. 3).
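Merging the two extraction results amounts to building a labelled, directed graph. A minimal Python sketch, assuming the extraction output has already been reduced to (head, relation, tail) triples; `build_target_structure` is a hypothetical name, and drawing the graph as in Fig. 3 is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_target_structure(triples: Iterable[Tuple[str, str, str]]):
    """Merge (head, relation, tail) triples into a graph.

    Nodes are entities, kept in first-seen order; each edge points from the
    head entity to the tail entity and is labelled with the relation.
    """
    nodes: Dict[str, None] = {}                          # dict used as an ordered set
    edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in triples:
        nodes.setdefault(head, None)
        nodes.setdefault(tail, None)
        edges[head].append((relation, tail))
    return list(nodes), dict(edges)


triples = [
    ("brake lamp", "includes", "installation seat plate"),
    ("brake lamp", "includes", "grating plate"),
    ("brake lamp", "includes", "LED light"),
    ("installation seat plate", "is equipped with", "shell frame"),
    ("shell frame", "is equipped with", "partition"),
    ("shell frame", "is equipped with", "installation cavity"),
]
nodes, edges = build_target_structure(triples)
```

Here `nodes` holds the seven entities and `edges["shell frame"]` lists its two outgoing "is equipped with" edges.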
In this embodiment, the entities in the target text are extracted by the trained entity extraction model and the relations between them by the trained relationship extraction model, and a structured text representation is generated automatically from the entities and their relations. Any text, whether a target text or a candidate text, is thus represented by its entities and the relations between them. Extracting the entities and relations in the text content aids understanding of the content, is fast, and saves labor cost.
In one application scenario, a user finds a target text (such as a patent) that is long or logically complex, so that understanding it unaided would take the user a long time. The user can convert the patent into a structure graph with an electronic device such as a mobile phone: the phone receives the patent and inputs it into the entity extraction model, which identifies the target entity set in the patent; the patent with the recognized target entity set is then input into the relationship extraction model, which extracts the relations between the target entities; the patent is given a structured representation according to the target entities and their relations to generate a target structure; and the terminal displays the target structure. Alternatively, the user can send the patent from a terminal (such as a mobile phone) to a server, which converts the patent into a target structure and sends it back to the terminal for display. Converting the target text into a target structure in this way helps the user understand the content of the target text and saves considerable labor cost.
On the basis of the above embodiments, the present application also provides another embodiment. When the relationship extraction model extracts relations between entities, related entities may appear in two different sentences, so that the model may fail to recognize their relation. For example, for the text "The battery pack is connected to the monitoring module; it is also connected to the CPU processor and the display.", the relationship extraction model can identify "battery pack connected to monitoring module", i.e. the relation between the battery pack and the monitoring module, but because the CPU processor and the display are in another sentence, their relations to the battery pack may not be recognized.
To handle the case where related entities lie in different sentences and the relationship extraction model may fail to recognize their relation, the present application provides a further embodiment:
The target text includes a first entity. After step 203 and before step 204, the method may further include the following steps:
obtaining an entity relationship data set, which is derived from the entities and the relations between them extracted from a text collection; the entity relationship data set (an entity relationship matrix) contains N entities and the relations between them, where N ≥ 2;
querying the entity relationship data set to obtain M second entities related to the first entity, where M ≤ N;
searching for the second entities within a preset range in the target text;
if at least one target second entity among the M second entities is found, establishing a relation between the first entity and the target second entity.
Specifically, the entity relationship data set is obtained first. It is derived from the entities and relations extracted from a text collection and contains N entities (N ≥ 2) and the relations between them.
The entity relationship data set is obtained as follows.
The text collection is input into the entity extraction model, which identifies the entity information in the text collection. The text collection is a set containing multiple texts, for example a set of one hundred thousand patents; this number is only an example and does not limit the embodiments of the present application.
The text collection in which the entity information has been recognized is input into the relationship extraction model, which extracts the relations between entities in every text of the collection. The entity relationship data set contains the relations between entities in every text of the text collection.
The entity relationship data set is shown as matrix A below; a non-zero cell gives the relation between the row entity and the column entity, and 0 means no relation:
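(row \ column) | Brake lamp | Pedestal | …… | LED light | …… | Lamp housing
Brake lamp | 0 | is provided with | …… | 0 | …… | 0
Pedestal | 0 | 0 | …… | includes | …… | connected to
……
LED light | 0 | 0 | …… | | …… | connected to
……
Lamp housing | 0 | connected to | …… | connected to | …… | 0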
Then the entity relationship data set is queried to obtain the M second entities related to the first entity, where M ≤ N.
For example, suppose the first entity is "pedestal" and, in the target text, "pedestal" has no relation with any other component. A likely explanation is that "pedestal" and the components related to it appear in different sentences. It is then necessary to determine, from the entity relationship data set, which entities the first entity is related to; in the target text, the first entity may also be related to those entities.
With "pedestal" as the first entity, the second entities related to it can be looked up in matrix A as follows:
Locate the "pedestal" row in matrix A to obtain the set S_a of all components related to "pedestal", namely LED light and lamp housing. Locate the "pedestal" column in matrix A to obtain the set S_b of all components related to "pedestal", namely brake lamp and lamp housing.
Take the union S = S_a ∪ S_b = (s_0, s_1, s_2, ..., s_k, ..., s_n);
in the above example, S = (LED light, lamp housing, brake lamp).
Next, the second entities are searched for within a preset range in the target text.
The preset range may be determined by the size of an entity matching window, which may be set in advance; the preset range in the target text is determined from the window size.
Starting from the position where the component occurs, target second entities are searched for within g positions before and g positions after it. For example, the entity matching window extends 10 characters forward and 10 characters backward from the position of "pedestal", and second entities are searched for within that range.
Finally, if at least one target second entity among the M second entities is found, a relation is established between the first entity and the target second entity.
For example, suppose three entities are found in the window: brake lamp, folded piece and LED light. Two of them, "LED light" and "brake lamp", match members of set S and are therefore target second entities, so a relation is established between "pedestal" and each target second entity, with the relation type "related to".
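The lookup and window search can be sketched as follows. This Python code assumes matrix A is stored as a nested dict (`matrix[row][column]` holds a relation name, with a missing cell or 0 meaning no relation), that the window is measured in characters on either side of each mention of the first entity, and that a second entity counts as found when its name occurs as a substring of the window instead of re-running the entity extraction model. All function names are illustrative.

```python
from typing import Dict, List, Set, Tuple

Matrix = Dict[str, Dict[str, object]]   # matrix[row][column] -> relation name or 0


def related_entities(matrix: Matrix, entity: str) -> Set[str]:
    """S = S_a ∪ S_b: entities related to `entity` in its row or in its column."""
    s_a = {col for col, rel in matrix.get(entity, {}).items() if rel}
    s_b = {row for row, cols in matrix.items() if cols.get(entity)}
    return (s_a | s_b) - {entity}


def complete_relations(text: str, first_entity: str, matrix: Matrix,
                       g: int = 10) -> List[Tuple[str, str, str]]:
    """Link first_entity to each member of S found within g characters of a mention."""
    candidates = related_entities(matrix, first_entity)   # the M second entities
    found: Set[str] = set()
    pos = text.find(first_entity)
    while pos != -1:
        window = text[max(0, pos - g): pos + len(first_entity) + g]
        found |= {e for e in candidates if e in window}
        pos = text.find(first_entity, pos + 1)
    return [(first_entity, "related to", e) for e in sorted(found)]


A = {
    "brake lamp": {"pedestal": "is provided with"},
    "pedestal": {"LED light": "includes", "lamp housing": "connected to"},
    "LED light": {"lamp housing": "connected to"},
    "lamp housing": {"pedestal": "connected to", "LED light": "connected to"},
}
```

With this matrix, `related_entities(A, "pedestal")` gives the same three-element set S as the worked example. A larger g finds more cross-sentence links but also more spurious ones.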
In this embodiment, the entity relationship data set is obtained and queried to find the M second entities related to the first entity (M ≤ N); the second entities are searched for within a preset range of the target text; and if at least one target second entity among the M second entities is found, a relation is established between the first entity and the target second entity. This handles the case where the first entity and its related second entities lie in different sentences, which the relationship extraction model may fail to recognize.
Optionally, the target structure in this embodiment may be a text structure or an image structure. An image structure may be generated as follows.
First, target image information representing the entities is obtained.
Specifically, an image collection may be obtained from Internet data (such as related forums, patent databases and paper databases) and from local databases.
The text in each image of the image collection is recognized; if a target entity matches the text in an image, that image is selected from the collection as the image information representing the target entity. For example, if the text in a first image ("engine") matches a first target entity ("engine"), the text in a second image ("connecting rod") matches a second target entity ("connecting rod"), and the text in a third image ("press mechanism") matches a third target entity ("press mechanism"), the first, second and third images are selected as the image information representing the first, second and third target entities.
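The matching step can be sketched in a few lines of Python. The sketch assumes the text in each image has already been recognized (for example by an OCR engine, not shown) and treats "matches" as the entity name occurring in that text; the image ids and the function name `select_images` are hypothetical.

```python
from typing import Dict, Iterable, List


def select_images(entities: Iterable[str], image_texts: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each target entity to the images whose recognised text contains its name."""
    return {e: [img for img, text in image_texts.items() if e in text] for e in entities}


images = {"img1": "engine", "img2": "connecting rod", "img3": "press mechanism"}
chosen = select_images(["engine", "connecting rod", "press mechanism"], images)
```

An entity with an empty list has no matching image and would fall back to a plain text node in the target structure.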
Then a target structure represented with image information is generated according to the target entities and the relations between them.
Fig. 4 is a schematic diagram of the image structure. For example, the relations among "engine", "connecting rod" and "press mechanism" are: "engine" is connected to "connecting rod", and "engine" is connected to "press mechanism". From these three entities and their connection relations, the image structure shown in Fig. 4 is generated. In this example, image information representing the target entities is obtained, an image structure is generated from the target entities and the relations between them, and the image structure is displayed. This presents the entities in the text and the relations between them more vividly and makes the content of the text easier for the user to understand.
The methods for training the entity extraction model and the relationship extraction model, and for applying the two models to give text a structured representation, are described in detail above.
It should be noted that steps 101 to 105 and steps 201 to 204 may be executed by the same electronic device or by different electronic devices. Steps 101 to 105 precede step 201; once the entity extraction model and the relationship extraction model have been trained, steps 101 to 105 need not be executed again and step 201 can be executed directly.
Embodiment 2
Referring to Fig. 5, an embodiment of the present application further provides a method for determining text similarity. In this example the method is applied to an electronic device, which may be a server or a terminal. The method may include the following steps:
Step 301: obtain a target text and a candidate data set. The candidate data set includes multiple arrays, each of which represents the semantic vector of one entity; the entities are contained in candidate texts.
The server may receive the target text sent by a terminal; for example, the target text may be a patent.
The server may obtain the candidate data set in at least the following two ways.
In the first possible implementation:
First, a text collection is obtained that contains n candidate texts, where n is an integer greater than or equal to 2. The text collection may be all patents in one technical field of a patent database, or a subset of them; n may be, for example, 100,000 or 1,000,000.
Then the entities in each of the n candidate texts are extracted to obtain m entities. In this step the entities may be extracted with the entity extraction model of Embodiment 1: each candidate text is input into the entity extraction model, which outputs the entities in that text, giving m entities in total, where m is an integer greater than or equal to 2, for example 10,000,000 or 20,000,000.
A target matrix is determined from the n candidate texts and the entities each of them contains; for example, target matrix B is:
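(text \ entity) | Entity 1 | …… | Entity j | …… | Entity m
Patent 1 | 1 | …… | 0 | …… | 0
Patent 2 | 0 | …… | 3 | …… | 4
……
Patent i | 0 | …… | 1 | …… | 1
…… | …… | …… | 0 | …… | 0
Patent n | 6 | …… | 1 | …… | 1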
Matrix B has n rows and m columns; each row represents one candidate text and each column one entity, and B[i][j] is the number of times entity j occurs in patent i. For example, entity j occurs 3 times in patent 2, and entity m occurs once in patent i.
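Building B from the extraction output is a counting step. The Python sketch below assumes each candidate text has already been reduced to the list of entity mentions the extraction model found in it; the alphabetical column order and the name `doc_entity_matrix` are arbitrary choices for illustration.

```python
from collections import Counter
from typing import List, Tuple


def doc_entity_matrix(docs_entities: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Return (entity vocabulary, n x m matrix B) with B[i][j] = count of entity j in text i."""
    vocab = sorted({e for ents in docs_entities for e in ents})
    index = {e: j for j, e in enumerate(vocab)}
    B = [[0] * len(vocab) for _ in docs_entities]
    for i, ents in enumerate(docs_entities):
        for e, count in Counter(ents).items():
            B[i][index[e]] = count
    return vocab, B


vocab, B = doc_entity_matrix([["seat plate", "shell frame", "shell frame"], ["LED light"]])
# vocab == ['LED light', 'seat plate', 'shell frame'];  B == [[0, 1, 2], [1, 0, 0]]
```

For millions of entities a dense list of lists is impractical; a sparse matrix format would be used instead, but the counting logic is the same.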
Finally, singular value decomposition (SVD) is applied to target matrix B to obtain the candidate data set.
Specifically, the SVD of B is computed, keeping only the k largest singular values:
$$B \approx U \Sigma V^{\mathsf{T}}$$
Matrix U has n rows and k columns; each row is the vector of one text (such as a patent).
Matrix Σ is the k×k diagonal matrix of the k largest singular values of B, where k is a specified number, for example 300.
The right factor V^T has k rows and m columns, and each column is the vector of one entity. In this example the candidate data set is this k×m matrix, which is called matrix V below and is also referred to as the "candidate matrix".
Each column of matrix V is the k-dimensional vector of one component (entity), and each value V[i][j] is the projection of entity j onto the i-th dimension.
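With NumPy, the decomposition and truncation take a few lines. This is a sketch assuming NumPy is available and the matrix fits in memory; for a real patent corpus a sparse truncated solver such as `scipy.sparse.linalg.svds` would be needed instead. `np.linalg.svd` already returns the transposed right factor, so after truncation it is the k×m matrix called V in the text. The example reuses the numeric rows shown for patents 1, 2, i and n in matrix B.

```python
import numpy as np


def truncated_svd(B, k):
    """Return U (n x k), the k largest singular values, and V (k x m).

    V is the transposed right factor, so column j of V is the
    k-dimensional vector of entity j.
    """
    U, s, Vt = np.linalg.svd(np.asarray(B, dtype=float), full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


B = [[1, 0, 0],
     [0, 3, 4],
     [0, 1, 1],
     [6, 1, 1]]
U, s, V = truncated_svd(B, k=2)
entity_vectors = {j: V[:, j] for j in range(V.shape[1])}   # entity index -> vector
```

NumPy returns the singular values in descending order, so slicing the first k keeps the k largest.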
It should be noted that target matrix B and matrix V in this example are illustrative representations for ease of description and do not limit the present application.
In the second possible implementation:
The candidate data set may also be obtained with a trained Word2vec model, in which case it contains the vectors of multiple entities. The Word2vec model is trained on an entity corpus set, which may be obtained by the method of step 101 in Embodiment 1, or by applying the entity extraction model to every text in a text collection. The words in the entity corpus set are numbered in order from 1 to W, where W is an integer greater than 1. The entity corpus set is input into the Word2vec model, and the maximum distance l between the current word and a predicted word within one sentence can be set, for example to 5 or 10; l = 5 in this example. Fig. 6 is a schematic diagram of the Word2vec training process.
The Word2vec model includes an input layer, a middle (hidden) layer and an output layer.
The input layer has d nodes, corresponding to d entities.
The middle layer has 300 nodes, and every input-layer node is connected by an edge to each of the 300 nodes.
The output layer has d nodes, corresponding to the d entities.
For each entity t in the entity corpus set, its serial number i is obtained, input node i is set to 1, and all other input nodes are set to 0.
The other words within distance 5 of t are obtained, with serial numbers a1, a2, a3, a4 and a5; output positions a1, a2, a3, a4 and a5 are set to 1 and all other output positions to 0.
A gradient descent algorithm is called to compute the weight of each edge.
After training, the list of weights on the 300 edges from input node i to the middle-layer nodes is the vector of the i-th entity. The vectors of all entities form the candidate data set.
In this example, the candidate data set contains the vectors of multiple entities: the entities extracted from each candidate text are input into the Word2vec model, which outputs the vector of each entity, and all the resulting entity vectors form the candidate data set.
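The encoding of training examples described above can be sketched in Python. The sketch only builds, for each entity occurrence, the index of the one-hot input node and the indices of the output nodes set to 1 (0-based here, whereas the text numbers words from 1), and leaves out the gradient-descent step; `skipgram_examples` is a hypothetical name. In practice a library such as gensim's `Word2Vec` (e.g. `vector_size=300, window=5`) performs the whole training, though with its own training objective rather than the exact multi-hot output layer described here.

```python
from typing import Dict, List, Tuple


def skipgram_examples(sentences: List[List[str]], window: int = 5
                      ) -> Tuple[Dict[str, int], List[Tuple[int, List[int]]]]:
    """Return (vocabulary, [(input index, output indices set to 1), ...])."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for word in sentence:
            vocab.setdefault(word, len(vocab))       # number entities in order
    examples = []
    for sentence in sentences:
        for pos, t in enumerate(sentence):
            lo, hi = max(0, pos - window), min(len(sentence), pos + window + 1)
            context = sorted({vocab[sentence[q]] for q in range(lo, hi) if q != pos})
            examples.append((vocab[t], context))
    return vocab, examples


vocab, examples = skipgram_examples([["shell frame", "partition", "installation cavity"]], window=1)
# examples == [(0, [1]), (1, [0, 2]), (2, [1])]
```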
Step 302: extract the target entity set from the target text; the entity set represented by the arrays of the candidate data set contains the target entity set.
This embodiment is illustrated with the candidate data set obtained by the first implementation. In matrix V, each column is one array, and each array contains multiple elements, each element being the projection value of the entity onto one dimension.
The target entity set in the target text is extracted by the entity extraction model described in Embodiment 1 and contains all the target entities in the target text. For example, the target text contains two target entities, entity 1 (such as seat plate) and entity j (such as LED light). The entity set represented by the arrays of the candidate data set contains the target entity set; for example, the entity set represented by the vectors in matrix V (seat plate, ..., LED light, ..., connector) contains entity 1 and entity j of the target text. It should be noted that the entities and their numbers in the target text and in the candidate data set are given for ease of explanation only and do not limit the present application.
Step 303: according to the candidate data set, determine the included-angle cosine between the vector of each target entity in the target entity set and the vector of each entity in each candidate text, to obtain entity similarities.
Using the entity vectors in the candidate data set, the included-angle cosine between the vector of each target entity and the vector of each entity in each candidate text is computed. For example, the target entities in the target text are entity 1 and entity j, and the entities in a candidate text c are entity 2 and entity x; for candidate text c, the similarities of entity 1 with entity 2, entity 1 with entity x, entity j with entity 2, and entity j with entity x are computed separately.
The similarity of entity 1 and entity 2 is used as the example:
In the first possible implementation:
The entity similarity (Rela) is the included-angle cosine of the two entity vectors.
For example, Rela(entity 1, entity 2) is the included-angle cosine of the vector V1 of entity 1 and the vector V2 of entity 2, i.e. $\cos\theta = \frac{V_1 \cdot V_2}{\lVert V_1 \rVert \, \lVert V_2 \rVert}$.
In the second possible implementation, the target distance is determined between the terminal point of the vector of each target entity and the terminal point of the vector of each entity in each candidate text.
According to the candidate data set, the included-angle cosine (denoted Distance1) between the semantic vector of each target entity in the target entity set and the semantic vector of each entity in each candidate text, together with the target distance (denoted Distance2), is used to obtain the entity similarity.
Distance1 is the included-angle cosine of V1 and V2, and Distance2 is the distance between the terminal points of V1 and V2.
The similarity of entity 1 and entity 2 is Rela(entity 1, entity 2) = Distance1 × weight1 + Distance2 × weight2,
where weight1 is the weight of Distance1 and weight2 is the weight of Distance2. Both default to 0.5 and may also be specified by the user for the actual usage scenario, for example weight1 = 0.6 and weight2 = 0.4.
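Both implementations of the entity similarity can be written directly from these definitions. In the Python sketch below, the "target distance" between the vectors' terminal points is assumed to be the Euclidean distance, and Rela follows the weighted-sum formula literally. The source does not say whether Distance2 is normalized or converted into a similarity before weighting, so a practical system may need such a step; the function names are illustrative.

```python
import math
from typing import Sequence


def cosine(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Distance1: cosine of the angle between two entity vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0


def endpoint_distance(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Distance2: Euclidean distance between the vectors' terminal points (assumed)."""
    return math.dist(v1, v2)


def rela(v1: Sequence[float], v2: Sequence[float],
         weight1: float = 0.5, weight2: float = 0.5) -> float:
    """Rela = Distance1 * weight1 + Distance2 * weight2, as written in the text."""
    return cosine(v1, v2) * weight1 + endpoint_distance(v1, v2) * weight2
```

The first implementation corresponds to calling `cosine` alone; the second to `rela` with the chosen weights.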
Step 304: determine, based on the entity similarity, the target similarity between the target text and each candidate text.
In a first implementation, for each candidate text, sum the entity similarities of all target entities in the target text to obtain a first accumulated similarity;
Determine the target similarity between the target text and each candidate text based on the first accumulated similarity.
For example, continuing the example above, the target entities are entity 1 and entity j, and the entities in a candidate text c are entity 2 and entity x. For candidate text c, calculate the similarity of entity 1 and entity 2 (denoted "Re 1"), of entity 1 and entity x ("Re 2"), of entity j and entity 2 ("Re 3"), and of entity j and entity x ("Re 4"). For this candidate text, the calculated similarities Re 1, Re 2, Re 3, and Re 4 are summed to obtain the first accumulated similarity. Optionally, any similarity score strictly below 50% may be set to 0 during the calculation. In one implementation, the first accumulated similarity can serve as the similarity between the target text and the candidate text.
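Optionally, for each candidate text, calculate the similarity Sim1 between the entities of the target text and the candidate text:
Sim1 = first accumulated similarity / (number of entities in the target text ∪ number of entities in the candidate text), i.e. the first accumulated similarity divided by the size of the union of the two entity sets. This value can serve as the target similarity between the target text and the candidate text.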
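The first accumulated similarity and Sim1 can be sketched as follows. This assumes the entity similarity is supplied by a caller-provided function `rela1`, that entities are matched by exact name when forming the union, and that the optional 50% cut-off is applied; the function names and the similarity values in the usage example are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable

EntitySim = Callable[[str, str], float]


def clip(score: float, cutoff: float = 0.5) -> float:
    """Optional rule from the text: scores strictly below the cutoff count as 0."""
    return score if score >= cutoff else 0.0


def first_accumulated_similarity(target: Iterable[str], candidate: Iterable[str],
                                 rela1: EntitySim, cutoff: float = 0.5) -> float:
    """Sum the clipped similarity of every (target entity, candidate entity) pair."""
    return sum(clip(rela1(t, c), cutoff) for t, c in product(target, candidate))


def sim1(target: Iterable[str], candidate: Iterable[str],
         rela1: EntitySim, cutoff: float = 0.5) -> float:
    """First accumulated similarity divided by the size of the union of the entity sets."""
    target, candidate = list(target), list(candidate)
    union_size = len(set(target) | set(candidate))
    if union_size == 0:
        return 0.0
    return first_accumulated_similarity(target, candidate, rela1, cutoff) / union_size


# Hypothetical similarity values for the entity 1 / entity j example.
SCORES = {("entity 1", "entity 2"): 0.9, ("entity 1", "entity x"): 0.3,
          ("entity j", "entity 2"): 0.6, ("entity j", "entity x"): 0.8}


def lookup(a: str, b: str) -> float:
    return SCORES[(a, b)]


total = first_accumulated_similarity(["entity 1", "entity j"], ["entity 2", "entity x"], lookup)
```

Here Re 2 = 0.3 falls below the cut-off, so the first accumulated similarity is 0.9 + 0.6 + 0.8, and Sim1 divides it by the four distinct entities.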
In this embodiment, the electronic device obtains the target text and a candidate data set. The candidate data set includes multiple arrays, each of which represents the semantic vector of one entity; these entities are contained in the candidate texts. The device then extracts the target entity set from the target text; the set of entities represented by the arrays of the candidate data set includes the target entity set. Based on the candidate data set, the device determines the cosine of the angle between the semantic vector of each target entity in the target entity set and the semantic vector of each candidate entity in each candidate text, obtaining the entity similarity. In this way, the similarity between each target entity and each candidate entity in a candidate text is calculated, and the target similarity between the target text and each candidate text is determined from the entity similarity. Because this target similarity takes every entity in both the target text and the candidate text into account, it better reflects how similar their contents actually are. In the prior art, the similarity between a target text and a candidate text is determined by matching keywords chosen subjectively by the user, so the result depends on the user's subjective understanding. The method provided by the embodiments of this application is more objective because it relies on the actual content of the target text and the candidate text, and the similarity calculation is therefore more accurate.
Building on the example above, the method further includes the following steps before step 304:
Extract the relationships between target entities in the target text;
Obtain the candidate relationship set of each candidate text, and determine, based on the entity similarity, the relationship similarity between each relationship in the target relationship set and each candidate relationship in the candidate relationship set;
Then, in step 304, determine the target similarity between the target text and each candidate text based on both the entity similarity and the relationship similarity.
The relationships in the embodiments of this application include binary relations, or binary relations through X-ary relations, where X is an integer greater than or equal to 3. A binary relation includes two entities and the relationship between them. An X-ary relation includes X entities and at least (X-1) binary relations; each of these binary relations includes an associated entity, through which the (X-1) binary relations are connected.
For example, when X = 3, the relationships include binary and ternary relations; when X = 4, they include binary, ternary, and quaternary relations. For ease of explanation, the embodiments of this application use binary and ternary relations as examples.
Binary and ternary relations are illustrated below:
Binary relation: two entities and the relationship between them, i.e. entity 1 + entity 2 + the relationship between entity 1 and entity 2. For example: brake lamp (entity 1) includes (relationship) base (entity 2).
Ternary relation: two binary relations, for example binary relation 1 and binary relation 2, that share a common entity; this shared entity is the associated entity that connects the two binary relations. An example ternary relation is (brake lamp–mounting plate, mounting plate–housing frame), in which the mounting plate is the associated entity.
The following further explains how the binary relation similarity and the ternary relation similarity are determined:
Optionally, extract the binary relation between every two target entities in the target text to obtain the target binary relation set of the target text. For example, the target binary relation set is: (brake lamp–mounting plate, mounting plate–housing frame, housing frame–partition, housing frame–mounting cavity, brake lamp–grating plate, brake lamp–LED lamp).
Obtain the candidate binary relation set of each candidate text. For example, the candidate binary relation set is: (brake lamp–base, base–housing frame, housing frame–dust-proof coating, brake lamp–grating plate, brake lamp–LED lamp, LED lamp–lamp housing).
Based on the entity similarity, determine the binary relation similarity between each binary relation in the target binary relation set and each candidate binary relation in the candidate binary relation set. The binary relation similarity is the sum of three terms: the similarity between the first target entity of the target binary relation and the first candidate entity of the candidate binary relation; the similarity between the second target entity of the target binary relation and the second candidate entity of the candidate binary relation; and the similarity between the relationship in the target binary relation and the relationship in the candidate binary relation. As a formula: Rela2(target binary relation, candidate binary relation) = Rela1(target entity 1, candidate entity 1) + Rela1(target entity 2, candidate entity 2) + R(target relationship, candidate relationship), where R(relationship 1, relationship 2) = 1 if relationship 1 equals relationship 2, and R(relationship 1, relationship 2) = 0 otherwise. For example, if the target binary relation is brake lamp–mounting plate and the candidate binary relation is brake lamp–base, then Rela2(brake lamp–mounting plate, brake lamp–base) = Rela1(brake lamp, brake lamp) + Rela1(mounting plate, base) + R(connection, connection).
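A minimal sketch of Rela2, assuming each binary relation is stored as a (head entity, relationship, tail entity) tuple and that Rela1 is supplied by the caller. The toy similarity function and its 0.8 value are hypothetical, chosen only to mirror the brake lamp example.

```python
from typing import Callable, Tuple

BinaryRelation = Tuple[str, str, str]  # assumed layout: (head entity, relationship, tail entity)
EntitySim = Callable[[str, str], float]


def relation_match(rel_a: str, rel_b: str) -> float:
    """R(relationship 1, relationship 2): 1 if the relationships are equal, otherwise 0."""
    return 1.0 if rel_a == rel_b else 0.0


def rela2(target: BinaryRelation, candidate: BinaryRelation, rela1: EntitySim) -> float:
    """Rela2 = Rela1(head, head) + Rela1(tail, tail) + R(relationship, relationship)."""
    t_head, t_rel, t_tail = target
    c_head, c_rel, c_tail = candidate
    return rela1(t_head, c_head) + rela1(t_tail, c_tail) + relation_match(t_rel, c_rel)


def toy_rela1(a: str, b: str) -> float:
    """Hypothetical entity similarity: 1.0 for identical names, a made-up 0.8 for one pair."""
    if a == b:
        return 1.0
    return {("mounting plate", "base"): 0.8}.get((a, b), 0.0)


score = rela2(("brake lamp", "connection", "mounting plate"),
              ("brake lamp", "connection", "base"), toy_rela1)
```

With these values the example evaluates to 1.0 + 0.8 + 1.0.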
Further, sum the binary relation similarities of the binary relations in the target text to obtain a second accumulated similarity. The second accumulated similarity is computed by traversing the candidate text once for each binary relation in the target text, calculating its similarity Rela2 with every binary relation in the candidate text, recording any score whose entity similarity is strictly below 50% as 0, and adding up all the similarities.
Further, for each candidate text, calculate the similarity Sim2 between the binary relations of the target text and the candidate structure. Specifically, compute the union of the binary relations in the target text and those in the candidate text; for example, if the target text has 12 binary relations and the candidate text has 14, the union is 14. Sim2 is the ratio of the second accumulated similarity to the size of this union, as follows:
Sim2 = second accumulated similarity / (number of binary relations in the target text ∪ number of binary relations in the candidate text).
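The second accumulated similarity and Sim2 can be sketched as follows. The text says scores whose entity similarity is below 50% are recorded as 0 but does not pin down where the cut-off applies; this sketch assumes it applies to each Rela1 term inside Rela2. Relations are (head, relationship, tail) tuples matched exactly when forming the union, and the toy similarity values are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

BinaryRelation = Tuple[str, str, str]  # assumed layout: (head, relationship, tail)
EntitySim = Callable[[str, str], float]


def rela2_clipped(target: BinaryRelation, candidate: BinaryRelation,
                  rela1: EntitySim, cutoff: float = 0.5) -> float:
    """Rela2 with entity similarities strictly below the cutoff recorded as 0."""
    (t_head, t_rel, t_tail), (c_head, c_rel, c_tail) = target, candidate

    def clip(score: float) -> float:
        return score if score >= cutoff else 0.0

    relation_term = 1.0 if t_rel == c_rel else 0.0
    return clip(rela1(t_head, c_head)) + clip(rela1(t_tail, c_tail)) + relation_term


def sim2(target_rels: Iterable[BinaryRelation], candidate_rels: Iterable[BinaryRelation],
         rela1: EntitySim, cutoff: float = 0.5) -> float:
    """Second accumulated similarity divided by the size of the union of the relation sets."""
    target_rels, candidate_rels = list(target_rels), list(candidate_rels)
    second_accumulated = sum(rela2_clipped(t, c, rela1, cutoff)
                             for t, c in product(target_rels, candidate_rels))
    union_size = len(set(target_rels) | set(candidate_rels))
    return second_accumulated / union_size if union_size else 0.0


def toy_rela1(a: str, b: str) -> float:
    """Hypothetical entity similarity used only for this example."""
    if a == b:
        return 1.0
    return {("mounting plate", "base"): 0.8}.get((a, b), 0.2)


target = [("brake lamp", "connection", "mounting plate")]
candidate = [("brake lamp", "connection", "base"), ("brake lamp", "connection", "LED lamp")]
result = sim2(target, candidate, toy_rela1)
```

Because each Rela2 term can reach 3 when entity similarities are at most 1, Sim2 as defined is not confined to [0, 1]; here it evaluates to (2.8 + 2.0) / 3.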
Further, on the basis of the above embodiments, the method may also include the following steps:
Determine a target ternary relation set from the target binary relation set. The target ternary relation set includes multiple ternary relations; each ternary relation includes two binary relations that share an entity. For example, the target ternary relation set is: (brake lamp–mounting plate, mounting plate–housing frame), (mounting plate–housing frame, housing frame–partition), (mounting plate–housing frame, housing frame–mounting cavity).
Obtain the candidate ternary relation set of each candidate text. For example, the candidate ternary relation set is: (brake lamp–base, base–housing frame), (base–housing frame, housing frame–dust-proof coating).
Based on the binary relation similarity, determine the ternary relation similarity between each ternary relation in the target ternary relation set and each candidate ternary relation in the candidate ternary relation set. The ternary relation similarity is the sum of the binary relation similarity between the first target binary relation of the target ternary relation and the first candidate binary relation of the candidate ternary relation, and the binary relation similarity between the second target binary relation of the target ternary relation and the second candidate binary relation of the candidate ternary relation. It can be expressed as:
Rela3(target ternary relation, candidate ternary relation) = Rela2(first target binary relation, first candidate binary relation) + Rela2(second target binary relation, second candidate binary relation).
For example, the target ternary relation is (brake lamp–mounting plate, mounting plate–housing frame);
the candidate ternary relation is (brake lamp–base, base–housing frame);
Rela3[(brake lamp–mounting plate, mounting plate–housing frame), (brake lamp–base, base–housing frame)]
= Rela2(brake lamp–mounting plate, brake lamp–base) + Rela2(mounting plate–housing frame, base–housing frame)
= Rela1(brake lamp, brake lamp) + Rela1(mounting plate, base) + R(connection, connection) + Rela1(mounting plate, base) + Rela1(housing frame, housing frame) + R(connection, connection).
Sum the ternary relation similarities of the ternary relations in the target text to obtain a third accumulated similarity. The third accumulated similarity is computed by traversing the candidate text once for each ternary relation in the target text, calculating its similarity Rela3 with each candidate ternary relation, recording any score whose entity similarity is strictly below 50% as 0, and adding up all the similarities.
Calculate the similarity Sim3 between the ternary relations of the target text and the candidate text. Specifically, compute the union of the ternary relations in the target text and those in the candidate text; for example, if the target text has 10 ternary relations and the candidate text has 8, the union is 10. Sim3 is the ratio of the third accumulated similarity to the size of this union, as follows:
Sim3 = third accumulated similarity / (number of ternary relations in the target text ∪ number of ternary relations in the candidate text).
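Rela3 and Sim3 can be sketched in the same style, again assuming (head, relationship, tail) tuples, a caller-supplied Rela1, and the 50% cut-off applied to each entity similarity. The function names and the toy value Rela1(mounting plate, base) = 0.8 are hypothetical; with that value the worked example above evaluates to 2.8 + 2.8 = 5.6.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

BinaryRelation = Tuple[str, str, str]  # assumed layout: (head, relationship, tail)
TernaryRelation = Tuple[BinaryRelation, BinaryRelation]
EntitySim = Callable[[str, str], float]


def rela2(target: BinaryRelation, candidate: BinaryRelation,
          rela1: EntitySim, cutoff: float = 0.5) -> float:
    """Rela1(head, head) + Rela1(tail, tail) + R(relationship, relationship), with the cut-off."""
    (t_head, t_rel, t_tail), (c_head, c_rel, c_tail) = target, candidate

    def clip(score: float) -> float:
        return score if score >= cutoff else 0.0

    return clip(rela1(t_head, c_head)) + clip(rela1(t_tail, c_tail)) + (1.0 if t_rel == c_rel else 0.0)


def rela3(target: TernaryRelation, candidate: TernaryRelation,
          rela1: EntitySim, cutoff: float = 0.5) -> float:
    """Rela3 = Rela2(first, first) + Rela2(second, second)."""
    return (rela2(target[0], candidate[0], rela1, cutoff)
            + rela2(target[1], candidate[1], rela1, cutoff))


def sim3(target_ternary: Iterable[TernaryRelation], candidate_ternary: Iterable[TernaryRelation],
         rela1: EntitySim, cutoff: float = 0.5) -> float:
    """Third accumulated similarity divided by the size of the union of the ternary relation sets."""
    target_ternary, candidate_ternary = list(target_ternary), list(candidate_ternary)
    third_accumulated = sum(rela3(t, c, rela1, cutoff)
                            for t, c in product(target_ternary, candidate_ternary))
    union_size = len(set(target_ternary) | set(candidate_ternary))
    return third_accumulated / union_size if union_size else 0.0


def toy_rela1(a: str, b: str) -> float:
    """Hypothetical entity similarity used only for this example."""
    if a == b:
        return 1.0
    return {("mounting plate", "base"): 0.8}.get((a, b), 0.0)


target = (("brake lamp", "connection", "mounting plate"),
          ("mounting plate", "connection", "housing frame"))
candidate = (("brake lamp", "connection", "base"),
             ("base", "connection", "housing frame"))
score = rela3(target, candidate, toy_rela1)
```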
Further, the target entities include a specific entity, and the method further includes:
Determine the entity similarity of the specific entity. The specific entity may be an entity specified by the user, and the number of specific entities is not limited. For example, the specific entity may be "brake lamp", or the specific entities may be "brake lamp" and "mounting plate"; a specific entity may be an entity that is important in the actual solution. In this example, "brake lamp" is used for illustration. If the candidate entities in a candidate text are "brake lamp", "base", and "lamp housing", then for that candidate text the entity similarities of the specific entity are: the similarity of "brake lamp" and "brake lamp" (denoted "R11"), of "brake lamp" and "base" (denoted "R12"), and of "brake lamp" and "lamp housing" (denoted "R13").
For each candidate text, sum the entity similarities of the specific entity to obtain a fourth accumulated similarity: R11 + R12 + R13.
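A short sketch of the fourth accumulated similarity, assuming a caller-supplied entity similarity function. The text mentions no cut-off for this step, so none is applied; the function names and the values 0.4 and 0.7 in the toy similarity are hypothetical.

```python
from typing import Callable, Iterable

EntitySim = Callable[[str, str], float]


def fourth_accumulated_similarity(specific_entities: Iterable[str],
                                  candidate_entities: Iterable[str],
                                  rela1: EntitySim) -> float:
    """Sum the similarity of each specific entity with every candidate entity (R11 + R12 + ...)."""
    candidates = list(candidate_entities)  # reused once per specific entity
    return sum(rela1(s, c) for s in specific_entities for c in candidates)


def toy_rela1(a: str, b: str) -> float:
    """Hypothetical entity similarity used only for this example."""
    if a == b:
        return 1.0
    return {("brake lamp", "base"): 0.4, ("brake lamp", "lamp housing"): 0.7}.get((a, b), 0.0)


fourth = fourth_accumulated_similarity(["brake lamp"], ["brake lamp", "base", "lamp housing"], toy_rela1)
```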
In step 304 above, the similarity SIM between the target text and the candidate text is calculated from the first, second, third, and fourth accumulated similarities and their corresponding weights:
Formula 1: SIM = sim1*weight1 + sim2*weight2 + sim3*weight3 + sim4*weight4, where weight1 is the weight of the entity similarity, weight2 is the weight of the binary relation similarity, weight3 is the weight of the ternary relation similarity, and weight4 is the weight of the specific-entity similarity.
The weights weight1, weight2, weight3, and weight4 can be configured for the specific application scenario. For example, if the user considers the specific-entity similarity and the binary relation similarity more important, weight2 and weight4 can be set to higher values, such as weight4 = 0.4, weight2 = 0.3, weight1 = 0.2, and weight3 = 0.1. Under normal conditions, weight1, weight2, weight3, and weight4 can each be set to 0.25.
From Formula 1, in a first possible implementation, the similarity between the target text and the candidate text can be determined from the first and second accumulated similarities, i.e. with weight3 = 0 and weight4 = 0.
In a second possible implementation, the similarity can be determined from the first and third accumulated similarities, i.e. with weight2 = 0 and weight4 = 0.
In a third possible implementation, the similarity can be determined from the first and fourth accumulated similarities, i.e. with weight2 = 0 and weight3 = 0.
In a fourth possible implementation, the similarity can be determined from the first, second, and third accumulated similarities, i.e. with weight4 = 0.
In a fifth possible implementation, the similarity can be determined from the first, second, and fourth accumulated similarities, i.e. with weight3 = 0.
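Formula 1 and its five variants reduce to one weighted sum in which some weights are zero. A minimal sketch, assuming the four component scores are passed in a dictionary keyed by hypothetical names; the 0.25 defaults and the 0.2/0.3/0.1/0.4 example come from the text, while the non-zero weights of the first-implementation preset and the sample scores are hypothetical.

```python
from typing import Mapping

# Formula 1 weights; 0.25 each is the "normal" setting described in the text.
DEFAULT_WEIGHTS = {"sim1": 0.25, "sim2": 0.25, "sim3": 0.25, "sim4": 0.25}

# First implementation: weight3 = weight4 = 0. Only the zero pattern comes from
# the text; the 0.5 values are a hypothetical choice.
FIRST_IMPLEMENTATION = {"sim1": 0.5, "sim2": 0.5, "sim3": 0.0, "sim4": 0.0}


def combined_similarity(scores: Mapping[str, float],
                        weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """SIM = sim1*weight1 + sim2*weight2 + sim3*weight3 + sim4*weight4; missing scores count as 0."""
    return sum(scores.get(name, 0.0) * weight for name, weight in weights.items())


scores = {"sim1": 0.6, "sim2": 0.5, "sim3": 0.2, "sim4": 0.9}  # hypothetical component scores
emphasis = {"sim1": 0.2, "sim2": 0.3, "sim3": 0.1, "sim4": 0.4}  # example weights from the text
```

Keeping the weights as data rather than separate functions lets each implementation be selected by configuration alone.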
Further, in the embodiments of this application, the candidate texts in the candidate text collection can be ranked by their similarity SIM to the target text, in either descending or ascending order of similarity, and a preset number of candidate texts can be displayed in order of similarity, for example the top 3 candidate texts.
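Ranking and truncating the candidates can be sketched with a plain sort. The candidate identifiers and SIM values below are hypothetical; Python's sort is stable, so candidates with equal SIM keep their input order.

```python
from typing import Dict, List, Tuple


def rank_candidates(sim_by_candidate: Dict[str, float], top_n: int = 3,
                    descending: bool = True) -> List[Tuple[str, float]]:
    """Sort candidate texts by SIM and keep the first top_n entries for display."""
    ranked = sorted(sim_by_candidate.items(), key=lambda item: item[1], reverse=descending)
    return ranked[:top_n]


sims = {"doc-a": 0.4, "doc-b": 0.9, "doc-c": 0.65, "doc-d": 0.1}  # hypothetical SIM values
top3 = rank_candidates(sims)
```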
In this embodiment, the similarity between the target text and a candidate text is determined from the similarity between each target entity in the target text and the candidate entities in the candidate text, and from the similarity between the relationships among target entities in the target text and the relationships among candidate entities in the candidate text. Both entity similarity and relationship similarity are therefore considered, and entities together with the relationships between them better reflect what a text actually expresses. Further, the relationships may include binary relations through N-ary relations; for example, binary and ternary relations, where a binary relation includes two entities and the relationship between them, and a ternary relation includes two binary relations connected through an associated entity. In the embodiments of this application, a ternary relation involves three entities and the relationships among them, so calculating the similarity between target and candidate binary relations, and between target and candidate ternary relations, better captures the actual content of the texts. Further, the similarity of a specific target entity can also be determined, so the similarity between the target text and the candidate text can be tailored to the user's specific application scenario and better meets the user's actual needs.
Optionally, the novelty degree of the target text relative to each candidate text is determined from the target similarity, and the novelty degree is inversely correlated with the target similarity: the higher the similarity between the target text and a candidate text, the lower the novelty of the target text relative to that candidate text. For example, if the target similarity is 70%, the novelty degree may be 1 - 70% = 30%, or it may be 1 - k*70%, where k is a correction coefficient. This embodiment does not limit the specific method for determining novelty, provided the novelty degree is inversely correlated with the target similarity.
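The two novelty formulas above amount to a single line; for any positive correction coefficient k the result falls as the target similarity rises, which is the required inverse correlation. The function name is hypothetical.

```python
def novelty_from_similarity(target_similarity: float, k: float = 1.0) -> float:
    """Novelty degree = 1 - k * target similarity; k = 1 gives the plain 1 - similarity form."""
    return 1.0 - k * target_similarity


plain = novelty_from_similarity(0.7)            # 1 - 70%
corrected = novelty_from_similarity(0.7, k=0.5)  # 1 - k*70% with a hypothetical k
```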
Optionally, on the basis of the above embodiments, the target text in this embodiment may be a target structure and the candidate text may be a candidate structure. That is, following the method described in Embodiment 1, the target text is converted into a target structure by the entity extraction model and the relation extraction model, and each candidate text is converted into a candidate structure by the same models.
Specifically, the target text is a structured text, and the step of obtaining the target text in step 201 may include the following steps: obtain the target text;
Input the target text into the entity extraction model, which identifies the entities in the target text;
Input the target text, with its entities identified, into the relation extraction model, which extracts the relationships between the entities;
Represent the target text in structured form according to the entities and the relationships between them, generating the structured text.
Optionally, extracting the target entity set from the target text in step 202 may specifically include the following step:
Use the target text as the input of the entity extraction model, and extract the target entity set in the target text with that model. The entity extraction model is trained on a first corpus set, which is obtained by annotating entities in every text of a first text collection.
Optionally, extracting the binary relation between every two entities in the target text may specifically include the following step:
Input the target text, with its target entity set identified, into the relation extraction model, which extracts the relationships between the target entities. The relation extraction model is trained on a second corpus set, which is obtained by annotating relationships and entities in every text of a second text collection.
Embodiment 3
Referring to Fig. 7, an embodiment of this application further provides a method for determining the novelty degree of a text. The method is applied to an electronic device, which may be a server or a terminal; in this embodiment, a terminal is used for illustration. The method includes the following steps:
Step 401: determine the target text.
For example, the target text may be a patent or a paper; in this embodiment, a patent is used as the example.
Step 402: extract multiple target entities from the target text to obtain a target entity set.
In this example, the target entities in the target text are extracted with the entity extraction model of Embodiment 1. Specifically, the target text is input into the entity extraction model, which identifies multiple target entities in the target text; these target entities form the target entity set.
Step 403: obtain the candidate entity set of each candidate text in the candidate text collection.
The candidate text collection may be, for example, a patent set that includes multiple candidate texts (such as patents). The server obtains the candidate text collection from a patent database and extracts the candidate entities of each candidate text offline in advance to obtain the candidate entity sets. Alternatively, the server may extract the candidate entities of each candidate text online. Specifically, the candidate entities of each candidate text in the collection can be extracted with the entity extraction model described in Embodiment 1 to obtain the candidate entity set.
Step 404: determine the first entity intersection of the target entity set and the candidate entity set; the first entity intersection consists of the entities that match between the target entity set and the candidate entity set.
For example, if the target entity set is (brake lamp, base, lamp housing) and the candidate entity set is (brake lamp, mounting plate, housing frame), the first entity intersection is (brake lamp).
Step 405: determine the novelty degree of the target text relative to the candidate text according to the difference parameter between the first entity intersection and the target entity set.
The first entity novelty degree is determined according to the difference parameter between the first entity intersection and the target entity set, that is:
First entity novelty degree = |target entity set - intersection(target entity set, candidate entity set)| / |target entity set| = 1 - |first entity intersection| / |target entity set|, where |·| denotes the number of elements in a set. In the example above, the first entity novelty degree is 1 - 1/3 = 2/3.
In this embodiment, the difference parameter between the first entity intersection and the target entity set is the ratio of the size of the first entity intersection to the size of the target entity set; alternatively, it may be that ratio multiplied by a coefficient. Other variants of the difference parameter are possible and are not described here.
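The first entity novelty degree can be sketched with Python sets. This assumes entities match by exact name and reads the coefficient variant as novelty = 1 - coefficient * ratio; the function name is hypothetical. Because it accepts any hashable items, the same function also applies to sets of relation tuples.

```python
from typing import Hashable, Iterable


def set_novelty(target_items: Iterable[Hashable], candidate_items: Iterable[Hashable],
                coefficient: float = 1.0) -> float:
    """1 - coefficient * |target ∩ candidate| / |target|."""
    target = set(target_items)
    if not target:
        raise ValueError("the target set is empty, so the novelty degree is undefined")
    intersection = target & set(candidate_items)
    return 1.0 - coefficient * len(intersection) / len(target)


novelty = set_novelty({"brake lamp", "base", "lamp housing"},
                      {"brake lamp", "mounting plate", "housing frame"})
```

For the brake lamp example this returns 2/3, matching the worked value above.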
In this embodiment, the target text whose novelty degree is to be determined is identified first; it may be a patent. Multiple target entities are then extracted from the target text to obtain the target entity set, and the candidate entity set of each candidate text in the candidate text collection is obtained. Each candidate text is traversed to determine the first entity intersection of the target entity set and that candidate text's entity set, i.e. the entities that match between the two sets. Finally, the novelty degree of the target text relative to the candidate text is determined according to the difference parameter between the first entity intersection and the target entity set. This embodiment takes into account all target entities in the target text and all candidate entities in each candidate text. In the prior art, novelty is determined by matching keywords chosen subjectively by the user, so the result depends on the user's subjective understanding. The method provided by the embodiments of this application is more objective because it relies on the actual content of the target text and the candidate text, and the novelty degree is therefore calculated more accurately.
Optionally, on the basis of the above embodiments, the embodiments of this application may further include the following steps before step 405:
Extract multiple binary relations from the target text to obtain a target binary relation set, where a binary relation includes two entities and the relationship between them;
Obtain the candidate binary relation set, which includes multiple binary relations, of the candidate text;
Determine the first binary relation intersection of the target binary relation set and the candidate binary relation set; the first binary relation intersection includes the binary relations that match between the target binary relation set and the candidate binary relation set;
Then, in step 405, determining the novelty degree of the target text relative to the candidate text according to the difference parameter between the first entity intersection and the target entity set may specifically include:
Determine the first entity novelty degree according to the difference parameter between the first entity intersection and the target entity set, that is: first entity novelty degree (R1_1) = |target entity set - intersection(target entity set, candidate entity set)| / |target entity set| = 1 - |first entity intersection| / |target entity set|. Determine the first binary relation novelty degree according to the difference parameter between the first binary relation intersection and the target binary relation set:
R2_1 = |target binary relation set - intersection(target binary relation set, candidate binary relation set)| / |target binary relation set| = 1 - |first binary relation intersection| / |target binary relation set|.
The difference parameter between the first binary relation intersection and the target binary relation set may be the ratio of the first binary relation intersection to the target binary relation set, or a variant such as that ratio multiplied by a coefficient; this is not specifically limited.
Optionally, in another possible implementation, the novelty degree of the target text relative to the candidate text can be determined from the first entity novelty degree, the first binary relation novelty degree, and their respective weights. In this implementation, the novelty of the target binary relations in the target text relative to the candidate binary relations in the candidate text is also calculated, so the novelty degree reflects both the novelty between entities and the novelty between binary relations, which improves its accuracy.
On the basis of the above embodiments, the method may further include the following steps:
Extract the target ternary relation set from the target text; the target ternary relation set includes multiple ternary relations, each of which includes two binary relations that share an entity;
Obtain the candidate ternary relation set, which includes multiple ternary relations, of the candidate text;
Determine the first ternary relation intersection of the target ternary relation set and the candidate ternary relation set; the first ternary relation intersection includes the ternary relations that match between the target ternary relation set and the candidate ternary relation set;
In this case, determining the novelty degree of the target text relative to the candidate text according to the first entity novelty degree and the first binary relation novelty degree may further include:
Determine the first ternary relation novelty degree according to the difference parameter between the first ternary relation intersection and the target ternary relation set, that is: R3_1 = |target ternary relation set - intersection(target ternary relation set, candidate ternary relation set)| / |target ternary relation set| = 1 - |first ternary relation intersection| / |target ternary relation set|. The difference parameter between the first ternary relation intersection and the target ternary relation set may be the ratio of the first ternary relation intersection to the target ternary relation set, or a variant such as that ratio multiplied by a coefficient; this is not specifically limited.
Determine the novelty degree of the target text relative to the candidate text according to the first entity novelty degree, the first binary relation novelty degree, the first ternary relation novelty degree, and their corresponding weights.
Novelty degree = R1_1*weight1 + R2_1*weight2 + R3_1*weight3, where, in this example, weight1 is the weight of the first entity novelty degree, weight2 is the weight of the first binary relation novelty degree, and weight3 is the weight of the first ternary relation novelty degree. In this implementation, the novelty of the target ternary relations in the target text relative to the candidate ternary relations in the candidate text is also calculated, so the novelty degree reflects the novelty between entities, between binary relations, and between ternary relations, which improves its accuracy.
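The weighted novelty degree can be sketched end to end. Assumptions, all hypothetical: relations are (head, relationship, tail) tuples matched exactly, ternary relations are chained pairs in which the first relation's tail is the second relation's head, and the equal default weights are a placeholder because the text gives no default values. The entity sets come from the step 404 example; the relation tuples are made up for illustration.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple

BinaryRelation = Tuple[str, str, str]  # assumed layout: (head, relationship, tail)


def set_novelty(target_items: Iterable[Hashable], candidate_items: Iterable[Hashable]) -> float:
    """1 - |target ∩ candidate| / |target|; raises if the target set is empty."""
    target = set(target_items)
    if not target:
        raise ValueError("empty target set: novelty degree undefined")
    return 1.0 - len(target & set(candidate_items)) / len(target)


def chain_ternary(rels: List[BinaryRelation]) -> List[Tuple[BinaryRelation, BinaryRelation]]:
    """Pairs of distinct binary relations linked by an associated entity (tail -> head)."""
    return [(a, b) for i, a in enumerate(rels) for j, b in enumerate(rels)
            if i != j and a[2] == b[0]]


def text_novelty(target_entities, candidate_entities,
                 target_rels: List[BinaryRelation], candidate_rels: List[BinaryRelation],
                 weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Novelty degree = R1_1*weight1 + R2_1*weight2 + R3_1*weight3."""
    r1_1 = set_novelty(target_entities, candidate_entities)
    r2_1 = set_novelty(target_rels, candidate_rels)
    r3_1 = set_novelty(chain_ternary(target_rels), chain_ternary(candidate_rels))
    weight1, weight2, weight3 = weights
    return r1_1 * weight1 + r2_1 * weight2 + r3_1 * weight3


target_entities = {"brake lamp", "base", "lamp housing"}
candidate_entities = {"brake lamp", "mounting plate", "housing frame"}
target_rels = [("brake lamp", "connection", "base"), ("base", "connection", "lamp housing")]
candidate_rels = [("brake lamp", "connection", "base"), ("base", "connection", "housing frame")]
novelty = text_novelty(target_entities, candidate_entities, target_rels, candidate_rels,
                       weights=(0.4, 0.3, 0.3))
```

Here R1_1 = 2/3, R2_1 = 0.5, and R3_1 = 1.0 because the only ternary relations differ in their last entity. A target text with no ternary relations would make R3_1 undefined, so the sketch raises rather than guessing a value.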
It should be noted that the relationship can also include 4 yuan of relationships, 5 yuan of relationships etc., this reality in the embodiment of the present application It applies in example, is only illustrated using binary crelation and ternary relation as example, do not cause the limited explanation to the application.
Optionally, in the present embodiment, which is structured text, as object construction, candidate's text collection In every candidate text be structuring candidate structure.In this example, candidate map, Ke Yili can be obtained according to candidate structure Solution, candidate map may include an at least candidate structure, when candidate map includes a candidate structure, candidate figure It composes identical as candidate structure.When candidate map includes being more than or equal to 2 candidate structures, understood incorporated by reference to Fig. 8, Fig. 8 is the structural schematic diagram of candidate map, and the method for determining candidate's map can also include the following steps:
Determine the associated entity of first candidate structure and second candidate structure;Such as, the first candidate structure includes Entity: pedestal, lamp housing and lampshade.Relationship between entity includes: pedestal-Shade base-lamp housing.The second candidate structure packet The entity included: lamp housing, wick and electric switch.Relationship between entity includes: lamp housing-wick lamp housing-electric switch.The then first candidate knot The associated entity of structure and second candidate structure is " lamp housing ".
Associate the first candidate structure with the second candidate structure through the associated entity to obtain the candidate graph. As shown in Fig. 8, the two candidate structures are joined at the associated entity.
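The association step can be illustrated with a short sketch. It assumes that each candidate structure is given as a list of undirected (entity, entity) edges and that two structures are associated through any entity name they share; the function names are hypothetical. Merging the edge lists into one adjacency map joins the structures at the shared node, reproducing the Fig. 8 example.

```python
from collections import defaultdict


def structure_nodes(edges):
    """All entity names appearing in a structure's edge list."""
    return {node for edge in edges for node in edge}


def associated_entities(first, second):
    """Entities shared by two candidate structures (e.g. "lamp housing")."""
    return structure_nodes(first) & structure_nodes(second)


def merge_structures(*structures):
    """Build a candidate graph as a mapping entity -> set of neighbouring entities.

    Nodes are keyed by entity name, so structures that share an entity are
    automatically connected through that associated entity."""
    graph = defaultdict(set)
    for edges in structures:
        for a, b in edges:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


if __name__ == "__main__":
    first = [("base", "lampshade"), ("base", "lamp housing")]
    second = [("lamp housing", "wick"), ("lamp housing", "electric switch")]
    print(associated_entities(first, second))  # {'lamp housing'}
    print(sorted(merge_structures(first, second)["lamp housing"]))
    # ['base', 'electric switch', 'wick']
```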
On the basis of the above embodiments, optionally, when the target text and the candidate texts are structured texts, the novelty can be determined between the target structure and the candidate graph. The number of candidate structures in the candidate graph is not limited: the graph may include 3 or 4 candidate structures, or all of the candidate structures in the candidate text set, with each candidate structure that has related entities connected to the others through associated entities. For ease of description, this embodiment uses a candidate graph containing 2 candidate structures as an example. The method in this embodiment may further include the following steps:
Extract the candidate entity set of the candidate graph. In the candidate graph, each node represents an entity and each edge represents a relation. Again taking binary and ternary relations as examples, the binary relation set of the candidate graph is the set of relations between every two adjacent nodes, and the ternary relation set is the set of relations among every three adjacent nodes.
Determine the second entity intersection of the target entity set and the candidate entity set of the candidate graph. This step can be understood with reference to step 404 of this embodiment.
Determine the second entity novelty from the difference parameter between the second entity intersection and the target entity set, i.e.: R1_2 = (|target entity set| - |second entity intersection|) / |target entity set| = 1 - |second entity intersection| / |target entity set|. This step can be understood with reference to step 405 of this embodiment.
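The extraction of the candidate sets and the calculation of R1_2 might look as follows. This sketch assumes that the candidate graph is an adjacency map (entity to set of neighbours), that a binary relation is an edge and a ternary relation is a path through three distinct adjacent nodes, and that both are stored in a canonical order so that a-b equals b-a and a-b-c equals c-b-a. These representation choices and the function names are assumptions, not taken from the source.

```python
def graph_relation_sets(graph):
    """Return (entities, binary relations, ternary relations) of an undirected graph."""
    entities = set(graph)
    # Each edge once, as a sorted pair.
    binary = {tuple(sorted((a, b))) for a in graph for b in graph[a]}
    # Each path a-mid-c once: mid is the middle node, end points ordered so a < c.
    ternary = {(a, mid, c)
               for mid, nbrs in graph.items()
               for a in nbrs for c in nbrs if a < c}
    return entities, binary, ternary


def entity_novelty(target_entities, candidate_entities):
    """R1_2 = 1 - |second entity intersection| / |target entity set|."""
    if not target_entities:
        return 0.0  # assumption: no target entities means no novelty
    intersection = target_entities & candidate_entities
    return 1.0 - len(intersection) / len(target_entities)


if __name__ == "__main__":
    fig8 = {
        "base": {"lampshade", "lamp housing"},
        "lampshade": {"base"},
        "lamp housing": {"base", "wick", "electric switch"},
        "wick": {"lamp housing"},
        "electric switch": {"lamp housing"},
    }
    entities, binary, ternary = graph_relation_sets(fig8)
    print(len(entities), len(binary), len(ternary))  # 5 4 4
    print(round(entity_novelty({"lamp housing", "wick", "battery"}, entities), 3))  # 0.333
```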
Optionally, the method may further include the following steps:
Extract the binary relations in the target structure to obtain the target binary relation set. For example, one target binary relation in the set is "lamp housing-wick".
Locate the two target entities of each target binary relation in the target binary relation set at their two corresponding entity positions in the candidate graph. For example, for the target binary relation "lamp housing-wick", find the nodes "lamp housing" and "wick" in the candidate graph.
Calculate the distance between the two entity positions corresponding to each target binary relation, for example the distance from "lamp housing" to "wick" in the candidate graph. The spacing between any two adjacent nodes in the candidate graph is taken to be equal (denoted a), so the distance between two nodes is the length of the path from the first entity position (e.g. "lamp housing") to the second entity position (e.g. "wick"). Taking Fig. 8 as an example, the distance from "lamp housing" to "wick" is a. The path from "base" to "wick" runs from "base" to "lamp housing" (distance a) and then from "lamp housing" to "wick" (distance a), so the distance L from "base" to "wick" is 2a.
Determine, from the distance, the second binary relation novelty of each target binary relation relative to the candidate graph. The second binary relation novelty score R2_2 is proportional to L: the shorter L is, the lower the novelty; the longer L is, the higher the novelty.
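A minimal sketch of the distance-based score is shown below. It assumes that the distance is the shortest path found by breadth-first search over the candidate graph's adjacency map, with every edge of length a. The proportionality constant k and the score returned for entities that are absent or unconnected are hypothetical choices, since the text does not specify them.

```python
from collections import deque


def shortest_distance(graph, src, dst):
    """Number of edges on a shortest src-dst path (BFS), or None if there is none."""
    if src not in graph or dst not in graph:
        return None
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def binary_distance_novelty(graph, relation, a=1.0, k=1.0, unreachable=float("inf")):
    """R2_2 = k * L with L = (edge count) * a; a larger distance means more novel.

    unreachable: hypothetical score when an entity is missing or disconnected."""
    hops = shortest_distance(graph, *relation)
    if hops is None:
        return unreachable
    return k * hops * a


if __name__ == "__main__":
    fig8 = {
        "base": {"lampshade", "lamp housing"},
        "lampshade": {"base"},
        "lamp housing": {"base", "wick", "electric switch"},
        "wick": {"lamp housing"},
        "electric switch": {"lamp housing"},
    }
    print(binary_distance_novelty(fig8, ("lamp housing", "wick")))  # 1.0 (distance a)
    print(binary_distance_novelty(fig8, ("base", "wick")))          # 2.0 (distance 2a)
```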
In a first possible implementation, the novelty of the target structure relative to the candidate graph can be determined from the second entity novelty R1_2, the second binary relation novelty R2_2, and their corresponding weights. In this implementation, after the second entity novelty is determined, the second binary relation novelty between the target binary relations of the target structure and the candidate graph is also calculated. When the novelty of the target structure relative to the candidate graph is determined, the method therefore considers both entity-level novelty and binary relation novelty, which improves the accuracy of the novelty score.
In a second possible implementation, first obtain the candidate binary relation set consisting of the binary relations in the candidate graph; determine the second binary relation intersection of the target binary relation set and the candidate binary relation set; and determine the first binary relation novelty R2_1 from the difference parameter between the second binary relation intersection and the target binary relation set.
Then the binary relation novelty can be calculated from the first binary relation novelty and the second binary relation novelty with their corresponding weights, that is: binary relation novelty R2 = R2_1*weight1 + R2_2*weight2, where, in this example, weight1 is the weight of R2_1 and weight2 is the weight of R2_2. The weights can be set differently for different application scenarios.
Then the novelty of the target structure relative to the candidate graph is determined from the second entity novelty R1_2, the binary relation novelty R2, and their corresponding weights. In this implementation, the binary relation novelty is determined jointly by the first binary relation novelty, the second binary relation novelty, and their respective weights, which broadens the scenarios in which the binary relation novelty can be applied.
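The second implementation can be sketched as follows. This assumes that binary relations are canonical (sorted) entity pairs, that R2_2 has already been normalised to the range 0 to 1 so that it can be added to the set-based R2_1 (the text does not say how the two scales are reconciled), and that all weight values are hypothetical.

```python
def set_novelty(target: set, candidate: set) -> float:
    """1 - |target & candidate| / |target| (0.0 for an empty target, by assumption)."""
    if not target:
        return 0.0
    return 1.0 - len(target & candidate) / len(target)


def binary_relation_novelty(target_binary, candidate_binary, r2_2,
                            weight1=0.5, weight2=0.5):
    """R2 = R2_1*weight1 + R2_2*weight2, with R2_1 taken from the set intersection."""
    r2_1 = set_novelty(target_binary, candidate_binary)
    return r2_1 * weight1 + r2_2 * weight2


def structure_novelty(r1_2, r2, w_entity=0.5, w_binary=0.5):
    """Novelty of the target structure relative to the candidate graph from R1_2 and R2."""
    return r1_2 * w_entity + r2 * w_binary


if __name__ == "__main__":
    candidate_binary = {("base", "lampshade"), ("base", "lamp housing"),
                        ("lamp housing", "wick"), ("electric switch", "lamp housing")}
    target_binary = {("lamp housing", "wick"), ("battery", "lamp housing")}
    r2 = binary_relation_novelty(target_binary, candidate_binary, r2_2=0.25)
    print(r2)  # 0.375
    print(structure_novelty(1 / 3, r2))
```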
On the basis of the above embodiments, optionally, the method may further include the following steps:
Extract the ternary relations in the target structure to obtain the target ternary relation set. For example, one target ternary relation in the set is "lamp housing-wick-electric switch".
Locate the three target entities of each target ternary relation in the target ternary relation set at their three corresponding entity positions in the candidate graph. For example, locate "lamp housing", "wick", and "electric switch" of "lamp housing-wick-electric switch" in the candidate graph.
Calculate the shortest distance between each two adjacent entities of the relation: the shortest distance L1 between "lamp housing" and "wick" in the candidate graph, and the shortest distance L2 between "wick" and "electric switch" in the candidate graph. Then sum the two shortest distances. The second ternary relation novelty score R3_2 is proportional to L1 + L2: the shorter L1 + L2 is, the lower the novelty; the longer L1 + L2 is, the higher the novelty.
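The ternary score can be sketched in the same way. This again assumes breadth-first-search shortest paths on an adjacency map with edge length a, a hypothetical proportionality constant k, and a hypothetical score for missing or disconnected entities.

```python
from collections import deque


def shortest_distance(graph, src, dst):
    """Edge count of a shortest src-dst path found by BFS, or None if unreachable."""
    if src not in graph or dst not in graph:
        return None
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def ternary_distance_novelty(graph, relation, a=1.0, k=1.0, unreachable=float("inf")):
    """R3_2 = k * (L1 + L2), with L1 = d(e1, e2) and L2 = d(e2, e3), each edge of length a."""
    e1, e2, e3 = relation
    l1 = shortest_distance(graph, e1, e2)
    l2 = shortest_distance(graph, e2, e3)
    if l1 is None or l2 is None:
        return unreachable  # hypothetical score for missing or disconnected entities
    return k * (l1 + l2) * a


if __name__ == "__main__":
    fig8 = {
        "base": {"lampshade", "lamp housing"},
        "lampshade": {"base"},
        "lamp housing": {"base", "wick", "electric switch"},
        "wick": {"lamp housing"},
        "electric switch": {"lamp housing"},
    }
    # L1 = a (lamp housing-wick), L2 = 2a (wick-lamp housing-electric switch)
    print(ternary_distance_novelty(fig8, ("lamp housing", "wick", "electric switch")))  # 3.0
```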
In a third possible implementation, the novelty of the target structure relative to the candidate graph can be determined from the second entity novelty R1_2, the second binary relation novelty R2_2, the second ternary relation novelty R3_2, and their corresponding weights.
For example, novelty = R1_2*weight1 + R2_2*weight2 + R3_2*weight3, where, in this implementation, weight1 is the weight of the second entity novelty, weight2 is the weight of the second binary relation novelty, and weight3 is the weight of the second ternary relation novelty.
Further, in the embodiments of the present application, the candidate texts in the candidate text set can be ranked by their novelty relative to the target text, in either descending or ascending order of novelty, and a preset number of candidate texts can be displayed in that order; for example, the top 3 candidate texts are displayed.
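Ranking the candidates and keeping a preset number can be sketched as below. The candidate identifiers and scores are made-up illustration data, and top_k = 3 mirrors the example of displaying three candidate texts.

```python
def rank_candidates(scores, top_k=3, descending=True):
    """Sort candidate texts by novelty and keep the first top_k.

    scores: mapping of candidate-text identifier -> novelty score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=descending)
    return ranked[:top_k]


if __name__ == "__main__":
    scores = {"text A": 0.2, "text B": 0.9, "text C": 0.5, "text D": 0.7}
    print(rank_candidates(scores))
    # [('text B', 0.9), ('text D', 0.7), ('text C', 0.5)]
```

Passing descending=False gives the ascending order, which surfaces the least novel (most similar) candidate texts first.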
In this implementation, the second ternary relation novelty between the target ternary relations in the target structure and the candidate graph is also calculated. When the novelty of the target structure relative to the candidate graph is determined, the method therefore considers entity-level novelty together with binary relation novelty and ternary relation novelty, which improves the accuracy of the novelty score.
Further, on the basis of the third implementation above, a fourth possible implementation is provided, in which the method may further include the following steps:
Determine the second ternary relation intersection of the target ternary relation set of the target structure and the candidate ternary relation set of the candidate graph.
Determine the first ternary relation novelty from the difference parameter between the second ternary relation intersection and the target ternary relation set, that is: R3_1 = (|target ternary relation set| - |second ternary relation intersection|) / |target ternary relation set|.
In the fourth possible implementation, the ternary relation novelty is first determined from the first ternary relation novelty and the second ternary relation novelty with their corresponding weights, that is: ternary relation novelty R3 = R3_1*weight1 + R3_2*weight2, where, in this implementation, weight1 is the weight of R3_1 and weight2 is the weight of R3_2.
Then the novelty of the target structure relative to the candidate graph is determined from the second entity novelty R1_2, the binary relation novelty R2, the ternary relation novelty R3, and their corresponding weights. In this implementation, the ternary relation novelty is determined jointly by the first ternary relation novelty, the second ternary relation novelty, and their respective weights, which broadens the scenarios in which the ternary relation novelty can be applied.
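The fourth implementation can be sketched as follows. It assumes that ternary relations are tuples (e1, e2, e3) canonicalised so that a chain and its reverse compare equal, that R3_2 has already been normalised to the range 0 to 1, and that all weights are hypothetical. The candidate ternary set in the example is the one derived from the Fig. 8 graph.

```python
def canonical_ternary(relation):
    """Treat e1-e2-e3 and e3-e2-e1 as the same ternary relation."""
    e1, e2, e3 = relation
    return (e1, e2, e3) if e1 <= e3 else (e3, e2, e1)


def ternary_set_novelty(target_ternary, candidate_ternary):
    """R3_1 = 1 - |target & candidate| / |target| on canonical ternary relations."""
    target = {canonical_ternary(r) for r in target_ternary}
    candidate = {canonical_ternary(r) for r in candidate_ternary}
    if not target:
        return 0.0  # assumption: an empty target set contributes no novelty
    return 1.0 - len(target & candidate) / len(target)


def ternary_relation_novelty(r3_1, r3_2, weight1=0.5, weight2=0.5):
    """R3 = R3_1*weight1 + R3_2*weight2."""
    return r3_1 * weight1 + r3_2 * weight2


def overall_novelty(r1_2, r2, r3, weight1=0.4, weight2=0.3, weight3=0.3):
    """Novelty of the target structure from R1_2, R2 and R3 with their weights."""
    return r1_2 * weight1 + r2 * weight2 + r3 * weight3


if __name__ == "__main__":
    candidate = {("lamp housing", "base", "lampshade"), ("base", "lamp housing", "wick"),
                 ("base", "lamp housing", "electric switch"),
                 ("electric switch", "lamp housing", "wick")}
    target = {("lamp housing", "wick", "electric switch"),
              ("wick", "lamp housing", "electric switch")}
    r3_1 = ternary_set_novelty(target, candidate)  # 0.5: only the second chain exists
    r3 = ternary_relation_novelty(r3_1, r3_2=0.6)
    print(r3_1, r3, overall_novelty(1 / 3, 0.375, r3))
```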
It should be noted that, in the embodiments of the present application, related content in Embodiment 1, Embodiment 2, and Embodiment 3 can be cross-referenced. For example, the step of extracting the binary relations in the target text may further include the following steps:
Obtain an entity relation dataset, which is built from the entities in a text collection and the relations between them. The entity relation dataset (an entity relation matrix) includes N entities and the relations among those N entities, where N is greater than or equal to 2.
Query the entity relation dataset to obtain the M second entities that are related to the first entity, where M is less than or equal to N.
Search for the second entities within a preset range in the target text.
If at least one target second entity among the M second entities is found, establish a relation between the first entity and the target second entity.
Before the step of searching for the second entities within the preset range in the target text, the method may further include the following steps:
Create an entity matching window.
Determine the preset range in the target text according to the size of the entity matching window.
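The window-based relation extraction described in the steps above can be sketched as follows. The sketch assumes that the target text has already been split into tokens in which each entity mention is a single token, that the entity relation dataset is a mapping from an entity to the set of entities related to it, and that the preset range is the span of `window` tokens on either side of the first entity. All names and the example window sizes are hypothetical.

```python
def extract_relations(tokens, relation_data, window=5):
    """Link each first entity to related second entities found within the window.

    tokens: the target text as a list of tokens (entity mentions as single tokens).
    relation_data: entity -> set of related entities (the M second entities).
    window: size of the entity matching window, in tokens on each side."""
    relations = set()
    for i, first in enumerate(tokens):
        related = relation_data.get(first)
        if not related:
            continue  # not a known entity, or it has no related second entities
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in related:
                relations.add((first, tokens[j]))
    return relations


if __name__ == "__main__":
    tokens = ["the", "lamp housing", "holds", "the", "wick", "and",
              "further", "on", "an", "electric switch"]
    data = {"lamp housing": {"wick", "electric switch"}}
    print(extract_relations(tokens, data, window=3))  # {('lamp housing', 'wick')}
    print(sorted(extract_relations(tokens, data, window=8)))
```

A larger window finds more distant relations (here "electric switch") at the cost of more spurious matches, which is the trade-off the size of the entity matching window controls.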
The step of extracting the target entities in the target text may specifically include the following step:
Input the target text into an entity extraction model, which identifies the target entities in the target text.
The step of extracting the binary relations in the target text may specifically include the following steps:
Input the target text, in which the target entities have been identified, into a relation extraction model, which extracts the binary relations between the target entities.
Based on the relations between the target entities, represent the target text in structured form to generate the target structure. The target structure includes nodes and edges: the nodes represent the target entities, and the edges represent the relations between them.
Embodiment 4
Referring to Fig. 9, an embodiment of the present application provides a method for obtaining image information. The method is applied to an electronic device, which may be a server or a terminal; the executing entity is not specifically limited in the embodiments of the present application. The method may include the following steps:
Step 501: Receive target text information to be matched, where the target text information includes a target entity.
If the executing entity is a terminal, the terminal receives the target text information to be matched that is input by the user. If the executing entity is a server, the server receives the target text information to be matched sent by the terminal; for example, the target text information is "engine". In one application scenario, taking the server as the executing entity: the user wants to search for images corresponding to "engine", the terminal receives the input "engine" and sends the target entity to the server, and the server receives the target text information. It should be noted that the number of target entities is not limited in the embodiments of the present application; "engine" is only an illustrative example and does not limit the present application.
Step 502: Match the target entity against the candidate entity associated with each candidate image in the image dataset.
The server matches the target entity against the candidate entity associated with each candidate image in the image dataset. The image dataset may be stored inside the server or obtained from another device; this is not specifically limited. The image dataset includes a large number of candidate images, each associated with a candidate entity. For example, candidate image 1 is associated with "connecting rod", candidate image 2 is associated with "engine", and so on.
Step 503: If the target entity matches the candidate entity associated with a first candidate image in the image dataset, determine that the first candidate image is a candidate image matching the target entity.
For example, if the target entity (e.g. "engine") matches the candidate entity (e.g. "engine") associated with the first candidate image in the image dataset, the first candidate image is determined to be a candidate image matching the target entity.
Specifically, the target entity can be matched against the candidate entity associated with the first candidate image in the image dataset as follows:
First, obtain the semantic vector of the target entity and the semantic vector of the candidate entity associated with each candidate image. In one possible implementation, these semantic vectors can be obtained from the "candidate matrix" in step 301 of Embodiment 2; for details, refer to step 301 of Embodiment 2, which is not repeated here. In a second possible implementation, the semantic vector of the target entity and the semantic vector of the candidate entity can be obtained with the trained word2vec model of step 301 in Embodiment 2; for details, again refer to step 301 of Embodiment 2, which is not repeated here.
Then calculate the cosine of the angle between the semantic vector of the target entity and the semantic vector of the candidate entity.
From this cosine value, obtain the similarity between the target entity and the candidate entity; the higher the similarity, the better the target entity matches the candidate entity.
Sort the candidate entities by matching degree from high to low and take the U candidate entities that best match the target entity, where U is an integer greater than or equal to 1. The candidate images associated with these U candidate entities are determined to be the first candidate images; the number of first candidate images is not limited.
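The vector-matching steps can be sketched in plain Python as below. The embedding vectors are assumed to come from an upstream model such as the word2vec model mentioned above; the toy two-dimensional vectors, the image identifiers, and the optional similarity threshold (which mirrors the threshold in the application scenario that follows) are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_candidate_images(target_vec, candidates, u=1, threshold=None):
    """Images of the U candidate entities most similar to the target entity.

    candidates: list of (candidate_entity, entity_vector, image_id).
    threshold: optional minimum similarity (hypothetical parameter)."""
    scored = sorted(((cosine(target_vec, vec), image) for _, vec, image in candidates),
                    reverse=True)
    if threshold is not None:
        scored = [(sim, image) for sim, image in scored if sim > threshold]
    return [image for _, image in scored[:u]]


if __name__ == "__main__":
    candidates = [("engine", [0.9, 0.1], "Aa"),
                  ("connecting rod", [0.0, 1.0], "B1"),
                  ("engine", [0.8, 0.3], "Ab")]
    print(first_candidate_images([1.0, 0.0], candidates, u=2))  # ['Aa', 'Ab']
```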
Step 504: Output the first candidate image.
If the executing entity is a terminal, the terminal displays the first candidate image. If the executing entity is a server, the server sends the first candidate image to the terminal so that the terminal displays it.
In one application scenario, the user inputs "engine"; the terminal receives "engine" and sends it to the server, which matches "engine" against each candidate entity in the image dataset. The server finds that the similarity between the target entity "engine" and one candidate entity "engine" is above a threshold, and that the similarity between the target entity "engine" and another candidate entity "engine" is also above the threshold. It therefore determines that candidate image Aa, associated with the first of these candidate entities, and candidate image Ab, associated with the second, are the first candidate images. The server sends candidate images Aa and Ab to the terminal, which displays them.
In the embodiments of the present application, target text information to be matched, including a target entity, is first received; the target entity is then matched against the candidate entity associated with each candidate image in the image dataset; if the target entity matches the candidate entity associated with a first candidate image, the first candidate image is determined to be a candidate image matching the target entity; and the first candidate image is output. The output first candidate image matches the target entity in the target text information and can represent the target entity more vividly. Unlike the prior art, in which a person must consult the drawings of texts one by one to select images matching the target entity, this method of obtaining image information greatly reduces labor cost.
On the basis of the above embodiments, the image dataset can be established in advance; how to establish it is described in detail below. Before the target entity is matched, in step 503, against the text information associated with each candidate image in the image dataset, the method may further include the following steps:
In a first possible implementation, the image dataset includes a first image dataset.
Obtain candidate text collection;Wherein, candidate text collection can be patent text set, candidate's text collection packet More candidate texts are included, every candidate text includes candidate entity;If executing subject is terminal, which can be from server Candidate's text collection is obtained, if the executing subject is server, which can be server internal storage, Alternatively, being also possible to what server was obtained from another equipment, do not limit specifically, in the embodiment of the present application, the executing subject It can be illustrated by taking server as an example.
Count the frequency with which each candidate entity appears in the candidate text set. For example, in the candidate text set, "engine" appears 10000 times, "connecting rod" 9900 times, "press mechanism" 9800 times, and so on. These candidate entities and frequencies are illustrative only and do not limit the embodiments of the present application.
Determine high-frequency entities from the frequencies. A high-frequency entity may be an entity whose frequency in the candidate text set is above a threshold, for example above 9000. Alternatively, the high-frequency entities may be the entities ranked above a preset position after sorting by frequency; for example, all entities appearing in the candidate texts are sorted by frequency from high to low, and the top 10000 are selected as high-frequency entities.
Associate each high-frequency entity with at least one corresponding candidate image to obtain the first image dataset. The entities in the first image dataset are therefore those that appear most frequently.
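Counting frequencies and selecting high-frequency entities by either criterion above can be sketched as follows. It assumes that the candidate entities have already been extracted from each candidate text. How the corresponding candidate images are found is not specified in the text, so the entity-to-images mapping here is a hypothetical input.

```python
from collections import Counter


def high_frequency_entities(candidate_texts, threshold=None, top_n=None):
    """Entities whose frequency exceeds threshold, or the top_n most frequent ones.

    candidate_texts: iterable of entity-mention lists, one list per candidate text."""
    counts = Counter(entity for text in candidate_texts for entity in text)
    if threshold is not None:
        return {entity for entity, count in counts.items() if count > threshold}
    if top_n is not None:
        return {entity for entity, _ in counts.most_common(top_n)}
    raise ValueError("give either threshold or top_n")


def build_first_image_dataset(high_freq, entity_images):
    """Keep only high-frequency entities that have at least one candidate image.

    entity_images: hypothetical mapping entity -> list of candidate images."""
    return {e: imgs for e, imgs in entity_images.items() if e in high_freq and imgs}


if __name__ == "__main__":
    texts = [["engine", "connecting rod"], ["engine"],
             ["engine", "press mechanism", "connecting rod"]]
    print(sorted(high_frequency_entities(texts, threshold=1)))  # ['connecting rod', 'engine']
    print(high_frequency_entities(texts, top_n=1))              # {'engine'}
```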
Optionally, the image dataset further includes a second image dataset. Before the target entity is matched against the text information associated with each candidate image in the image dataset, the method may further include the following steps:
Obtain candidate text collection;Wherein, every in candidate text collection candidate text includes Detailed description of the invention and attached drawing, Detailed description of the invention includes the mark of candidate entity and candidate entity, and attached drawing includes candidate image and mark;In candidate's text collection Every candidate text (such as patent), every patent includes Detailed description of the invention and attached drawing, please refers to Figure 10 and is understood, Figure 10 is attached The schematic diagram of figure explanation and attached drawing.In Figure 10, comprising multiple candidate entities and each candidate entity in attached drawing in Detailed description of the invention In corresponding number, such as " soy bean milk making machine ontology " reference numeral " 1 ", in the accompanying drawings number " 1 " corresponding to candidate entity candidate Image is the candidate image of " soy bean milk making machine ontology ";It is real to number candidate corresponding to " 2 " in the accompanying drawings for " head " reference numeral " 2 " The candidate image of body is the candidate image of " head ".
Establish the association between candidate entities and candidate images according to the labels to obtain the second image dataset. That is, identify the labels (such as reference numerals) in the drawing, match the numerals in the description of drawings with those in the drawing, and associate the candidate entity and the candidate image that share the same numeral, thereby obtaining the second image dataset.
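Linking entities to drawing regions through shared reference numerals can be sketched as below. The sketch assumes that the description of drawings has been rendered as text of the form "name numeral; name numeral" and that the numerals in the drawing have already been located (for example by an OCR step) and mapped to image regions. Both the input format and the function names are hypothetical.

```python
import re


def parse_reference_numerals(description):
    """Map numeral -> entity name from text like "soymilk maker body 1; head 2"."""
    mapping = {}
    for part in re.split(r"[;,]", description):
        match = re.match(r"\s*(.*?\S)\s*(\d+)\s*$", part)
        if match:
            mapping[match.group(2)] = match.group(1)
    return mapping


def build_second_image_dataset(description, drawing_labels):
    """Associate the entity and the image region that share a reference numeral.

    drawing_labels: numeral -> image region located in the drawing (hypothetical)."""
    names = parse_reference_numerals(description)
    return {names[num]: image for num, image in drawing_labels.items() if num in names}


if __name__ == "__main__":
    desc = "soymilk maker body 1; head 2"
    labels = {"1": "fig10_part1.png", "2": "fig10_part2.png", "3": "fig10_part3.png"}
    print(build_second_image_dataset(desc, labels))
    # {'soymilk maker body': 'fig10_part1.png', 'head': 'fig10_part2.png'}
```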
Optionally, the image dataset further includes a third image dataset. Before the target entity is matched against the text information associated with each candidate image in the image dataset, the method may further include the following steps:
Obtain candidate text collection;Wherein, the candidate text of every in candidate text collection includes title and Figure of abstract; For candidate's text still by taking patent as an example, every patent will include title and Figure of abstract, which is that can indicate The Main Reference of this patent.Such as, entitled " a kind of soy bean milk making machine " of the patent.
Extract the abstract drawing from the candidate text.
Identify the candidate entity in the title; for example, an entity extraction model extracts the candidate entity "soymilk maker" from "A soymilk maker".
Establish the association between the candidate entity and the abstract drawing to obtain the third image dataset; for example, associate "soymilk maker" with the abstract drawing.
It should be noted that the image dataset may include at least one of the first image dataset, the second image dataset, and the third image dataset. In the embodiments of the present application, an image dataset that includes all three is used as the example.
Optionally, in step 502 above, the step of matching the target entity against the candidate entity associated with each candidate image in the image dataset may specifically include the following steps:
First, match the target entity against the candidate entity associated with each candidate image in the first image dataset. Because the candidate entities in the first image dataset are the most frequent ones, matching the target entity against these high-frequency entities first speeds up matching.
If the target entity matches no candidate entity in the first image dataset, match it against the candidate entities associated with the candidate images in the other image datasets, namely the second image dataset and/or the third image dataset. If the target entity does match a candidate entity in the first image dataset, the candidate image associated with that candidate entity is sent directly to the terminal for display. In the embodiments of the present application, matching the target entity against the first image dataset first increases the matching speed.
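The tiered lookup can be sketched as below. It assumes that each image dataset is a mapping from candidate entity to its candidate images and that exact string equality stands in for the similarity matching described earlier. Stopping at the first fallback dataset that yields a match is one reading of "second and/or third", not a detail given in the text.

```python
def exact_match(target, candidate):
    """Placeholder matching predicate; the text uses semantic similarity instead."""
    return target == candidate


def tiered_match(target, datasets, is_match=exact_match):
    """Search datasets in priority order; return images from the first that matches.

    datasets: e.g. [first_dataset, second_dataset, third_dataset], each a mapping
    candidate_entity -> list of candidate images."""
    for dataset in datasets:
        images = [img for entity, imgs in dataset.items()
                  if is_match(target, entity) for img in imgs]
        if images:
            return images
    return []


if __name__ == "__main__":
    first = {"engine": ["img_engine"]}
    second = {"head": ["img_head_fig10"]}
    third = {"soymilk maker": ["img_abstract"], "head": ["img_head_abstract"]}
    print(tiered_match("engine", [first, second, third]))  # ['img_engine']
    print(tiered_match("head", [first, second, third]))    # ['img_head_fig10']
```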
Optionally, on the basis of the above embodiments, the image dataset further includes candidate image relations, each comprising at least two candidate images and the relation between them. For example, a candidate image relation is (candidate image 1 connected to candidate image 2), such as (soymilk maker body image connected to head image). Candidate image relations are derived from the relations between candidate entities: for example, a relation extraction model identifies the relation between candidate entities as "soymilk maker body" connected to "head", and the relation between the images associated with those candidate entities is determined accordingly, which yields the candidate image relation.
Optionally, on the basis of the above embodiments, when the first candidate image is contained in a target candidate image relation (for example, the image dataset contains the target candidate image relation (soymilk maker body image connected to head image), and the first candidate image, the soymilk maker body image, is contained in it), the method may further include the following steps:
Determine the second candidate image contained in the target candidate image relation, that is, the candidate image related to the first candidate image (for example, the head image).
Then output the first candidate image and the second candidate image.
In one application scenario, the user inputs the target entity "soymilk maker" in order to understand its structure more vividly through images. The terminal sends the target entity to the server, which matches "soymilk maker" against the candidate entity associated with each candidate image in the image dataset; the matched candidate entity is "soymilk maker body". Further, the first candidate image associated with "soymilk maker body" (the soymilk maker body image) has a connection relation with a second candidate image (the head image), so both the first candidate image (the soymilk maker body image) and the second candidate image (the head image) are output. It should be noted that the numbers of first and second candidate images are not limited in the embodiments of the present application. For example, there may be 2 first candidate images, each of which may have related second candidate images; if each first candidate image has 2 related second candidate images, 4 second candidate images are output. The output first and second candidate images can form a topology, as shown in Fig. 11, a topological diagram of the first and second candidate images. The terminal thus displays not only the image of "soymilk maker" but also other images related to it. In this embodiment, the second candidate images related to the first candidate image can be output according to the candidate image relations, so the system does not require manual analysis to retrieve other images related to the first candidate image, which saves labor cost and broadens the application scenarios.
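Deriving candidate image relations from entity relations and expanding the output with related images can be sketched as follows. It assumes that entity relations are (entity, relation, entity) triples produced by a relation extraction model, that each candidate entity maps to one candidate image, and that relations are followed in both directions. The identifiers are hypothetical.

```python
def build_image_relations(entity_relations, entity_images):
    """Turn (entity_a, relation, entity_b) triples into (image_a, relation, image_b)."""
    return {(entity_images[a], rel, entity_images[b])
            for a, rel, b in entity_relations
            if a in entity_images and b in entity_images}


def with_related_images(first_images, image_relations):
    """First candidate images followed by every second candidate image related to them."""
    output = list(first_images)
    for image in first_images:
        for a, _, b in image_relations:
            if a == image:
                other = b
            elif b == image:
                other = a
            else:
                continue
            if other not in output:
                output.append(other)
    return output


if __name__ == "__main__":
    relations = build_image_relations(
        [("soymilk maker body", "connected to", "head")],
        {"soymilk maker body": "body_img", "head": "head_img"},
    )
    print(with_related_images(["body_img"], relations))  # ['body_img', 'head_img']
```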
On the basis of the above embodiments, optionally, the target entity includes at least a first target entity and a second target entity, and the target text information further includes a first relation between the first target entity and the second target entity. The method may further include the following steps:
If the first target entity matches the first candidate entity associated with a first candidate image in the image dataset, and the second target entity matches the second candidate entity associated with a second candidate image in the image dataset, match the first relation between the first and second target entities against the second relation between the first and second candidate entities.
If the first relation matches the second relation, the method further includes:
Output the second candidate image.
In one application scenario, the user inputs the first target entity "soymilk maker", the second target entity "head", and the first relation "connected" between them. The first target entity (soymilk maker) matches the first candidate entity (soymilk maker body) associated with the first candidate image in the image dataset, and the second target entity (head) matches the second candidate entity (head) associated with the second candidate image (the head image). The relations are then matched: the first relation is "connected", and the second relation between the first and second candidate entities is also "connected". Because the first relation matches the second relation, the second candidate image is output.
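The relation check in this scenario can be sketched as below. It assumes that the entity matching step has already produced, for each target entity, the set of candidate entities it matches, that candidate relations are (entity, relation, entity) triples, and that relations match by exact label. All names are hypothetical.

```python
def images_for_relation(first_target, relation, second_target,
                        candidate_relations, matches, images):
    """Second candidate images whose entities and relation match the target triple.

    candidate_relations: iterable of (cand_entity_1, relation, cand_entity_2).
    matches: target entity -> set of candidate entities it matched.
    images: candidate entity -> candidate image."""
    output = []
    for cand_1, cand_rel, cand_2 in candidate_relations:
        if (cand_1 in matches.get(first_target, set())
                and cand_2 in matches.get(second_target, set())
                and cand_rel == relation
                and cand_2 in images):
            output.append(images[cand_2])
    return output


if __name__ == "__main__":
    candidate_relations = [("soymilk maker body", "connected", "head"),
                           ("soymilk maker body", "contains", "blade")]
    matches = {"soymilk maker": {"soymilk maker body"}, "head": {"head"}}
    images = {"head": "head_img", "blade": "blade_img"}
    print(images_for_relation("soymilk maker", "connected", "head",
                              candidate_relations, matches, images))  # ['head_img']
```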
Optionally, the relationships between candidate images may be established as follows:
Extract the candidate entities in the candidate text and the relationships between them;
Establish relationships between the candidate images associated with the candidate entities according to the relationships between those candidate entities. For example, if the relationship extracted between the candidate entities "soymilk maker body" and "machine head" is "connection", a connection relationship is established between the candidate entity "soymilk maker body" and the candidate entity "machine head", and hence between their associated candidate images.
Optionally, extracting the candidate entities in the candidate text and the relationships between them may specifically include the following steps:
Input the candidate text into an entity extraction model, which identifies the candidate entities in the candidate text;
Input the candidate text in which the candidate entities have been identified into a relation extraction model, which outputs the relationships between the candidate entities. For details of extracting the candidate entities with the entity extraction model and the relationships between them with the relation extraction model, see steps 202 and 203 in Embodiment 1; they are not repeated here.
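The two-stage pipeline, followed by carrying entity relationships over to images, can be sketched as below. The trained models are replaced here by toy stand-ins: a fixed vocabulary lookup for entity extraction and a regular-expression pattern "A is <phrase> (the) B" for relation extraction. The vocabulary, phrases and function names are all hypothetical; only the order of the steps follows the text.

```python
import re
from itertools import permutations

# Toy stand-ins for the trained models; the vocabulary and relation phrases are assumptions.
ENTITY_VOCAB = ["soymilk maker body", "machine head", "cup body"]
RELATION_PHRASES = ["connected to", "provided with"]


def extract_entities(text: str) -> list[str]:
    """Entity extraction stand-in: dictionary lookup instead of a learned model."""
    return [e for e in ENTITY_VOCAB if e in text]


def extract_relations(text: str, entities: list[str]) -> list[tuple[str, str, str]]:
    """Relation extraction stand-in: match 'A is <phrase> (the) B' for recognised entities."""
    triples = []
    for a, b in permutations(entities, 2):
        for phrase in RELATION_PHRASES:
            pattern = rf"{re.escape(a)} is {re.escape(phrase)} (?:the )?{re.escape(b)}"
            if re.search(pattern, text):
                triples.append((a, phrase, b))
    return triples


def link_images(text: str, entity_image: dict[str, str]) -> list[tuple[str, str, str]]:
    """Carry each extracted entity relationship over to the entities' candidate images."""
    links = []
    for a, phrase, b in extract_relations(text, extract_entities(text)):
        if a in entity_image and b in entity_image:
            links.append((entity_image[a], phrase, entity_image[b]))
    return links


text = "The soymilk maker body is connected to the machine head."
print(link_images(text, {"soymilk maker body": "img_body", "machine head": "img_head"}))
# [('img_body', 'connected to', 'img_head')]
```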
Optionally, the target text information is a target structure expressed in structured form.
Embodiment 5
Referring to Figure 12, an embodiment of the present application further provides a method for obtaining entity information. The method is applied to an electronic device, which may be a server or a terminal; the execution subject is not specifically limited in the embodiments of the present application. To aid understanding of this embodiment, the terms used in it are explained first:
It should be noted that the "association relationship" between entities in this embodiment has the same meaning as the "relationship" between entities in Embodiments 1 to 4 above. The explanation of association relationships in this embodiment also applies to "relationships" in Embodiments 1 to 4.
The attributes of an association relationship include a relationship type. Relationship types include, but are not limited to, conceptual relations, belonging relations, positional relations, ordinal relations and logical relations.
Conceptual relation: a relation between the general and the specific, i.e., a hypernym-hyponym relation. For example, "vehicle" is a superordinate concept relative to "automobile", and "automobile" is a superordinate concept relative to "bus".
Conceptual relations can be identified by a relation extraction model, namely the relation extraction model of the above embodiments. Optionally, the relation extraction model in this embodiment is obtained by further learning and training on the claims of a large number of patent texts. Claims contain many superordinate and subordinate concepts: for example, in "the connection component includes a screw and a nut", "connection component" is the superordinate concept and "screw" and "nut" are subordinate concepts. By learning from a large number of claims, the relation extraction model can identify hypernym-hyponym relations between entities in a text.
Belonging relation: includes, but is not limited to, inclusion relations, connection relations and parallel relations.
1) Inclusion relation: defined by containment; a superior entity contains a subordinate entity, e.g., a higher-level assembly contains a lower-level assembly. For example, if an automobile includes wheels, the relation between the automobile and the wheels is a superior-subordinate (inclusion) relation.
2) Connection relation: the entities are connected to each other. For example, in "the base is connected to the LED lamp", the relation between the base and the LED lamp is a connection relation.
3) Parallel relation: the entities stand side by side. For example, a "soymilk maker" includes an "upper cover" and a "lower cover"; there is neither an inclusion relation nor a connection relation between the "upper cover" and the "lower cover", which are parallel to each other, i.e., the relation between them is a parallel relation.
Ordinal relation: the entities are in a sequential (before/after) relation. For example, step 1: receive a first signal; step 2: process the signal to obtain a second signal. The first signal and the second signal follow the order of the steps: the first signal comes first and the second signal comes later, so "first signal" and "second signal" are in a temporal ordinal relation.
Positional relation: a spatial relation, such as inside/outside, left, right, front and back. For example, in "the LED lamp is arranged on the base", the "LED lamp" and the "base" are in a positional relation.
Logical relation: in a logical statement in natural language, one entity is taken as the reference position, and at least one entity is searched for within a preset range around it; the entity at the reference position and each entity found within the preset range are in a logical relation. For example, consider the natural-language statement: "A double-layer lower cover soymilk maker, comprising a cup body and a machine head, the machine head being located on the cup body; the machine head comprises an upper cover and a lower cover fitted with the upper cover; a motor and a control circuit are fixedly installed in the machine head; the motor shaft extends downward from the motor chamber into the cup body, and a crushing blade is installed at the end of the motor shaft." Taking "motor" as the reference position and searching g characters forward and backward, with g = 10 for example (counted in the original Chinese wording), the entity "machine head" is found within the 10 characters before "motor", and "control circuit" and "motor shaft" are found within the 10 characters after it; "machine head", "control circuit" and "motor shaft" are therefore each in a logical relation with "motor".
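The window search behind the logical relation can be sketched as follows. This is a minimal sketch under assumptions not stated in the text: only the first occurrence of the reference entity is used, an entity counts only if it lies entirely inside the g-character window before or after the reference entity, and the entity list is supplied by the caller (in the embodiment it would come from entity extraction). The English example sentence is made up, so its character counts differ from the Chinese example above.

```python
def logical_relations(text: str, base: str, entities: list[str], g: int) -> list[str]:
    """Return the entities found within g characters before or after `base`."""
    start = text.find(base)
    if start == -1:
        return []
    end = start + len(base)
    before = text[max(0, start - g):start]  # window preceding the reference entity
    after = text[end:end + g]               # window following the reference entity
    return [e for e in entities if e != base and (e in before or e in after)]


text = "the head houses the motor and a control circuit"
entities = ["head", "control circuit", "cup"]
print(logical_relations(text, "motor", entities, g=16))  # ['head']
print(logical_relations(text, "motor", entities, g=25))  # ['head', 'control circuit']
```

Widening g pulls in more distant entities, which is the trade-off the preset range controls.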
Referring to Figure 12, the method for obtaining entity information provided in the embodiments of the present application may include the following steps:
Step 601: receive target text information, where the target text information includes a first target entity.
If the execution subject is a terminal, the terminal receives the target text information entered by the user; if the execution subject is a server, the server receives the target text information sent by the terminal. For example, the target text information is "engine". In this embodiment, the server is taken as the example execution subject. In one application scenario, the terminal receives "engine" entered by the user and sends this target entity to the server, which receives the target text information. It should be noted that the number of first target entities is not limited in the embodiments of the present application; the target entity "engine" in this example is merely illustrative and does not limit the present application.
Step 602: retrieve, in a data set, a first candidate entity that matches the first target entity. The data set contains candidate entities and the relationships between them; the candidate entities include at least the first candidate entity and a second candidate entity having an association relationship with the first candidate entity.
The data set may be established in advance and then stored, or it may be obtained from another device. How the data set is established is described below:
Obtain a candidate text collection, which may be a collection of patent texts. The candidate text collection includes multiple candidate texts, each containing candidate entities. The candidate entities in each candidate text are extracted by the entity extraction model, and the relationships in the candidate text are then extracted by the relation extraction model, yielding the candidate entities and the relationships between them. The data set is obtained from the candidate entities and the association relationships between them.
For example, if the first target entity is "soymilk maker", the first candidate entity in the data set that matches it is "soymilk maker body"; in the data set, the second candidate entity "upper cover" has an association relationship with the first candidate entity "soymilk maker body". It should be noted that association relationships in the embodiments of the present application include the belonging relations, conceptual relations, ordinal relations and logical relations described above.
For example, the second candidate entity may be "upper cover", i.e., the first and second candidate entities are in a belonging relation (an inclusion relation). The second candidate entity may also be a candidate entity having a conceptual, ordinal or logical relation with the first candidate entity; these are not enumerated one by one here.
It should be noted that the specific process of matching the first target entity with the first candidate entity in this step can be understood with reference to step 503 in Embodiment 4 above and is not repeated here.
Step 603: select, in the data set, a second candidate entity that has an association relationship with the first candidate entity.
A second candidate entity having an association relationship with the first candidate entity is selected in the data set. For example, "upper cover" is in an inclusion relation with the first candidate entity, "motor" is in a logical relation with it, and "cover assembly" is in a conceptual relation with it; further examples are not listed here.
Step 604: output the second candidate entity.
The server sends the second candidate entity to the terminal, which displays it. In this embodiment, neither the number of second candidate entities nor the association relationship between the second and first candidate entities is limited.
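Steps 602 to 604 can be sketched over a small data set of triples. In this minimal sketch the triples, the relationship-type labels and the helper names are hypothetical; substring containment stands in for the matching of step 503, and relationships are treated as directed from the first candidate entity to the second.

```python
from collections import defaultdict

# Hypothetical data set of (candidate entity, relationship type, candidate entity) triples.
TRIPLES = [
    ("soymilk maker body", "belonging", "upper cover"),
    ("soymilk maker body", "belonging", "lower cover"),
    ("soymilk maker body", "logical", "motor"),
    ("engine", "belonging", "connecting rod"),
]

# Index the data set by head entity so that step 603 is a dictionary lookup.
INDEX: dict[str, list[tuple[str, str]]] = defaultdict(list)
for head, rel_type, tail in TRIPLES:
    INDEX[head].append((rel_type, tail))


def match_first_candidates(target: str) -> list[str]:
    """Step 602 stand-in: candidate entities whose name contains the target entity."""
    return [entity for entity in INDEX if target.lower() in entity.lower()]


def recommend(target: str) -> list[str]:
    """Steps 602-604: match first candidates, then collect their associated entities."""
    second = []
    for first in match_first_candidates(target):
        second.extend(tail for _, tail in INDEX[first])
    return second


print(recommend("soymilk maker"))  # ['upper cover', 'lower cover', 'motor']
```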
In one application scenario, when the user needs to improve the structure of a soymilk maker, the user may enter "soymilk maker". The terminal receives this input and sends "soymilk maker" to the server, which matches it against the candidate entities in the data set. "Soymilk maker" matches the candidate entity "soymilk maker body"; the server determines the second candidate entities that have association relationships with "soymilk maker body" and sends them to the terminal, which displays the multiple second candidate entities, for example as a list.
In this embodiment of the present application, target text information including a first target entity is received; a first candidate entity matching the first target entity is retrieved in a data set whose candidate entities include at least the first candidate entity and a second candidate entity associated with it; the second candidate entity having an association relationship with the first candidate entity is then selected in the data set and output. In this way, second candidate entities related to the first target entity can be recommended automatically, so the user no longer has to retrieve texts and analyze them one by one to pick out second candidate entities, which greatly saves labor cost.
Optionally, on the basis of the above embodiments, the attributes of an association relationship include a relationship type, and the target text information further includes a target relationship condition, which indicates the relationship type between the target entity and the candidate entity to be obtained. The target relationship condition may be a specific word, such as "include", "connect" or "hyponym". "Include" indicates that the relationship type between the target entity and the candidate entity to be obtained is a belonging relation; "connect" likewise indicates a belonging relation; and "hyponym" indicates a conceptual relation. Optionally, the target relationship condition may also be represented by a code; for example, "bh" stands for "include" and "lj" stands for "connect".
In step 603 above, selecting in the data set the second candidate entity having an association relationship with the first candidate entity may specifically be:
Selecting, in the data set and according to the first candidate entity, a second candidate entity whose relationship type satisfies the target relationship condition.
For example, the target text information includes the first target entity "soymilk maker" and the target relationship condition "include". Second candidate entities satisfying the "include" relationship are then selected in the data set according to the first candidate entity "soymilk maker body"; these may be "motor", "upper cover", "lower cover" and so on.
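Filtering by a target relationship condition can be sketched as a lookup from the condition (a word or a short code) to a relationship type, followed by a filter over the data set. The mapping table and the triples below are hypothetical illustrations built from the examples in the text; an unknown condition is rejected rather than silently ignored, which is a design assumption of this sketch.

```python
# Hypothetical mapping from a target relationship condition (word or code) to a relationship type.
CONDITION_TO_TYPE = {
    "include": "belonging", "bh": "belonging",
    "connect": "belonging", "lj": "belonging",
    "hyponym": "conceptual",
}

# Hypothetical data set triples: (candidate entity, relationship type, candidate entity).
TRIPLES = [
    ("soymilk maker body", "belonging", "motor"),
    ("soymilk maker body", "belonging", "upper cover"),
    ("soymilk maker body", "logical", "control circuit"),
    ("connection component", "conceptual", "screw"),
]


def select_by_condition(first_candidate: str, condition: str) -> list[str]:
    """Return second candidate entities whose relationship type satisfies the condition."""
    wanted = CONDITION_TO_TYPE.get(condition)
    if wanted is None:
        raise ValueError(f"unknown target relationship condition: {condition!r}")
    return [tail for head, rel_type, tail in TRIPLES
            if head == first_candidate and rel_type == wanted]


print(select_by_condition("soymilk maker body", "bh"))  # ['motor', 'upper cover']
```

Because "include" and "connect" both map to the belonging relation in the text, this filter works at the granularity of relationship types, not individual relation words.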
In this embodiment, the target text information may also include a target relationship condition, so that second candidate entities whose relationship type satisfies the condition can be selected in the data set according to the first candidate entity, broadening the applicable scenarios.
Optionally, selecting in the data set the second candidate entity having an association relationship with the first candidate entity may further specifically include:
Selecting, in the data set and according to the first candidate entity, multiple second candidate entities having association relationships with the first candidate entity;
Selecting a target second candidate entity from the multiple second candidate entities according to a preset rule, and taking the target second candidate entity as the second candidate entity.
In one possible implementation, the frequency with which each of the multiple second candidate entities occurs in the data set is determined. For example, the multiple second candidate entities are "motor", "upper cover", "lower cover" and so on, where "motor" occurs in the data set with a frequency above a threshold, or ranks first by frequency of occurrence in the data set among all the second candidate entities.
A target second candidate entity is selected from the multiple second candidate entities according to the frequency and taken as the second candidate entity. For example, "motor" may be selected as the target second candidate entity.
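Both frequency rules (above a threshold, or ranked first) can be sketched with `collections.Counter`. This assumes, for illustration only, that the data set is available as a flat list of entity mentions; the function name and parameters are hypothetical.

```python
from collections import Counter
from typing import Optional


def select_by_frequency(candidates: list[str], dataset_mentions: list[str],
                        threshold: Optional[int] = None) -> list[str]:
    """Select target second candidate entities by how often they occur in the data set.

    With a threshold, keep every candidate whose frequency exceeds it;
    without one, keep only the top-ranked candidate.
    """
    wanted = set(candidates)
    counts = Counter(m for m in dataset_mentions if m in wanted)
    if threshold is not None:
        return [c for c in candidates if counts[c] > threshold]
    return [counts.most_common(1)[0][0]] if counts else []


mentions = ["motor"] * 5 + ["upper cover"] * 2 + ["lower cover"]
print(select_by_frequency(["motor", "upper cover", "lower cover"], mentions))  # ['motor']
```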
In another implementation, a relevant date of the candidate text to which each of the multiple second candidate entities belongs can be determined. The relevant date includes, but is not limited to, the application date, the filing date and the publication date, and the multiple second candidate entities belong to different texts;
A target second candidate entity is selected from the multiple second candidate entities according to the relevant date and taken as the second candidate entity. Taking the publication date as the relevant date, the target second candidate entity is selected in order of how close the publication date is to the current date, nearest first. For example, if the patent text to which "motor" belongs was published on 2018-06-03, that of "upper cover" on 2017-05-04 and that of "lower cover" on 2017-01-04, the second candidate entity whose publication date is closest to the current date can be selected as the target second candidate entity. It should be noted that the multiple second candidate entities in this embodiment are given only for convenience of explanation and do not limit the present application.
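Ranking by publication date can be sketched with the standard `datetime.date` type, using the dates from the example above. The current date passed in below is an assumed value chosen for illustration; the distance is taken in absolute days so the same rule works for dates on either side of it.

```python
from datetime import date


def rank_by_date(entity_dates: dict[str, date], today: date) -> list[str]:
    """Order candidates by how close their text's publication date is to today, nearest first."""
    return sorted(entity_dates, key=lambda e: abs((today - entity_dates[e]).days))


dates = {
    "motor": date(2018, 6, 3),
    "upper cover": date(2017, 5, 4),
    "lower cover": date(2017, 1, 4),
}
print(rank_by_date(dates, today=date(2018, 11, 13))[0])  # motor
```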
Optionally, on the basis of the above embodiments, the attributes of an association relationship further include a relationship dimension. The relationship dimension covers binary relations, or binary relations up to X-ary relations, where X is an integer greater than or equal to 3. A binary relation consists of two entities and the relationship between them; an X-ary relation consists of X entities and at least (X-1) binary relations, which are connected to one another through shared (associated) entities.
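The definition of an X-ary relation can be checked mechanically: treat each binary relation as an edge and confirm, by breadth-first search, that all entities are linked through shared entities. A connected structure over X entities necessarily uses at least X-1 binary relations. This is a minimal sketch with a hypothetical function name; it returns 2 for a single binary relation and 0 when the relations do not form one connected structure.

```python
from collections import deque


def relation_dimension(binaries: list[tuple[str, str, str]]) -> int:
    """Return X if the binary relations (head, relation, tail) form one connected
    structure over X distinct entities, or 0 if they fall apart into pieces."""
    graph: dict[str, set[str]] = {}
    for head, _, tail in binaries:
        graph.setdefault(head, set()).add(tail)
        graph.setdefault(tail, set()).add(head)
    if not graph:
        return 0
    start = next(iter(graph))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search over shared entities
        for neighbour in graph[queue.popleft()] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return len(graph) if len(seen) == len(graph) else 0


chain = [("upper end cover", "connect", "soymilk maker body"),
         ("soymilk maker body", "connect", "base")]
print(relation_dimension(chain))  # 3
```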
Optionally, on the basis of the above embodiments, there are multiple second candidate entities, and the target text information further includes a second target entity and a target relationship condition. Selecting in the data set the second candidate entity having an association relationship with the first candidate entity may further specifically include:
Retrieving, in the data set, multiple second candidate entities that match the second target entity;
Selecting, from the multiple second candidate entities, target second candidate entities that satisfy the target relationship condition;
Outputting an R-ary relationship group, where R is an integer greater than or equal to 2 and less than or equal to N. The R-ary relationship group includes multiple R-ary relations, each of which includes the first candidate entity, a target second candidate entity, and the relationship between the first candidate entity and the target second candidate entity.
For example, the first target entity is "engine", the second target entity is "connecting rod", and the target relationship condition is "connect". The first candidate entities may be "engine" and its synonyms. The multiple second candidate entities retrieved in the data set that match the second target entity may be "upper connecting rod", "lower connecting rod", "connecting rod assembly" and so on. The R-ary relationship group may be a binary relationship group and/or a ternary relationship group; taking a binary relationship group as the example in this embodiment, it includes: binary relation 1 (engine connects upper connecting rod), binary relation 2 (engine connects lower connecting rod), binary relation 3 (engine connects connecting rod assembly), and so on. In this embodiment, the R-ary relationship group can be retrieved automatically and output according to the first target entity, the second target entity and the relationship between them.
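The binary case of the R-ary relationship group can be sketched as a single filter over candidate relationships: the head must match the first target entity, the tail must match the second target entity, and the relationship must satisfy the condition. The triples and the containment-based `matches` helper are hypothetical stand-ins.

```python
# Hypothetical candidate relationships: (candidate entity, relationship, candidate entity).
TRIPLES = [
    ("engine", "connect", "upper connecting rod"),
    ("engine", "connect", "lower connecting rod"),
    ("engine", "connect", "connecting rod assembly"),
    ("engine", "include", "piston"),
]


def matches(target: str, candidate: str) -> bool:
    """Stand-in for entity matching: case-insensitive substring containment."""
    return target.lower() in candidate.lower()


def binary_relation_group(first_target: str, second_target: str,
                          condition: str) -> list[tuple[str, str, str]]:
    """Collect binary relations (first candidate, relationship, target second candidate)."""
    return [(head, rel, tail) for head, rel, tail in TRIPLES
            if matches(first_target, head)
            and matches(second_target, tail)  # second candidates matching the second target
            and rel == condition]             # keep only those meeting the condition


for relation in binary_relation_group("engine", "connecting rod", "connect"):
    print(relation)
```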
Optionally, an entity includes a component, and/or an attribute, and/or an attribute value.
The target entity includes a target component, a target attribute, and/or a target attribute value; the candidate entity includes a candidate component, a candidate attribute, and/or a candidate attribute value. Each candidate entity is associated with the candidate text it belongs to; for example, if the candidate texts are patent texts, each patent text has a patent number, through which a candidate entity can be associated with its candidate text. The method may further include:
Matching the target component with each candidate component, the target attribute with each candidate attribute, and/or the target attribute value with each candidate attribute value. For example, the target component is "motor", the target attribute is "voltage", and the target attribute value is "220V".
Determining the target candidate component, target candidate attribute, and/or target candidate attribute value that match the target component, the target attribute and the target attribute value, respectively;
Obtaining first candidate texts associated with the target candidate component, second candidate texts associated with the target candidate attribute, and/or third candidate texts associated with the target candidate attribute value. The numbers of first, second and third candidate texts are not limited; for example, there are 100 first candidate texts containing "motor", 80 second candidate texts containing "voltage", and 80 third candidate texts containing "220V". These sets may share texts; for example, a candidate text XX may contain "motor", "voltage" and "220V" at once, i.e., the first, second and third candidate texts may be the same or different. It should be noted that the numbers of first, second and third candidate texts are given only for convenience of explanation and do not limit the present application.
Outputting the first candidate texts, the second candidate texts, and/or the third candidate texts.
Specifically, the first, second and/or third candidate texts are output as a list, so that the user can view the candidate texts containing "motor", "voltage" and "220V" and read in detail how the target component, target attribute and/or target attribute value are described in the candidate texts to which they belong.
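The retrieval of first, second and third candidate texts can be sketched with inverted indexes from candidate components, attributes and attribute values to text identifiers (for example patent numbers). The index contents and the `P001`-style identifiers are placeholders, and exact dictionary lookup stands in for the matching step. The intersection illustrates the case of a text such as "candidate text XX" that contains all three.

```python
# Hypothetical indexes: candidate component / attribute / attribute value -> IDs of the
# candidate texts (e.g. patent numbers) it was extracted from. IDs are placeholders.
COMPONENT_INDEX = {"motor": {"P001", "P002", "P003"}}
ATTRIBUTE_INDEX = {"voltage": {"P002", "P003"}}
VALUE_INDEX = {"220V": {"P003", "P004"}}


def candidate_texts(component=None, attribute=None, value=None):
    """Return the first/second/third candidate text sets and the texts common to all of them."""
    groups = {}
    for label, query, index in (("first", component, COMPONENT_INDEX),
                                ("second", attribute, ATTRIBUTE_INDEX),
                                ("third", value, VALUE_INDEX)):
        if query is not None:
            # Exact lookup stands in for the matching step described above.
            groups[label] = set(index.get(query, set()))
    common = set.intersection(*groups.values()) if groups else set()
    return groups, common


groups, common = candidate_texts("motor", "voltage", "220V")
print(sorted(common))  # ['P003']
```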
Optionally, on the basis of the above embodiments, the data set includes candidate relationships, each comprising at least two candidate entities and the relationship between them. The target text information includes a target relationship, which comprises at least two target entities and the relationship between them; the two target entities include the first target entity and a second target entity;
The step of selecting, in the data set, the second candidate entity having an association relationship with the first candidate entity may further specifically include:
Retrieving, in the data set, target candidate entities that match the second target entity, each target candidate entity having an association relationship with the first candidate entity. For example, the target relationship consists of the first target entity "lid", the second target entity "upper cover", and the relationship ("include") between the first target entity ("lid") and the second target entity ("upper cover"). Target candidate entities matching the second target entity ("upper cover") are retrieved in the data set (such as "upper cover", "upper end cover" or "upper cover body"; their number is not limited), and each of them has an association relationship (such as an inclusion relation) with the first target entity (the lid).
Searching, according to the candidate relationships, for a first candidate relationship containing the target candidate entity, where the first candidate relationship further includes a third candidate entity and the relationship between the target candidate entity and the third candidate entity. The data set contains a large number of candidate relationships, each containing at least two candidate entities and the relationship between them; first candidate relationships containing the target candidate entity (such as "upper cover", "upper end cover" or "upper cover body") are searched for among them. For brevity, take "upper end cover" as the target candidate entity: a first candidate relationship then contains the target candidate entity, a relationship and a third candidate entity (such as a button or a display screen), e.g., (upper end cover is provided with button) or (upper end cover is provided with display screen). It should be noted that the relationship between the target candidate entity and the third candidate entity in a first candidate relationship is not limited; it may be "provided with", "connected to", "include" and so on.
Further, in a first implementation, the first candidate relationship is output as the second candidate entity. For example, (upper end cover is provided with button) is output: the server sends this first candidate relationship to the terminal, which displays it. In one application scenario, if a technician enters (lid includes upper cover), the server can automatically recommend components associated with this target relationship, i.e., a "button" or a "display screen" can be arranged on the "upper cover", which is of great reference value to the technician for technical improvement. In a second possible implementation, the third candidate entity may be output instead, i.e., the third candidate entity (button or display screen) is output directly.
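The first two implementations can be sketched together: find the target candidate entities, then collect the candidate relationships that contain them together with the third candidate entities at their other end. This sketch makes two assumptions of its own: the hypothetical `matches` helper treats a target as matching when all of its words appear in the candidate (so "upper cover" matches "upper end cover"), and the relationship leading back to the first target entity is skipped so it is not reported as a recommendation.

```python
# Hypothetical candidate relationships in the data set.
CANDIDATE_RELATIONS = [
    ("lid", "include", "upper end cover"),
    ("upper end cover", "provided with", "button"),
    ("upper end cover", "provided with", "display screen"),
    ("key", "arranged on", "operation panel"),
]


def matches(target: str, candidate: str) -> bool:
    """Stand-in matcher: every word of the target appears in the candidate."""
    return set(target.split()) <= set(candidate.split())


def first_candidate_relations(target_relation):
    """Return (first candidate relationships, third candidate entities) for a target relationship."""
    first, _, second = target_relation
    # Target candidate entities: match the second target and are linked to the first one.
    anchors = {t for h, _, t in CANDIDATE_RELATIONS if matches(first, h) and matches(second, t)}
    relations, thirds = [], []
    for h, r, t in CANDIDATE_RELATIONS:
        if h in anchors or t in anchors:
            other = t if h in anchors else h
            if not matches(first, other):  # skip the relationship back to the first target
                relations.append((h, r, t))
                thirds.append(other)
    return relations, thirds


relations, thirds = first_candidate_relations(("lid", "include", "upper cover"))
print(thirds)  # ['button', 'display screen']
```

Returning the relationships corresponds to the first implementation; returning only `thirds` corresponds to the second.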
In a third possible implementation, candidate entities similar to the third candidate entity may also be searched for. The similarity between two entities is determined from their semantic vectors as described in step 303 of Embodiment 1, which is not repeated here; candidate entities whose similarity to the third candidate entity exceeds a threshold are selected. For example, if the candidate entity similar to the third candidate entity is "key", this similar candidate entity "key" is output directly.
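The threshold test on semantic vectors can be sketched as below. The three-dimensional vectors are toy values, and cosine similarity is an assumption of this sketch (the text only says similarity is computed from semantic vectors as in step 303 of Embodiment 1).

```python
import math

# Toy semantic vectors; real ones would come from the vectorisation of Embodiment 1.
# Cosine similarity is an assumption made for this sketch.
VECTORS = {
    "button": [0.9, 0.1, 0.0],
    "key": [0.85, 0.2, 0.05],
    "display screen": [0.1, 0.9, 0.2],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similar_entities(entity: str, threshold: float) -> list[str]:
    """Candidate entities whose similarity to `entity` exceeds the threshold."""
    base = VECTORS[entity]
    return [name for name, vec in VECTORS.items()
            if name != entity and cosine(base, vec) > threshold]


print(similar_entities("button", threshold=0.8))  # ['key']
```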
Optionally, in a fourth possible implementation, the third candidate entity may be matched, according to the candidate relationships, with the candidate entities contained in each candidate relationship to determine a fourth candidate entity that matches the third candidate entity; for example, the third candidate entity is "button" and the fourth candidate entity matching it is "key".
A second candidate relationship containing the fourth candidate entity is taken as the second candidate entity. For example, the second candidate relationship containing the fourth candidate entity may be (key is arranged on operation panel), and this second candidate relationship is output. The terminal may then display: lid includes upper cover, upper cover is provided with button, key is arranged on operation panel. Optionally, the displayed content may be structured text or a structured image. In one application scenario, if a technician enters (lid includes upper cover), the server can automatically recommend components associated with the target relationship, i.e., a "button" or a "key" can be arranged on the "upper cover", and the "key" is arranged on an "operation panel". Such entity recommendations by the server are of great reference value to technicians for technical improvement.
Optionally, in a fifth possible implementation, the target text information includes a target relationship comprising at least two target entities and the relationship between them, the two target entities including the first target entity and the second target entity. Selecting, in the data set, the second candidate entity having an association relationship with the first candidate entity may further specifically include:
Retrieving, in the data set, target candidate entities that match the second target entity, each target candidate entity having an association relationship with the first candidate entity. As in the example above, for the target relationship (lid includes upper cover), the target candidate entities matching "upper cover" may be "upper cover", "upper end cover", "upper cover body" and so on (their number is not limited), each in an association relationship (such as an inclusion relation) with the lid.
Optionally, in the fifth possible implementation, a fifth candidate entity having an association relationship with the target candidate entity is searched for according to the candidate relationships. The fifth candidate entity is contained in a third candidate relationship, which includes the fifth candidate entity, a sixth candidate entity, and the relationship between the fifth and sixth candidate entities. For example, the fifth candidate entity (soymilk maker body) associated with the target candidate entity (upper end cover) is found according to the candidate relationships; the third candidate relationship containing it may be (upper end cover is connected to soymilk maker body) or (soymilk maker body includes lower end cover). The sixth candidate entity may be the same as or different from the target candidate entity.
Further, the third candidate relationship is output as the second candidate entity. In one application scenario, if a technician enters (lid includes upper cover), the server can automatically recommend candidate relationships associated with the target relationship. For example, the terminal may display: lid includes upper cover, upper end cover is connected to soymilk maker body, soymilk maker body includes lower end cover; or: lid includes upper cover, soymilk maker body is connected to base, and the upper cover is connected to the soymilk maker body. In this example, the server recommends relationships associated with the target relationship, which broadens the applicable scenarios; such relationship recommendations are of great reference value for technical improvement.
Optionally, in a sixth possible implementation, a fourth candidate relationship containing the third candidate relationship is determined according to the candidate relationships; for example, the fourth candidate relationship is (upper end cover is connected to soymilk maker body, soymilk maker body is connected to base). Further, the fourth candidate relationship is output as the second candidate entity. In one application scenario, if a technician enters (lid includes upper cover), the server can automatically recommend candidate relationships associated with the target relationship; for example, the terminal may display: lid includes upper cover, upper end cover is connected to soymilk maker body, soymilk maker body includes lower end cover. As in the previous example, recommending relationships in this way broadens the applicable scenarios and is of great reference value for technical improvement.
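The fifth and sixth implementations can be sketched as a two-hop walk over candidate relationships: from the target candidate entity to a fifth candidate entity, then on to relationships that contain the fifth entity (third candidate relationships). Each pair of hops corresponds to a fourth candidate relationship. This is a minimal sketch with hypothetical data and names; the optional `exclude` set, which keeps the walk from going back through the first target entity, is an assumption of the sketch.

```python
# Hypothetical candidate relationships.
RELATIONS = [
    ("lid", "include", "upper end cover"),
    ("upper end cover", "connected to", "soymilk maker body"),
    ("soymilk maker body", "include", "lower end cover"),
    ("soymilk maker body", "connected to", "base"),
]


def touching(entity):
    """All candidate relationships in which `entity` takes part."""
    return [rel for rel in RELATIONS if entity in (rel[0], rel[2])]


def chained_relations(anchor, exclude=frozenset()):
    """Two-hop chains: target candidate -> fifth candidate -> sixth candidate."""
    chains = []
    for first_hop in touching(anchor):
        h, _, t = first_hop
        fifth = t if h == anchor else h
        if fifth in exclude:
            continue
        for second_hop in touching(fifth):  # third candidate relationships
            if anchor not in (second_hop[0], second_hop[2]):
                chains.append((first_hop, second_hop))
    return chains


for chain in chained_relations("upper end cover", exclude={"lid"}):
    print(chain)
```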
It should be noted that the candidate relationships, target relationships, and candidate entities in this embodiment are illustrative only and do not limit the present application.
Optionally, on the basis of the above embodiments, the data set further includes an image data set. The image data set contains multiple candidate images, each of which is associated with a candidate entity. After the second candidate entity associated with the first candidate entity is selected from the data set, the method further includes:
Searching the image data set according to the second candidate entity, determining the candidate image associated with the second candidate entity, and outputting that candidate image as the second candidate entity.
For example, in one application scenario the second candidate entities are "upper connecting rod" and "lower connecting rod". The image data set is searched according to these second candidate entities, and the candidate images associated with "upper connecting rod" and "lower connecting rod" are determined. The images of "upper connecting rod" and "lower connecting rod" are then output as the second candidate entities.
In this embodiment, the candidate image of the second candidate entity can be obtained and output directly. This makes the second candidate entity more vivid, because image information makes it easier for the user to understand the second candidate entity.
Optionally, on the basis of the above embodiments, the following describes how the image data set is established:
In a first implementation, the image data set includes a first image data set. Before the image data set is searched according to the second candidate entity to determine the associated candidate image, the method further includes:
Obtaining a candidate text set that contains multiple candidate texts, each of which contains candidate entities;
Counting the frequency with which each candidate entity occurs in the candidate text set;
Determining high-frequency entities according to the frequency. A high-frequency entity is either an entity whose frequency of occurrence is above a threshold, or an entity that ranks within a preset number of top positions after sorting by frequency;
Associating each high-frequency entity with at least one corresponding candidate image to obtain the first image data set.
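A minimal sketch of the first implementation follows. It assumes each candidate text has already been reduced to the list of candidate-entity mentions it contains. It also assumes a hypothetical `images_by_entity` mapping supplies the candidate images. Both the threshold rule and the top-k rule from the text are shown; the concrete values are illustrative, not taken from the source.

```python
from collections import Counter

def high_frequency_entities(candidate_texts, threshold=None, top_k=None):
    """Count entity mentions across all candidate texts and keep either the
    entities whose count exceeds `threshold` or the `top_k` most frequent ones."""
    counts = Counter()
    for mentions in candidate_texts:
        counts.update(mentions)  # each mention adds 1 to that entity's frequency
    if threshold is not None:
        return {entity for entity, n in counts.items() if n > threshold}
    if top_k is not None:
        return {entity for entity, _ in counts.most_common(top_k)}
    raise ValueError("specify either threshold or top_k")

def build_first_image_dataset(candidate_texts, images_by_entity, **selection):
    """Associate each high-frequency entity that has at least one candidate image
    with its images; entities without images are left out."""
    frequent = high_frequency_entities(candidate_texts, **selection)
    return {e: list(images_by_entity[e]) for e in frequent if images_by_entity.get(e)}
```

`Counter.most_common` keeps first-seen order among tied counts, so a top-k cut at a tie depends on input order. A deterministic tie-break may be worth adding in practice.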
In a second implementation, the image data set includes a second image data set. Before the image data set is searched according to the second candidate entity to determine the associated candidate image, the method further includes:
Obtaining a candidate text set in which every candidate text includes a description of the drawings and the drawings themselves. The description of the drawings contains candidate entities and their reference labels, and the drawings contain candidate images and the same labels;
Establishing the association between candidate entities and candidate images according to the labels to obtain the second image data set.
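The second implementation relies on the reference numerals that patent drawings share with their description. The sketch below makes two assumptions:

- The description of the drawings uses a hypothetical format such as "1-upper cover; 2-lower end cover".
- For each drawing, the set of numerals shown in it is already known (for example from OCR, which is outside the sketch).

The field names `drawing_description` and `figures` are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical description-of-drawings format: "<numeral>-<entity>" items separated by ";" or ",".
LABEL_PATTERN = re.compile(r"(\d+)\s*[-:]\s*([^;,]+)")

def parse_drawing_labels(description):
    """Map each reference numeral to the candidate entity it labels."""
    return {int(num): name.strip() for num, name in LABEL_PATTERN.findall(description)}

def build_second_image_dataset(documents):
    """Link each candidate entity to the drawings that show its reference numeral."""
    dataset = defaultdict(list)
    for doc in documents:
        labels = parse_drawing_labels(doc["drawing_description"])
        for image_id, numerals in doc["figures"]:
            for numeral in sorted(numerals):
                entity = labels.get(numeral)
                if entity is not None and image_id not in dataset[entity]:
                    dataset[entity].append(image_id)
    return dict(dataset)
```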
In a third implementation, the image data set includes a third image data set. Before the image data set is searched according to the second candidate entity to determine the associated candidate image, the method further includes:
Obtaining a candidate text set in which every candidate text includes a title and an abstract figure;
Extracting the abstract figure from each candidate text;
Identifying the candidate entities in the title;
Establishing the association between the candidate entities and the abstract figure to obtain the third image data set.
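The source does not say how candidate entities are identified in the title. The sketch below therefore swaps in plain dictionary matching against a known entity vocabulary; this is an assumption, and an entity extraction model could take its place. Every matched entity is linked to the document's abstract figure. The field names `title` and `abstract_figure` are hypothetical.

```python
from collections import defaultdict

def build_third_image_dataset(documents, entity_vocabulary):
    """Associate the entities found in each title with that document's abstract figure."""
    dataset = defaultdict(list)
    for doc in documents:
        title = doc["title"].lower()
        figure = doc["abstract_figure"]
        # Naive substring matching: a stand-in for real entity recognition.
        for entity in entity_vocabulary:
            if entity.lower() in title and figure not in dataset[entity]:
                dataset[entity].append(figure)
    return dict(dataset)
```

Substring matching would also match "base" inside a word like "database". A tokenizer or a trained model avoids such false hits.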
In this embodiment, the image data set includes the first image data set, the second image data set, and/or the third image data set. For the specific methods of establishing these three data sets, refer to the method of establishing image data in Embodiment 4 above.
Optionally, the following describes how the image data set is searched:
The image data set includes the first image data set, which contains the candidate images of high-frequency entities. High-frequency entities are candidate entities whose frequency of use is above the threshold;
The first image data set is searched according to the second candidate entity;
If no candidate image associated with the second candidate entity is found in the first image data set, the image data sets other than the first are searched according to the second candidate entity (for example, the second and/or third image data set).
First, the target entity is matched against the candidate entities associated with the candidate images in the first image data set. The candidate entities in the first image data set are those with a high frequency of occurrence. Matching the target entity against these high-frequency entities first therefore improves matching efficiency.
If the target entity matches no candidate entity in the first image data set, it is matched against the candidate entities associated with the candidate images in the other image data sets, that is, the second and/or third image data set. If the target entity does match a candidate entity in the first image data set, the candidate image associated with that candidate entity is sent directly to the terminal, and the terminal displays the candidate entity. In this embodiment of the application, matching the target entity against the first image data set first improves matching efficiency.
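The lookup order described above amounts to a tiered search. The minimal sketch below assumes each image data set is a dictionary from entity name to a list of images. It also assumes the data sets are passed in priority order: first, then second, then third.

```python
def lookup_images(entity, datasets):
    """Return the images for `entity` from the first data set that has any, else []."""
    for dataset in datasets:  # e.g. [first_set, second_set, third_set]
        images = dataset.get(entity)
        if images:
            return images  # stop at the first hit; later data sets are not consulted
    return []
```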
Embodiment 6: Referring to FIG. 13, an embodiment of the present application provides an apparatus for determining the novelty degree of a text. The apparatus performs the method steps that the electronic device actually performs in Embodiment 3 above. The apparatus 1300 includes:
Text determining module 1301, configured to determine a target text;
Entity extraction module 1302, configured to extract multiple target entities from the target text determined by the text determining module 1301, to obtain a target entity set;
Entity acquisition module 1303, configured to obtain the candidate entity set of each candidate text in a candidate text set;
Entity intersection determining module 1304, configured to determine a first entity intersection of the target entity set extracted by the entity extraction module 1302 and the candidate entity set obtained by the entity acquisition module 1303. The first entity intersection consists of the entities that match between the target entity set and the candidate entity set;
Novelty degree determining module 1305, configured to determine the novelty degree of the target text relative to the candidate text. It does so according to a difference parameter between the first entity intersection determined by the entity intersection determining module 1304 and the target entity set extracted by the entity extraction module 1302.
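The source does not define the "difference parameter". One simple reading, used here purely as an assumption, is the share of target entities that find no match in a candidate text:

novelty = 1 - |I| / |T|

Here T is the target entity set and I is the first entity intersection. The sketch also assumes exact string matching. As a further assumption, it reports the lowest score across the candidate text set, that is, novelty against the closest candidate.

```python
def entity_novelty(target_entities, candidate_entities):
    """1 - |T & C| / |T|: fraction of target entities not found in the candidate."""
    target = set(target_entities)
    if not target:
        return 0.0  # nothing to compare; treated as not novel (an assumption)
    intersection = target & set(candidate_entities)
    return 1.0 - len(intersection) / len(target)

def novelty_against_collection(target_entities, candidate_sets):
    """Score the target against every candidate text and also return the lowest score."""
    scores = {doc_id: entity_novelty(target_entities, ents) for doc_id, ents in candidate_sets.items()}
    return min(scores.values(), default=1.0), scores
```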
Referring to FIG. 14, on the basis of the embodiment corresponding to FIG. 13, an embodiment of the present application provides another embodiment of an apparatus 1400 for determining the novelty degree of a text. The apparatus further includes a relationship extraction module 1306, a relationship acquisition module 1307, and a relationship intersection determining module 1308;
Relationship extraction module 1306, configured to extract multiple binary relationships from the target text to obtain a target binary relationship set. A binary relationship comprises two entities and the relationship between them;
Relationship acquisition module 1307, configured to obtain a candidate binary relationship set that contains multiple binary relationships in the candidate text;
Relationship intersection determining module 1308, configured to determine a first binary relationship intersection. This is the intersection of the target binary relationship set extracted by the relationship extraction module 1306 and the candidate binary relationship set obtained by the relationship acquisition module 1307. It contains the binary relationships that match between the two sets;
The novelty degree determining module 1305 is further specifically configured to:
Determine a first entity novelty degree according to the difference parameter between the first entity intersection and the target entity set;
Determine a first binary relationship novelty degree according to the difference parameter between the first binary relationship intersection and the target binary relationship set;
Determine the novelty degree of the target text relative to the candidate text according to the first entity novelty degree and the first binary relationship novelty degree.
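The same idea can be extended to binary relationships. This sketch represents each relationship as a (head, relation, tail) triple. It computes the first binary relationship novelty with the same unmatched-share formula assumed for entities. It then combines that score with an already computed first entity novelty as a weighted sum. The formula and the default weights of 0.5 are assumptions; the source only says the two scores are used together.

```python
def binary_relation_novelty(target_relations, candidate_relations):
    """Share of target (head, relation, tail) triples with no exact match in the candidate."""
    target = set(target_relations)
    if not target:
        return 0.0
    matched = target & set(candidate_relations)  # exact, direction-sensitive triple match
    return 1.0 - len(matched) / len(target)

def text_novelty(entity_novelty, target_relations, candidate_relations, w_entity=0.5, w_relation=0.5):
    """Weighted sum of the first entity novelty and the first binary relationship novelty."""
    relation_novelty = binary_relation_novelty(target_relations, candidate_relations)
    return w_entity * entity_novelty + w_relation * relation_novelty
```

Triples are compared exactly. The same two entities linked by a different relation word therefore count as a new relationship. This is how a text can score as partly novel even when all of its entities already appear in the candidate.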
Optionally, the relationship extraction module 1306 is further configured to extract a target ternary relationship set from the target text. The target ternary relationship set contains multiple ternary relationships. Each ternary relationship comprises two binary relationships that share an entity;
The relationship acquisition module 1307 is further configured to obtain a candidate ternary relationship set that contains multiple ternary relationships in the candidate text;
The relationship intersection determining module 1308 is further configured to determine a first ternary relationship intersection. This is the intersection of the target ternary relationship set extracted by the relationship extraction module 1306 and the candidate ternary relationship set obtained by the relationship acquisition module 1307. It contains the ternary relationships that match between the two sets;
The novelty degree determining module 1305 is further specifically configured to:
Determine a first ternary relationship novelty degree according to the difference parameter between the first ternary relationship intersection and the target ternary relationship set;
Determine the novelty degree of the target text relative to the candidate text according to the first entity novelty degree, the first binary relationship novelty degree, and the first ternary relationship novelty degree.
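A ternary relationship, as defined here, is two binary relationships that share an entity. The sketch below enumerates such pairs from a set of triples and compares target and candidate pairs. It again uses the unmatched-share formula, which is an assumed difference parameter. Each ternary relationship is stored as an unordered `frozenset` of its two triples, so pair order does not matter.

```python
from itertools import combinations

def ternary_relations(binary_relations):
    """All unordered pairs of (head, relation, tail) triples that share at least one entity."""
    rels = sorted(set(binary_relations))
    result = set()
    for r1, r2 in combinations(rels, 2):
        if {r1[0], r1[2]} & {r2[0], r2[2]}:
            result.add(frozenset((r1, r2)))
    return result

def ternary_novelty(target_binary, candidate_binary):
    """Share of target ternary relationships with no match among the candidate's."""
    target = ternary_relations(target_binary)
    if not target:
        return 0.0
    matched = target & ternary_relations(candidate_binary)
    return 1.0 - len(matched) / len(target)
```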
Optionally, the entity extraction module 1302 is further configured to input the target text into an entity extraction model, which identifies the multiple target entities in the target text.
Optionally, the relationship extraction module 1306 is further configured to input the target text, with its target entities already recognized, into a relationship extraction model. The relationship extraction model extracts the binary relationships between the target entities.
Optionally, the apparatus further includes a generation module 1309;
Generation module 1309, configured to generate a target structure by representing the target text in structured form. It uses the target entities extracted by the entity extraction module 1302 and the relationships between them extracted by the relationship extraction module 1306.
Optionally, the target structure includes nodes and edges. The nodes represent the target entities, and the edges represent the relationships between the target entities.
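One way to hold the target structure in memory is shown below. It assumes relationships arrive as (head, relation, tail) triples from the relationship extraction step. Nodes are entity names. Each directed edge (head, tail) keeps the set of relation labels between those two entities, so parallel relations are not lost. This layout is an illustrative choice, not one prescribed by the source.

```python
from collections import defaultdict

def build_structure(triples):
    """Return (nodes, edges) where edges maps (head, tail) -> set of relation labels."""
    nodes = set()
    edges = defaultdict(set)
    for head, relation, tail in triples:
        nodes.update((head, tail))  # both endpoints become nodes
        edges[(head, tail)].add(relation)  # keep every relation between the pair
    return nodes, dict(edges)
```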
Optionally, each candidate text in the candidate text set is a structured candidate structure, and the target text is the target structure. The entity extraction module 1302 is further configured to extract the candidate entity set of a candidate graph, where the candidate graph includes at least one candidate structure;
The entity intersection determining module 1304 is further configured to determine a second entity intersection. This is the intersection of the target entity set extracted by the entity extraction module 1302 and the candidate entity set of the candidate graph obtained by the entity acquisition module 1303;
The novelty degree determining module 1305 is further configured to determine the novelty degree of the target text relative to the candidate graph according to the difference parameter between the second entity intersection and the target entity set.
Optionally, the candidate graph may include at least two candidate structures, namely a first candidate structure and a second candidate structure. In that case, the apparatus further includes an associated entity determining module 1310 and an association module 1311;
Associated entity determining module 1310, configured to determine the associated entity of the first candidate structure and the second candidate structure;
Association module 1311, configured to obtain the candidate graph by associating the first candidate structure with the second candidate structure through the associated entity determined by the associated entity determining module 1310.
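The association step could look like the sketch below. It assumes an associated entity is simply an entity name present in both candidate structures; real systems may need alias resolution, for example "upper cover" versus "upper end cover". It also assumes both structures use a (nodes, edges) layout in which edges map (head, tail) to a set of relation labels. If no entity is shared, the union is still returned, but the resulting graph stays disconnected.

```python
def merge_structures(first, second):
    """Join two candidate structures through their shared (associated) entities.
    Returns ((nodes, edges), associated_entities)."""
    nodes_a, edges_a = first
    nodes_b, edges_b = second
    associated = set(nodes_a) & set(nodes_b)  # entity names present in both structures
    nodes = set(nodes_a) | set(nodes_b)
    edges = {}
    for source in (edges_a, edges_b):
        for pair, labels in source.items():
            edges.setdefault(pair, set()).update(labels)  # copy labels, merge duplicates
    return (nodes, edges), associated
```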
Optionally, the relationship extraction module 1306 is further configured to extract multiple binary relationships from the target structure to obtain a target binary relationship set;
The novelty degree determining module 1305 is further specifically configured to:
For each target binary relationship in the target binary relationship set, locate its two target entities at the corresponding two entity positions in the candidate graph;
Calculate the distance between the two entity positions corresponding to each target binary relationship;
Determine, according to the distance, a second binary relationship novelty degree of each target binary relationship relative to the candidate graph;
Determine a second entity novelty degree according to the difference parameter between the second entity intersection and the target entity set;
Determine the novelty degree of the target structure relative to the candidate graph according to the second entity novelty degree and the second binary relationship novelty degree.
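The source does not state how distance is turned into a novelty score. The sketch below assumes the following:

- Distance is the shortest-path hop count in the candidate graph, treated as undirected and found by breadth-first search.
- The per-relationship score is 0 for adjacent entities and 1 - 1/d for entities d hops apart.
- The score is 1 when either entity is absent or the two are unreachable.

The per-relationship scores are averaged to give the second binary relationship novelty. All function names are hypothetical.

```python
from collections import defaultdict, deque

def undirected_adjacency(edge_pairs):
    """Build an undirected adjacency map from the (head, tail) pairs of the candidate graph."""
    adjacency = defaultdict(set)
    for head, tail in edge_pairs:
        adjacency[head].add(tail)
        adjacency[tail].add(head)
    return dict(adjacency)

def shortest_distance(adjacency, source, target):
    """Breadth-first search hop count, or None if either entity is missing or unreachable."""
    if source not in adjacency or target not in adjacency:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def distance_novelty(distance):
    """Assumed mapping: adjacent -> 0, d hops -> 1 - 1/d, absent or unreachable -> 1."""
    if distance is None:
        return 1.0
    if distance <= 1:
        return 0.0
    return 1.0 - 1.0 / distance

def second_binary_novelty(target_relations, adjacency):
    """Average distance-based novelty over all target (head, relation, tail) triples."""
    if not target_relations:
        return 0.0
    scores = [distance_novelty(shortest_distance(adjacency, h, t)) for h, _, t in target_relations]
    return sum(scores) / len(scores)
```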
Optionally, the relationship acquisition module 1307 is further configured to obtain a candidate binary relationship set that contains multiple binary relationships in the candidate graph;
The relationship intersection determining module 1308 is further configured to determine a second binary relationship intersection of the target binary relationship set and the candidate binary relationship set;
The novelty degree determining module 1305 is further specifically configured to:
Determine a first binary relationship novelty degree according to the difference parameter between the second binary relationship intersection and the target binary relationship set;
Determine a binary relationship novelty degree according to the first binary relationship novelty degree, the second binary relationship novelty degree, and their corresponding weights;
Determine the novelty degree of the target structure relative to the candidate graph according to the second entity novelty degree and the binary relationship novelty degree.
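The weighting in this step is a worked combination rather than a new algorithm. The sketch below assumes a normalized weighted average, and its weights are illustrative because the source gives no values. For example, take a first binary novelty of 0.5 with weight 0.4 and a second binary novelty of 0.25 with weight 0.6. The binary relationship novelty is then (0.5 * 0.4 + 0.25 * 0.6) / 1.0 = 0.35. That result is averaged again with the second entity novelty to give the structure-level score.

```python
def weighted_average(scores_and_weights):
    """Normalized weighted average of (score, weight) pairs."""
    total = sum(weight for _, weight in scores_and_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(score * weight for score, weight in scores_and_weights) / total

def structure_novelty(second_entity, first_binary, second_binary,
                      binary_weights=(0.4, 0.6), top_weights=(0.5, 0.5)):
    """Combine the two binary novelties first, then combine the result with the entity novelty."""
    binary = weighted_average([(first_binary, binary_weights[0]), (second_binary, binary_weights[1])])
    return weighted_average([(second_entity, top_weights[0]), (binary, top_weights[1])])
```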
Optionally, the relationship extraction module 1306 is further configured to extract multiple ternary relationships from the target structure to obtain a target ternary relationship set;
The novelty degree determining module 1305 is further specifically configured to:
For each target ternary relationship in the target ternary relationship set, locate its target entities at the corresponding three entity positions in the candidate graph;
Calculate the distance between any two of the three entity positions;
Determine, according to the distances, a second ternary relationship novelty degree of each target ternary relationship relative to the candidate graph;
Determine the novelty degree of the target structure relative to the candidate graph according to the second entity novelty degree, the second binary relationship novelty degree, and the second ternary relationship novelty degree.
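For ternary relationships, every pair among the (normally three) entities gets a distance, and the pair scores are averaged. The sketch reuses the assumed mapping from distance to score: 0 for adjacent entities, 1 - 1/d for d hops, and 1 for absent or unreachable entities. It runs breadth-first search from each entity over an undirected adjacency map. A ternary relationship is any two-element iterable of (head, relation, tail) triples, such as a tuple or `frozenset`. The averaging and all names are assumptions, not the source's formula.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adjacency, source):
    """Hop counts from `source` to every reachable entity in an undirected adjacency map."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def pair_score(distance):
    """Assumed mapping: adjacent -> 0, d hops -> 1 - 1/d, absent or unreachable -> 1."""
    if distance is None:
        return 1.0
    return 0.0 if distance <= 1 else 1.0 - 1.0 / distance

def ternary_graph_novelty(ternary, adjacency):
    """Average pair score over every pair of entities in one ternary relationship."""
    (h1, _, t1), (h2, _, t2) = ternary
    entities = sorted({h1, t1, h2, t2})
    scores = [pair_score(bfs_distances(adjacency, a).get(b)) for a, b in combinations(entities, 2)]
    return sum(scores) / len(scores) if scores else 0.0

def second_ternary_novelty(ternaries, adjacency):
    """Average over all target ternary relationships."""
    if not ternaries:
        return 0.0
    return sum(ternary_graph_novelty(t, adjacency) for t in ternaries) / len(ternaries)
```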
Optionally, the relationship acquisition module 1307 is further configured to obtain a candidate ternary relationship set that contains multiple ternary relationships in the candidate graph;
The relationship intersection determining module 1308 is further configured to determine a second ternary relationship intersection of the target ternary relationship set and the candidate ternary relationship set;
The novelty degree determining module 1305 is further specifically configured to:
Determine a first ternary relationship novelty degree according to the difference parameter between the second ternary relationship intersection and the target ternary relationship set;
Determine a ternary relationship novelty degree according to the first ternary relationship novelty degree, the second ternary relationship novelty degree, and their corresponding weights;
Determine the novelty degree of the target structure relative to the candidate graph according to the second entity novelty degree, the binary relationship novelty degree, and the ternary relationship novelty degree.
Referring to FIG. 15, an embodiment of the present application further provides an electronic device 70. The electronic device 70 includes a memory 710, a transceiver 720, and a processor 730. Those skilled in the art will appreciate that the electronic device may also include other components, such as the various components commonly found in a computer. The memory 710, the transceiver 720, and the processor 730 communicate with one another. The memory 710 stores computer instructions, and the transceiver 720 communicates with other devices. When the processor 730 executes the computer instructions, the electronic device 70 performs the methods described in the foregoing method embodiments.
An embodiment of the present application further provides a computer storage medium for storing computer software instructions. These include the instructions for performing the method that the electronic device performs in the method embodiments.
Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like. The storage medium may also include a combination of the above types of memory.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the present invention. All such modifications and variations fall within the scope defined by the appended claims.

Claims (16)

1. A method for determining the novelty degree of a text, characterized by comprising:
Determining a target text;
Extracting multiple target entities from the target text to obtain a target entity set;
Obtaining a candidate entity set of each candidate text in a candidate text set;
Determining a first entity intersection of the target entity set and the candidate entity set, the first entity intersection being the entities that match between the target entity set and the candidate entity set;
Determining the novelty degree of the target text relative to the candidate text according to a difference parameter between the first entity intersection and the target entity set.
2. The method according to claim 1, characterized in that the method further comprises:
Extracting multiple binary relationships from the target text to obtain a target binary relationship set, a binary relationship comprising two entities and the relationship between them;
Obtaining a candidate binary relationship set comprising multiple binary relationships in the candidate text;
Determining a first binary relationship intersection of the target binary relationship set and the candidate binary relationship set, the first binary relationship intersection comprising the binary relationships that match between the target binary relationship set and the candidate binary relationship set;
The determining the novelty degree of the target text relative to the candidate text according to the difference parameter between the first entity intersection and the target entity set comprises:
Determining a first entity novelty degree according to the difference parameter between the first entity intersection and the target entity set;
Determining a first binary relationship novelty degree according to the difference parameter between the first binary relationship intersection and the target binary relationship set;
Determining the novelty degree of the target text relative to the candidate text according to the first entity novelty degree and the first binary relationship novelty degree.
3. The method according to claim 2, characterized in that the method further comprises:
Extracting a target ternary relationship set from the target text, the target ternary relationship set comprising multiple ternary relationships, each ternary relationship comprising two binary relationships that share an entity;
Obtaining a candidate ternary relationship set comprising multiple ternary relationships in the candidate text;
Determining a first ternary relationship intersection of the target ternary relationship set and the candidate ternary relationship set, the first ternary relationship intersection comprising the ternary relationships that match between the target ternary relationship set and the candidate ternary relationship set;
The determining the novelty degree of the target text relative to the candidate text according to the first entity novelty degree and the first binary relationship novelty degree comprises:
Determining a first ternary relationship novelty degree according to the difference parameter between the first ternary relationship intersection and the target ternary relationship set;
Determining the novelty degree of the target text relative to the candidate text according to the first entity novelty degree, the first binary relationship novelty degree, and the first ternary relationship novelty degree.
4. The method according to claim 1, characterized in that the extracting multiple target entities from the target text comprises:
Inputting the target text into an entity extraction model, and identifying the multiple target entities in the target text through the entity extraction model.
5. The method according to claim 2, characterized in that the extracting multiple binary relationships from the target text comprises:
Inputting the target text in which the target entities have been recognized into a relationship extraction model, and extracting the binary relationships between the target entities through the relationship extraction model.
6. The method according to claim 5, characterized by comprising:
Performing structured representation of the target text according to the relationships between the target entities, to generate a target structure.
7. The method according to claim 6, characterized in that the target structure comprises nodes and edges, the nodes representing the target entities and the edges representing the relationships between the target entities.
8. The method according to claim 1, characterized in that each candidate text in the candidate text set is a structured candidate structure and the target text is a target structure, and the method further comprises:
Extracting a candidate entity set of a candidate graph, the candidate graph comprising at least one candidate structure;
The determining the entity intersection of the target entity set and the candidate entity set comprises:
Determining a second entity intersection of the target entity set and the candidate entity set of the candidate graph;
The method further comprises:
Determining the novelty degree of the target text relative to the candidate graph according to the difference parameter between the second entity intersection and the target entity set.
9. The method according to claim 8, characterized in that, when the candidate graph comprises at least two candidate structures, the at least two candidate structures being a first candidate structure and a second candidate structure, the method comprises:
Determining an associated entity of the first candidate structure and the second candidate structure;
Associating the first candidate structure with the second candidate structure through the associated entity to obtain the candidate graph.
10. The method according to claim 8, characterized in that the method further comprises:
Extracting multiple binary relationships from the target structure to obtain a target binary relationship set;
Locating the two target entities contained in each target binary relationship in the target binary relationship set at the corresponding two entity positions in the candidate graph;
Calculating the distance between the two entity positions corresponding to each target binary relationship;
Determining, according to the distance, a second binary relationship novelty degree of each target binary relationship relative to the candidate graph;
The determining the novelty degree of the target text relative to the candidate graph according to the difference parameter between the second entity intersection and the target entity set comprises:
Determining a second entity novelty degree according to the difference parameter between the second entity intersection and the target entity set;
Determining the novelty degree of the target structure relative to the candidate graph according to the second entity novelty degree and the second binary relationship novelty degree.
11. The method according to claim 10, characterized in that the method further comprises:
Obtaining a candidate binary relationship set comprising multiple binary relationships in the candidate graph;
Determining a second binary relationship intersection of the target binary relationship set and the candidate binary relationship set;
Determining a first binary relationship novelty degree according to the difference parameter between the second binary relationship intersection and the target binary relationship set;
Determining a binary relationship novelty degree according to the first binary relationship novelty degree, the second binary relationship novelty degree, and their corresponding weights;
The determining the novelty degree of the target structure relative to the candidate graph according to the second entity novelty degree and the second binary relationship novelty degree comprises:
Determining the novelty degree of the target structure relative to the candidate graph according to the second entity novelty degree and the binary relationship novelty degree.
12. The method according to claim 11, characterized in that the method further comprises:
Extracting multiple ternary relationships from the target structure to obtain a target ternary relationship set;
Locating the target entities contained in each target ternary relationship in the target ternary relationship set at the corresponding three entity positions in the candidate graph;
Calculating the distance between any two of the three entity positions;
Determining, according to the distances, a second ternary relationship novelty degree of each target ternary relationship relative to the candidate graph;
The determining the novelty degree of the target structure relative to the candidate graph according to the second entity novelty degree and the second binary relationship novelty degree comprises:
Determining the novelty degree of the target structure relative to the candidate graph according to the second entity novelty degree, the second binary relationship novelty degree, and the second ternary relationship novelty degree.
13. The method according to claim 12, characterized in that the method further comprises:
Obtaining a candidate ternary relationship set comprising multiple ternary relationships in the candidate graph;
Determining a second ternary relationship intersection of the target ternary relationship set and the candidate ternary relationship set;
Determining a first ternary relationship novelty degree according to the difference parameter between the second ternary relationship intersection and the target ternary relationship set;
Determining a ternary relationship novelty degree according to the first ternary relationship novelty degree, the second ternary relationship novelty degree, and their corresponding weights;
The determining the novelty degree of the target structure relative to the candidate graph according to the second entity novelty degree and the binary relationship novelty degree comprises:
Determining the novelty degree of the target structure relative to the candidate graph according to the second entity novelty degree, the binary relationship novelty degree, and the ternary relationship novelty degree.
14. An apparatus for determining the novelty degree of a text, characterized by comprising:
A first determining module, configured to determine a target text;
An extraction module, configured to extract multiple target entities from the target text determined by the first determining module, to obtain a target entity set;
An obtaining module, configured to obtain a candidate entity set of each candidate text in a candidate text set;
A second determining module, configured to determine a first entity intersection of the target entity set identified by the extraction module and the candidate entity set obtained by the obtaining module, the first entity intersection being the entities that match between the target entity set and the candidate entity set;
A novelty determining module, configured to determine the novelty degree of the target text relative to the candidate text according to a difference parameter between the first entity intersection determined by the second determining module and the target entity set extracted by the extraction module.
15. An electronic device, characterized by comprising:
A memory and a processor;
The memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the computer instructions to perform the method according to any one of claims 1-13.
16. A computer storage medium, characterized in that the computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to perform the method according to any one of claims 1-13.
CN201811348626.6A 2018-11-13 2018-11-13 Method and related device for determining text novelty Active CN109582933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811348626.6A CN109582933B (en) 2018-11-13 2018-11-13 Method and related device for determining text novelty

Publications (2)

Publication Number Publication Date
CN109582933A true CN109582933A (en) 2019-04-05
CN109582933B CN109582933B (en) 2021-09-03

Family

ID=65922365

Country Status (1)

Country Link
CN (1) CN109582933B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110202545A1 (en) * 2008-01-07 2011-08-18 Takao Kawai Information extraction device and information extraction system
US20130232160A1 (en) * 2012-03-02 2013-09-05 Semmle Limited Finding duplicate passages of text in a collection of text
CN104636325A (en) * 2015-02-06 2015-05-20 Central South University Document similarity determining method based on maximum likelihood estimation
CN105653706A (en) * 2015-12-31 2016-06-08 Beijing Institute of Technology Multi-layer citation recommendation method based on mapping knowledge domains of literature content
CN106815293A (en) * 2016-12-08 2017-06-09 The 32nd Research Institute of China Electronics Technology Group Corporation System and method for constructing a knowledge graph for information analysis
JP2017123168A (en) * 2016-01-05 2017-07-13 Fujitsu Limited Method and device for associating entity mentions in short text with entities in a semantic knowledge base
CN107015961A (en) * 2016-01-27 2017-08-04 ChineseAll Digital Publishing Group Co., Ltd. Text similarity comparison method
CN107665252A (en) * 2017-09-27 2018-02-06 Shenzhen Securities Information Co., Ltd. Method and device for creating a knowledge graph
WO2018153295A1 (en) * 2017-02-27 2018-08-30 Tencent Technology (Shenzhen) Co., Ltd. Text entity extraction method, device, apparatus, and storage media
CN108763566A (en) * 2018-06-05 2018-11-06 Beijing Xuan Technology Co., Ltd. Text similarity computing method and device, and intelligent robot
CN108763569A (en) * 2018-06-05 2018-11-06 Beijing Xuan Technology Co., Ltd. Text similarity computing method and device, and intelligent robot

Non-Patent Citations (2)

Title
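Aliya Nugumanova et al.: "A new text representation model enriched with semantic relations", 2015 15th International Conference on Control, Automation and Systems (ICCAS) *
Zhao Yiping et al.: "Research on the application of linked data to similar-literature discovery on academic resource networks", New Technology of Library and Information Service (现代图书情报技术) *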

Cited By (11)

Publication number Priority date Publication date Assignee Title
CN111144709A (en) * 2019-12-06 2020-05-12 Beijing University of Posts and Telecommunications Method and device for determining novelty of machine-generated text
CN111144709B (en) * 2019-12-06 2023-04-18 Beijing University of Posts and Telecommunications Method and device for determining novelty of machine-generated text
CN111708873A (en) * 2020-06-15 2020-09-25 Tencent Technology (Shenzhen) Co., Ltd. Intelligent question answering method and device, computer equipment and storage medium
CN111708873B (en) * 2020-06-15 2023-11-24 Tencent Technology (Shenzhen) Co., Ltd. Intelligent question-answering method, intelligent question-answering device, computer equipment and storage medium
CN111930898A (en) * 2020-09-18 2020-11-13 Beijing Enjoy Wisdom Technology Co Ltd Text evaluation method and device, electronic equipment and storage medium
CN112052835A (en) * 2020-09-29 2020-12-08 Beijing Baidu Netcom Science and Technology Co., Ltd. Information processing method, information processing apparatus, electronic device, and storage medium
US11908219B2 (en) 2020-09-29 2024-02-20 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and device for processing information, electronic device, and storage medium
CN113743087A (en) * 2021-09-07 2021-12-03 Zhendao Information Technology (Shanghai) Co., Ltd. Text generation method and system based on neural network vocabulary extension paragraphs
CN113743087B (en) * 2021-09-07 2024-04-26 Zhendao Information Technology (Shanghai) Co., Ltd. Text generation method and system based on neural network vocabulary extension paragraph
CN115879441A (en) * 2022-11-10 2023-03-31 Institute of Scientific and Technical Information of China Text novelty detection method and device, electronic equipment and readable storage medium
CN115879441B (en) * 2022-11-10 2024-04-12 Institute of Scientific and Technical Information of China Text novelty detection method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN109582933B (en) 2021-09-03

Similar Documents

Publication Publication Date Title
CN109582800A (en) Method and related device for training a structuring model and structuring text
CN109597878A (en) Method and related device for determining text similarity
CN109582933A (en) Method and related device for determining text novelty
CN111061946B (en) Method, device, electronic equipment and storage medium for recommending scenerized content
CN112100529B (en) Search content ordering method and device, storage medium and electronic equipment
KR20180041200A (en) Information processing method and apparatus
KR20170001550A (en) Human-computer intelligence chatting method and device based on artificial intelligence
CN113505204B (en) Recall model training method, search recall device and computer equipment
CN110019650B (en) Method and device for providing search association word, storage medium and electronic equipment
CN109635277A (en) Method and related device for obtaining entity information
CN113254711B (en) Interactive image display method and device, computer equipment and storage medium
CN110134885A (en) Point-of-interest recommendation method, apparatus, device and computer storage medium
CN109145083B (en) Candidate answer selecting method based on deep learning
CN114691831A (en) Task-type intelligent automobile fault question-answering system based on knowledge graph
CN103927339B (en) Knowledge Reorganizing system and method for knowledge realignment
CN115129883B (en) Entity linking method and device, storage medium and electronic equipment
CN116662495A (en) Question-answering processing method, and method and device for training question-answering processing model
CN117271818B (en) Visual question-answering method, system, electronic equipment and storage medium
CN117786068A (en) Knowledge question-answering method, device, equipment and readable storage medium
CN109635139A (en) Method and related device for obtaining image information
CN117009599A (en) Data retrieval method and device, processor and electronic equipment
CN116561339A (en) Knowledge graph entity linking method, knowledge graph entity linking device, computer equipment and storage medium
CN115269961A (en) Content search method and related device
CN114637855A (en) Knowledge graph-based searching method and device, computer equipment and storage medium
CN114443916A (en) Supply and demand matching method and system for test data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant