CN109635277B

CN109635277B - Method and related device for acquiring entity information

Info

Publication number: CN109635277B
Application number: CN201811348686.8A
Authority: CN
Inventors: 郭永红; 姜庭欣
Original assignee: Beijing Incopat Co ltd
Current assignee: Beijing Incopat Co ltd
Priority date: 2018-11-13
Filing date: 2018-11-13
Publication date: 2023-05-26
Anticipated expiration: 2038-11-13
Also published as: CN109635277A

Abstract

The embodiments of the application provide a method for acquiring entity information and a related device. The method comprises: receiving target text information, the target text information comprising a first target entity; retrieving, in a dataset, a first candidate entity that matches the first target entity, wherein the dataset comprises candidate entities and the relations among them, and the candidate entities comprise at least the first candidate entity and a second candidate entity associated with the first candidate entity; selecting, in the dataset, the second candidate entity associated with the first candidate entity; and outputting the second candidate entity. In this embodiment, a second candidate entity related to the first target entity can be recommended automatically from the first target entity, so the user no longer has to retrieve and analyze texts one by one, which greatly reduces labor cost.

Description

Method and related device for acquiring entity information

Technical Field

The invention relates to the field of data processing, in particular to a method for acquiring entity information and a related device.

Background

In today's information age, retrieving the required text information has become a routine part of users' daily work, study and life. When skilled persons look for ways to improve current technology, they may need to retrieve a large amount of text information, such as patents, academic papers and technical journals, and then understand and learn from the retrieved text in order to improve the technology. For example, a user who needs to improve an AA device, which includes an AA entity, may consider that arranging another entity on the AA entity, or connecting another entity to it, could solve a problem in the current AA device.

In the prior art, a user can search by keywords. For example, since the AA device comprises the AA entity, the user can search with the AA entity as the keyword, and the search results are all the texts (for example, patent texts) that contain the AA entity. The user then has to analyze each of these many retrieved texts manually to find the content related to AA, which wastes a great deal of labor.

Disclosure of Invention

In view of this, the embodiments of the present invention provide a method and a related device for acquiring entity information. In these embodiments, a second candidate entity related to a first target entity can be recommended automatically based on the received first target entity, so that the user does not have to search for and analyze texts one by one, which greatly reduces labor cost.

In a first aspect, an embodiment of the present application provides a method for acquiring entity information, including:

receiving target text information, wherein the target text information comprises a first target entity;

retrieving, in a dataset, a first candidate entity that matches the first target entity, wherein the dataset comprises candidate entities and the relations among the candidate entities, and the candidate entities comprise at least the first candidate entity and a second candidate entity associated with the first candidate entity;

selecting, in the dataset, the second candidate entity associated with the first candidate entity; and

outputting the second candidate entity.
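The four steps above can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes the dataset is stored as (head, relation, tail) triples, that "matching" means an exact case-insensitive name comparison, and that every entity directly linked to the first candidate counts as a second candidate. The dataset contents and the names `TRIPLES`, `build_index`, `match_first_candidate` and `acquire_entity_info` are hypothetical.

```python
from collections import defaultdict

# Hypothetical dataset of candidate entities and their relations, stored as
# (head entity, relation, tail entity) triples.
TRIPLES = [
    ("mounting seat board", "provided with", "housing frame"),
    ("housing frame", "includes", "partition board"),
    ("battery pack", "connected to", "monitoring module"),
]

def build_index(triples):
    """Map every candidate entity to the (relation, entity) pairs linked to it, in both directions."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
        index[tail].append((relation, head))
    return index

def match_first_candidate(target_entity, index):
    """Return the candidate entity whose name equals the target entity, ignoring case."""
    wanted = target_entity.strip().lower()
    for candidate in index:
        if candidate.lower() == wanted:
            return candidate
    return None

def acquire_entity_info(target_entity, triples=TRIPLES):
    """Receive a target entity, match it, select the associated entities and output them."""
    index = build_index(triples)
    first_candidate = match_first_candidate(target_entity, index)
    if first_candidate is None:
        return []
    # Second candidate entities: everything directly associated with the first candidate.
    return [entity for _, entity in index[first_candidate]]
```

For instance, `acquire_entity_info("Housing frame")` returns both entities linked to the housing frame. Indexing relations in both directions is a design choice. A directed dataset would only follow head-to-tail links.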

In a possible implementation, the attributes of the association relationship include a relationship type, and the target text information further includes a target relationship condition, which specifies the type of relationship between the target entity and the candidate entity to be acquired;

the selecting, in the dataset, of a second candidate entity associated with the first candidate entity comprises:

selecting, in the dataset and according to the first candidate entity, a second candidate entity whose relationship type meets the target relationship condition.

In one possible implementation, the relationship type includes at least one of a conceptual relationship, a belonging relationship, a positional relationship, a sequential relationship, and a logical relationship.
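Filtering by a target relationship condition can be sketched as below. The sketch assumes that each stored relation carries one of the five type labels listed above as a plain string. The `Relation` tuple, the sample relations and `select_by_condition` are hypothetical, and the sketch ignores relation direction.

```python
from typing import NamedTuple

class Relation(NamedTuple):
    head: str
    rel_type: str  # assumed labels: "conceptual", "belonging", "positional", "sequential", "logical"
    tail: str

# Hypothetical typed relations.
RELATIONS = [
    Relation("control unit", "belonging", "charging pile"),
    Relation("refrigerating fan", "positional", "humidity adjusting device"),
    Relation("charging pile", "conceptual", "charging device"),
]

def select_by_condition(first_candidate, condition, relations=RELATIONS):
    """Entities linked to first_candidate by a relation whose type equals the condition."""
    selected = []
    for r in relations:
        if r.rel_type != condition:
            continue  # relation type does not meet the target relationship condition
        if r.head == first_candidate:
            selected.append(r.tail)
        elif r.tail == first_candidate:
            selected.append(r.head)
    return selected
```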

In a possible implementation, the selecting, in the dataset, of a second candidate entity associated with the first candidate entity includes:

selecting, in the dataset and according to the first candidate entity, a plurality of second candidate entities associated with the first candidate entity; and

selecting a target second candidate entity from the plurality of second candidate entities according to a preset rule, and taking the target second candidate entity as the second candidate entity.

In one possible implementation, the selecting of a target second candidate entity from the plurality of second candidate entities according to a preset rule, and the taking of the target second candidate entity as the second candidate entity, include:

determining how frequently each of the plurality of second candidate entities occurs in the dataset; and

selecting a target second candidate entity from the plurality of second candidate entities according to the frequency, and taking the target second candidate entity as the second candidate entity;

or, alternatively:

determining the relevant date of the candidate text of each of the plurality of second candidate entities; and

selecting a target second candidate entity from the plurality of second candidate entities according to the relevant date, and taking the target second candidate entity as the second candidate entity.
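The two preset rules can be sketched as simple rankings. The sketch assumes that frequency means the number of mentions in the dataset. It also assumes that the "relevant date" is a single date per candidate text (for example a filing or publication date, which the text does not specify) and that a later date is preferred. All of these, like the function names, are assumptions.

```python
from collections import Counter
from datetime import date

def pick_by_frequency(candidates, dataset_mentions, k=1):
    """Rank candidates by how often they are mentioned in the dataset; keep the top k."""
    counts = Counter(dataset_mentions)
    # sorted() is stable, so equally frequent candidates keep their input order
    return sorted(candidates, key=lambda c: counts[c], reverse=True)[:k]

def pick_by_date(candidates, text_dates, k=1):
    """Rank candidates by the date of the candidate text they come from, newest first."""
    return sorted(candidates, key=lambda c: text_dates.get(c, date.min), reverse=True)[:k]
```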

In one possible implementation, the attributes of the association relationship further include a relationship dimension. The relationship dimension includes a binary relationship, or a binary relationship together with an X-ary relationship, where X is an integer greater than or equal to 3. A binary relationship comprises two entities and the relationship between them. An X-ary relationship comprises X entities and at least (X-1) binary relationships, and the (X-1) binary relationships are connected through associated entities.
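One way to check this definition is to treat entities as nodes and binary relationships as edges. An X-ary relationship is then a set of at least (X-1) binary relationships over at least 3 entities that forms one connected group. The helper names below are hypothetical. Reading "connected through associated entities" as graph connectivity is an interpretation.

```python
def entities_of(binaries):
    """Distinct entities in order of first appearance."""
    ents = []
    for head, _, tail in binaries:
        for e in (head, tail):
            if e not in ents:
                ents.append(e)
    return ents

def is_x_ary_relation(binaries):
    """True if the (head, relation, tail) binary relationships form one X-ary relationship."""
    ents = entities_of(binaries)
    x = len(ents)
    if x < 3 or len(binaries) < x - 1:
        return False
    adjacency = {e: set() for e in ents}
    for head, _, tail in binaries:
        adjacency[head].add(tail)
        adjacency[tail].add(head)
    # Depth-first traversal: all X entities must be reachable through shared entities.
    seen, stack = {ents[0]}, [ents[0]]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return len(seen) == x
```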

In one possible implementation, there are a plurality of second candidate entities, the target text information further includes a second target entity and a target relationship condition, and the selecting, in the dataset, of a second candidate entity associated with the first candidate entity includes:

retrieving, in the dataset, a plurality of second candidate entities that match the second target entity;

selecting, from the plurality of second candidate entities, a target second candidate entity that meets the target relationship condition; and

outputting an R-ary relationship group, wherein R is an integer greater than or equal to 2 and less than or equal to N. The R-ary relationship group includes a plurality of R-ary relationships, each of which includes the first candidate entity, the target second candidate entity, and the relationship between the first candidate entity and the target second candidate entity.
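For the simplest case, R = 2, the query can be sketched as below. Matching the second target entity by case-insensitive substring is an assumption, because the text does not fix a matching rule. The function name and sample data are also assumptions.

```python
def r_ary_group(first_candidate, second_target, condition, triples):
    """Return (first_candidate, relation, second) tuples where `second` matches the
    second target entity and the relation meets the target relationship condition."""
    group = []
    for head, relation, tail in triples:
        if head != first_candidate:
            continue
        # assumed matching rule: the second target occurs inside the candidate name
        if second_target.lower() in tail.lower() and relation == condition:
            group.append((head, relation, tail))
    return group
```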

In one possible implementation, the entity includes a component, an attribute, and/or an attribute value.

In one possible implementation, the target entity includes a target component, a target attribute, and/or a target attribute value; the candidate entity includes a candidate component, a candidate attribute, and/or a candidate attribute value, the candidate entity being associated with a candidate text to which it belongs, the method further comprising:

respectively matching the target component with each candidate component, the target attribute with each candidate attribute, and/or the target attribute value with each candidate attribute value;

determining a target candidate component that matches the target component, a target candidate attribute that matches the target attribute, and/or a target candidate attribute value that matches the target attribute value;

acquiring a first candidate text associated with the target candidate component, a second candidate text associated with the target candidate attribute, and/or a third candidate text associated with the target candidate attribute value;

outputting the first candidate text, the second candidate text and/or the third candidate text.
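Type-aware matching can be sketched by comparing each target only with candidates of the same kind and then collecting the candidate texts that those candidates belong to. Exact string equality as the matching criterion, the `Candidate` record and the text identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    kind: str     # "component", "attribute" or "attribute_value"
    text_id: str  # identifier of the candidate text the entity belongs to (hypothetical)

def matched_texts(targets, candidates):
    """targets maps a kind to the target string of that kind. Each target is compared
    only with candidates of the same kind; the texts of the matches are returned per kind."""
    result = {}
    for kind, target in targets.items():
        ids = {c.text_id for c in candidates if c.kind == kind and c.name == target}
        result[kind] = sorted(ids)
    return result
```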

In one possible implementation, the dataset further comprises an image dataset comprising a plurality of candidate images, each candidate image of the plurality of candidate images having an associated candidate entity, the method further comprising, after selecting a second candidate entity in the dataset having a relationship to the first candidate entity:

searching the image dataset according to the second candidate entity, determining a candidate image associated with the second candidate entity, and taking the candidate image of the second candidate entity as the second candidate entity.

In one possible implementation, the image dataset includes a first image dataset containing candidate images of high frequency entities, the high frequency entities being candidate entities with a frequency of use above a threshold;

the searching of the image dataset according to the second candidate entity and the determining of a candidate image associated with the second candidate entity comprise:

searching the first image dataset according to the second candidate entity; and

if no candidate image associated with the second candidate entity is found in the first image dataset, searching the image datasets other than the first image dataset according to the second candidate entity.
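The fallback search reads naturally as a tiered lookup. Here each image dataset is assumed to be a dictionary from entity name to a list of image references. The dataset contents and the function name are hypothetical.

```python
def find_candidate_images(entity, first_dataset, other_datasets):
    """Search the high-frequency (first) image dataset first, then the others in order."""
    images = first_dataset.get(entity)
    if images:
        return images
    for dataset in other_datasets:
        images = dataset.get(entity)
        if images:
            return images
    return []  # no candidate image found in any dataset
```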

In a possible implementation, the image dataset includes a first image dataset. Before the image dataset is searched according to the second candidate entity and the candidate image associated with the second candidate entity is determined, the method further includes creating the first image dataset, which comprises:

acquiring a candidate text set, wherein the candidate text set comprises a plurality of candidate texts, and each candidate text comprises candidate entities;

counting the frequency of occurrence of each candidate entity in the candidate text set;

determining high-frequency entities according to the frequency, wherein a high-frequency entity is either an entity whose frequency of occurrence is above a threshold, or an entity ranked before a preset position after the entities are sorted by frequency; and

associating each high-frequency entity with at least one corresponding candidate image to obtain the first image dataset.
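Building the first image dataset can be sketched as follows. Both rules from the text are supported: a frequency threshold, or the top positions after sorting. Every mention is counted, rather than one per text, following the "frequency of occurrence" wording. `image_source`, a mapping from entity to candidate images, is a hypothetical stand-in, because the text does not say where the images come from.

```python
from collections import Counter

def high_frequency_entities(candidate_texts, threshold=None, top_n=None):
    """candidate_texts: one list of entity mentions per candidate text.
    With a threshold, keep entities whose frequency is above it; otherwise keep
    the top_n entities after sorting by frequency (all of them if top_n is None)."""
    counts = Counter(e for entities in candidate_texts for e in entities)
    if threshold is not None:
        return [e for e, n in counts.most_common() if n > threshold]
    return [e for e, _ in counts.most_common(top_n)]

def build_first_image_dataset(candidate_texts, image_source, threshold=None, top_n=None):
    """Associate each high-frequency entity with its candidate images."""
    frequent = high_frequency_entities(candidate_texts, threshold, top_n)
    return {e: image_source[e] for e in frequent if e in image_source}
```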

In a possible implementation, the image dataset includes a second image dataset. Before the image dataset is searched according to the second candidate entity to determine the candidate image associated with the second candidate entity, the method further includes:

acquiring a candidate text set, wherein each candidate text in the candidate text set comprises a drawing description and drawings, the drawing description comprises a candidate entity and an identifier of the candidate entity, and the drawings comprise a candidate image and the identifier; and

establishing an association between the candidate entity and the candidate image according to the identifier to obtain the second image dataset.
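The second image dataset links entities to figures through reference numerals. The sketch makes two assumptions. First, the drawing description lists items in the form "battery pack (1); monitoring module (2)", as in the examples elsewhere in this document. Second, the numerals shown in each drawing have already been located, for example by a separate recognition step, and are given as a set per image. Both formats and all names are assumptions.

```python
import re

# Assumed item format: "entity name (numeral)", items separated by ";" or ",".
ITEM = re.compile(r"\s*(.+?)\s*\((\d+)\)\s*")

def parse_description(description):
    """Return {identifier: entity name} from a drawing description."""
    pairs = {}
    for item in re.split(r"[;,]", description):
        match = ITEM.fullmatch(item)
        if match:
            pairs[match.group(2)] = match.group(1)
    return pairs

def build_second_image_dataset(candidate_texts):
    """candidate_texts: (description, drawings) pairs; drawings: (image, identifiers shown) pairs."""
    dataset = {}
    for description, drawings in candidate_texts:
        entities = parse_description(description)  # numerals are only unique within one text
        for image, identifiers in drawings:
            for identifier in sorted(identifiers):
                if identifier in entities:
                    dataset.setdefault(entities[identifier], []).append(image)
    return dataset
```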

In a possible implementation, the image dataset includes a third image dataset, and the method further includes, prior to searching the image dataset according to the second candidate entity and determining the candidate image associated with the second candidate entity:

acquiring a candidate text set, wherein each candidate text in the candidate text set comprises a title and an abstract drawing;

extracting the abstract drawing from each candidate text;

identifying candidate entities in the title; and

establishing an association between the candidate entities and the abstract drawing to obtain the third image dataset.
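For the third image dataset, any entity recognizer can be applied to the title. The sketch below substitutes a plain dictionary lookup for the trained entity extraction model described later, purely as a hypothetical stand-in. The sample titles and file names are invented.

```python
def entities_in_title(title, entity_dictionary):
    """Stand-in recognizer: dictionary entries that occur in the title (case-insensitive)."""
    lowered = title.lower()
    return [e for e in sorted(entity_dictionary) if e.lower() in lowered]

def build_third_image_dataset(candidate_texts, entity_dictionary):
    """candidate_texts: (title, abstract drawing) pairs; each title entity maps to the drawing."""
    dataset = {}
    for title, abstract_drawing in candidate_texts:
        for entity in entities_in_title(title, entity_dictionary):
            dataset.setdefault(entity, []).append(abstract_drawing)
    return dataset
```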

In one possible implementation, the target text information is a target structure of a structured representation.

In a second aspect, an embodiment of the present application further provides an apparatus for acquiring entity information, including: a receiving module, configured to receive target text information, wherein the target text information comprises a first target entity;

a matching module, configured to retrieve, in a dataset, a first candidate entity matching the first target entity received by the receiving module, wherein the dataset comprises candidate entities and relations among the candidate entities, the candidate entities comprise at least a first candidate entity and a second candidate entity, and the first candidate entity is associated with the second candidate entity;

a selection module, configured to select, in the dataset, the second candidate entity associated with the first candidate entity; and

an output module, configured to output the second candidate entity.

In a third aspect, the present application further provides an electronic device, including:

a memory and a processor;

the memory and the processor are communicatively coupled to each other, and the memory stores computer instructions that, when executed, cause the processor to perform the method of any of claims 1-16.

In a fourth aspect, the present application further provides a computer-readable storage medium storing computer instructions that cause a computer to perform the method according to the first aspect.

In the embodiments of the application, target text information comprising a first target entity is received, and a first candidate entity matching the first target entity is retrieved in the dataset. The candidate entities comprise at least the first candidate entity and a second candidate entity associated with it. The second candidate entity associated with the first candidate entity is then selected in the dataset and output. A second candidate entity related to the first target entity can thus be recommended automatically, so the user does not have to search for and analyze texts one by one, which greatly reduces labor cost.

Drawings

The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the invention in any way, in which:

FIG. 1 is a flow chart illustrating the steps of one embodiment of a method for training a structured model according to an embodiment of the present application;

FIG. 2 is a flow chart illustrating steps of one embodiment of a method for text structuring according to the embodiments of the present application;

FIG. 3 is a schematic diagram of a target structure in an embodiment of the present application;

FIG. 4 is a schematic diagram of an image structure in an embodiment of the present application;

FIG. 5 is a flowchart illustrating steps of one embodiment of a method for determining text similarity according to embodiments of the present application;

FIG. 6 is a diagram illustrating a Word2vec model training process in an embodiment of the present application;

FIG. 7 is a flow chart illustrating steps of one embodiment of a method for determining text novelty in accordance with embodiments of the present application;

FIG. 8 is a schematic diagram of a candidate map in an embodiment of the present application;

FIG. 9 is a flowchart illustrating steps of one embodiment of a method for acquiring image information according to embodiments of the present application;

FIG. 10 is a schematic diagram of the drawing description and the drawings of a candidate text in an embodiment of the present application;

FIG. 11 is a schematic topology diagram of a first candidate image and a second candidate image according to an embodiment of the present application;

FIG. 12 is a flowchart illustrating steps of one embodiment of a method for obtaining entity information according to the present disclosure;

FIG. 13 is a schematic structural diagram of an embodiment of an apparatus for obtaining entity information according to the embodiments of the present application;

FIG. 14 is a schematic structural diagram of another embodiment of an apparatus for acquiring entity information according to an embodiment of the present application;

FIG. 15 is a schematic structural diagram of an embodiment of an electronic device in an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention.

The texts in the embodiments of the application include, but are not limited to, technical literature, patent literature and academic papers. After a text is structured, structured information (for example, a structure diagram) is obtained that helps the user understand the content of the text. Structured information may also be used as a search query. Taking patent documents as an example, current patent search methods are mostly text-based. They focus on matching text characters, lack an understanding of both the user's requirements and the patent content, and do not search on the basis of content understanding. Representing patent text in a structured manner with the method provided by the embodiments of the application allows retrieval to be based on an understanding of the patent content, which improves retrieval accuracy.

An embodiment of the application provides a text structuring method applied to an electronic device, which may be a server or a terminal device. Terminal devices include, but are not limited to, computers, mobile phones and handheld computers. The electronic device acquires a target text to be structured (for example, a patent) and inputs it into a trained entity extraction model, which identifies the entities in the target text. The target text with the identified entities is then input into a trained relation extraction model, which extracts the relations between the entities. The target text is structured according to the entities and the relations between them, and structured information (a structured text representation) such as a structure diagram or a flowchart is generated. The entities are extracted by the trained entity extraction model, the relations between them by the trained relation extraction model, and the structured text representation is generated automatically from those relations. As a result, the text content becomes easier to understand, conversion is fast, and labor cost is saved.
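Once the relations have been extracted, a structured representation can be generated as simply as turning (head, relation, tail) triples into an indented tree. The rendering format below is a hypothetical text stand-in for a structure diagram. Roots are taken to be entities that never appear as a tail, and the function name is an assumption.

```python
def to_structure(triples):
    """Render (head, relation, tail) triples as an indented text tree."""
    children, tails = {}, set()
    for head, relation, tail in triples:
        children.setdefault(head, []).append((relation, tail))
        tails.add(tail)
    roots = [h for h in children if h not in tails]
    lines = []

    def walk(node, depth, path):
        for relation, child in children.get(node, []):
            lines.append("  " * depth + f"--{relation}--> {child}")
            if child not in path:  # guard against cycles in the extracted relations
                walk(child, depth + 1, path | {child})

    for root in roots:
        lines.append(root)
        walk(root, 1, {root})
    return "\n".join(lines)
```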

For ease of understanding, the terms used in the embodiments of the present application are first explained:

Entity: a word used to represent a feature in a text (e.g., a patent or a paper). In technical documents such as patents and papers, an entity is a word representing a technical feature, and entities include components, attributes and attribute values.

Component: a constituent element described in the text, such as a charging device or a memory.

Attribute: a property of a component, such as the "voltage" of the charging device.

Attribute value: the value of an attribute of a component; for example, the voltage of the charging device is "240 V".

Relationship between entities: a relationship between technical features, specifically a relationship between components, between a component and an attribute, or between an attribute and an attribute value.

1) The types of relationships between components include, but are not limited to:

the inclusion relationship, for example, the charging pile includes a control unit;

the connection relationship, for example, the humidity adjusting device is connected with the refrigerating fan.

2) Relationship between a component and an attribute:

the component has a certain attribute; for example, the charging device has a voltage attribute.

3) Relationship between an attribute of a component and its attribute value:

the attribute has a specific attribute value; for example, the voltage "is" 240 V.

Example 1

With reference to FIG. 1, the text structuring method provided in an embodiment of the present application is described in detail below. The method mainly comprises two parts: the first part trains a structured model, and the second part produces a structured representation of the text.

First, training the structured model:

the structured model comprises an entity extraction model for extracting entities and a relation extraction model for extracting the relations between entities, and the training method comprises the following steps:

Step 101: obtaining a labeled first corpus set, wherein the first corpus set is obtained by labeling the entity corpus of each text in a first text set according to a first preset rule.

The first text set includes, but is not limited to, technical literature, patents and academic papers. In the embodiments of the present application, patents are used as the example. For instance, the first text set may include tens of thousands of patents; this number is given only as an example and is not a limitation.

The first corpus set is obtained by labeling the entity corpus of each text in the first text set according to the first preset rule. The first preset rule is: first vocabulary representing entities and second vocabulary representing non-entities are labeled so that they can be distinguished.

Specifically, the description will be given by taking part of the content of one patent in the first text set as an example:

The text is: "A high-mounted stop lamp for an automobile, characterized in that it comprises a rectangular mounting seat board (1), a housing frame (2) matching the mounting seat board is arranged on the mounting seat board (1), and a plurality of partition boards (3) are arranged in the housing frame (2)". The corpus is labeled in the following format:

A high-mounted stop lamp of an automobile comprises a rectangular/pre installation/start installation/in seat/in plate/end (/after 1), wherein the/pre installation/start installation/in seat/in plate/end (/after 1) is provided with a matched/pre outer/start shell/in frame/end (/after 2), a plurality of/pre partition/start plate/end (3) are arranged in the/pre outer/start shell/in frame/end (/after 2), and each/pre partition/start plate/end is provided with a/pre shaft/end /after

The first preset rule specifically includes the following identifiers:
- a first identifier (e.g., /start) marks the first character of an entity;
- a second identifier (e.g., /end) marks the last character of an entity;
- a third identifier (e.g., /in) marks each character of a component between the first identifier /start and the second identifier /end;
- a fourth identifier (e.g., /entity) indicates that a component consists of only one character;
- a fifth identifier (e.g., /pre) marks the character immediately before the first identifier /start;
- a sixth identifier (e.g., /after) marks the character immediately after the second identifier /end;
- all other characters outside entity names are given a uniform seventh identifier (e.g., /w).

For example: include/w rect/w angular/w of/pre installation/start installation/in seat/in plate/end (/after 1/w )/w.
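The first preset rule can be applied mechanically once entity spans are known. The sketch below takes a list of text units and end-exclusive entity spans, and produces "unit/label" strings. In the patent's setting the units are single Chinese characters; English tokens are used here only for readability. Giving entity labels priority over /pre and /after when they collide is an assumption, since the text does not address that case. The function name is hypothetical.

```python
def label_entities(units, spans):
    """units: the text units to label; spans: (start, end) index pairs of entities,
    end exclusive. Returns 'unit/label' strings following the first preset rule."""
    labels = ["w"] * len(units)  # seventh identifier for ordinary units
    for start, end in spans:
        if end - start == 1:
            labels[start] = "entity"  # single-unit entity
        else:
            labels[start] = "start"
            labels[end - 1] = "end"
            for i in range(start + 1, end - 1):
                labels[i] = "in"
    for start, end in spans:
        # /pre and /after only overwrite ordinary units, never entity labels
        if start > 0 and labels[start - 1] == "w":
            labels[start - 1] = "pre"
        if end < len(units) and labels[end] == "w":
            labels[end] = "after"
    return [f"{unit}/{label}" for unit, label in zip(units, labels)]
```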

It should be noted that the labels used for corpus labeling in the embodiments of the present application are merely examples and do not limit the embodiments of the present application.

Step 102: training on the first corpus set to obtain an entity extraction model.

The first corpus set is trained with a conditional random field (CRF) model to obtain model parameters, and the entity extraction model is constructed from these model parameters.

A CRF can label individual Chinese characters, with words formed from characters. It therefore takes into account both how frequently characters form words and phrases and the surrounding context. This gives it strong learning ability and makes it effective at recognizing ambiguous words and out-of-vocabulary words.
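Character-level CRF labeling relies on features that look at each character together with its neighbors. The sketch below builds such features with a window of one character on each side, plus the bigrams formed with those neighbors. The window size and the feature names are assumptions. Lists of feature dictionaries like these are the input format used by, for example, the sklearn-crfsuite package, although the document does not name its CRF toolkit.

```python
def char_features(sentence, i):
    """Features for the i-th character: itself, its neighbours and the bigrams with them."""
    ch = sentence[i]
    prev_ch = sentence[i - 1] if i > 0 else "<BOS>"  # sentence start marker
    next_ch = sentence[i + 1] if i < len(sentence) - 1 else "<EOS>"  # sentence end marker
    return {
        "c0": ch,
        "c-1": prev_ch,
        "c+1": next_ch,
        "c-1c0": prev_ch + ch,
        "c0c+1": ch + next_ch,
    }

def sentence_features(sentence):
    """One feature dictionary per character, in order."""
    return [char_features(sentence, i) for i in range(len(sentence))]
```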

Step 103: taking a second text set as the input of the entity extraction model, and identifying entity information in the second text set through the entity extraction model.

The second text set is also a set of patents.

For example, part of the content of one patent in the second set of text is:

The utility model provides a battery monitoring and management device, comprising a battery pack (1), a monitoring module (2), a CPU processor (3) and a display (4). Analyzing this passage with the entity extraction model yields:

a/w type/w electric/w pool/w monitoring/w measuring/w tube/w management/w arrangement/w placement/w ,/w package/w include/w electric/start pool/in group/end (/w 1/w )/w ,/w monitoring/start measurement/in mode/in block/end (/w 2/w )/w ,/w CPU/start location/in management/in device/end (/w 3/w )/w and/w display/start display/in device/end (/w 4/w )/w

Four component names are extracted from the above text: battery pack, monitoring module, CPU processor and display.
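Reading entities back out of the model's labeled output is a simple scan over the labels. The sketch assumes that the output uses the /start, /in, /end and /entity labels of the first preset rule. It also assumes that the units of an entity are joined with a space; for Chinese text the joiner would be the empty string. The function name is hypothetical.

```python
def decode_entities(tagged, joiner=" "):
    """tagged: 'unit/label' strings. Collect start..in..end runs and single /entity units."""
    entities, current = [], None
    for token in tagged:
        unit, _, label = token.rpartition("/")  # split on the last "/" only
        if label == "entity":
            entities.append(unit)
            current = None
        elif label == "start":
            current = [unit]
        elif label == "in" and current is not None:
            current.append(unit)
        elif label == "end" and current is not None:
            current.append(unit)
            entities.append(joiner.join(current))
            current = None
        else:
            current = None  # any other label discards an unfinished entity
    return entities
```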

Step 104: obtaining a labeled second corpus set, wherein the second corpus set is obtained by performing relation corpus labeling and entity labeling on each text of the second text set according to a second preset rule.

After the entity extraction model completes component extraction, relation corpus labeling is performed, and the labeled relation corpus is converted into the corpus format of the CRF model for training.

The second preset rule is: first vocabulary representing entities, third vocabulary representing relations, and fourth vocabulary representing neither entities nor relations are labeled so that they can be distinguished.

Specifically, for example:

Example: "The mounting seat board (1) is provided with a housing frame (2) matching the mounting seat board"

Relation corpus labeling is performed on this text, giving:

"said/w mounting seat board/e (/w 1/w )/w on/w provided/r_start with/r_end matching/w housing frame/e (/w 2/w )/w"

The seventh identifier (e.g., /w) marks an ordinary character, the eighth identifier (e.g., /e) marks a component identified by the entity extraction model, the ninth identifier (e.g., /r_start) marks the first character of a relation, and the tenth identifier (e.g., /r_end) marks the last character of a relation.
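A relation labeled this way can be turned into a (head, relation, tail) triple by pairing the relation span with the nearest /e entity before it and the first /e entity after it. That pairing heuristic is an assumption. So is treating any units between /r_start and /r_end as part of the relation, since the source only shows two-unit relations. The function name is hypothetical.

```python
def extract_triples(tagged, joiner=" "):
    """tagged: 'unit/label' strings using /e, /r_start, /r_end and /w labels."""
    triples = []
    last_entity = None  # most recent /e entity
    relation = None     # units of the relation being read (r_start .. r_end)
    pending = None      # (head, relation) waiting for its tail entity
    for token in tagged:
        unit, _, label = token.rpartition("/")
        if relation is not None:
            relation.append(unit)
            if label == "r_end":
                if last_entity is not None:
                    pending = (last_entity, joiner.join(relation))
                relation = None
        elif label == "r_start":
            relation = [unit]
        elif label == "e":
            if pending is not None:
                triples.append((pending[0], pending[1], unit))
                pending = None
            last_entity = unit
    return triples
```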

In the embodiments of the present application, the entity extraction model identifies entities and the relation extraction model identifies the relationships between them. The examples above show the entity extraction model identifying components only. It may also identify attributes and attribute values, which are not illustrated here, so the examples shown do not limit the present application.

Step 105: training on the second corpus set to obtain the relation extraction model.

The second corpus set is trained with a CRF model to obtain model parameters, and the relation extraction model is constructed from these model parameters. The model parameters are:
- the regularization parameter a, which takes the value L2 because this gives a better fit than L1;
- the hyperparameter c, which can take the value 3 so that the model fits the training data as closely as possible;
- the parameter f, which takes the value 3, so that words occurring fewer than f times do not participate in training.
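The a, c and f parameters correspond closely to the -a, -c and -f options of the CRF++ training tool crf_learn. The document does not name its toolkit, so that mapping is an assumption; in CRF++ the cut-off also applies to features rather than words. The sketch shows the effect of f as a frequency cut-off and builds, without running, a matching crf_learn argument list. The file names are hypothetical.

```python
from collections import Counter

# Values given in the text; reading them as CRF++ crf_learn options is an assumption.
HYPERPARAMS = {"a": "CRF-L2", "c": 3.0, "f": 3}

def frequent_features(feature_sequences, min_count=HYPERPARAMS["f"]):
    """Keep features seen at least min_count times; rarer ones take no part in training."""
    counts = Counter(feat for seq in feature_sequences for feat in seq)
    return {feat for feat, n in counts.items() if n >= min_count}

def crf_learn_command(template, train_file, model_file, params=HYPERPARAMS):
    """Argument list for a CRF++ training run; built here, not executed."""
    return ["crf_learn", "-a", params["a"], "-c", str(params["c"]),
            "-f", str(params["f"]), template, train_file, model_file]
```

If CRF++ is installed, the resulting list could be passed to `subprocess.run`.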

For example, the relationship extracted from the text is: the "mounting seat board" is provided with a "housing frame".

In the embodiments of the application, the training proceeds as follows:
- a labeled first corpus set is obtained by labeling the entity corpus of each text in a first text set according to a first preset rule;
- the first corpus set is trained to obtain an entity extraction model, which extracts entities from texts;
- a second text set is used as the input of the entity extraction model, which identifies the entity information in the second text set;
- a labeled second corpus set is acquired;
- the second corpus set is trained to obtain the relation extraction model, which extracts the relations between entities.

These relations are used to represent the text in a structured manner.

On the basis of the foregoing embodiment, the entity extraction model in the embodiments of the present application includes at least two entity extraction sub-models, including a first entity extraction sub-model and a second entity extraction sub-model. Training the first corpus set to obtain the entity extraction model may specifically include:

training on the first corpus set to obtain the first entity extraction sub-model;

using a third text set as the input of the first entity extraction sub-model, and identifying a target entity set in the third text set through the first entity extraction sub-model; and

training on the target entity set to obtain the second entity extraction sub-model.

In this embodiment, no entity dictionary needs to be prepared in advance. Only a certain amount of corpus (such as the first corpus set) needs to be labeled to train the first entity extraction sub-model. The first entity extraction sub-model then identifies a target entity set in the third text set, and this target entity set serves as new labeled corpus on which the second entity extraction sub-model is trained. The second entity extraction sub-model can cover more entities, and an entity dictionary is generated in this way. Through identification by successive entity extraction sub-models, the dictionary comes to contain more and more entities. For example, the entity words extracted from all patents are collected to form the entity dictionary. The dictionary may have two columns, entity and frequency, where the frequency is the number of patents that contain the entity. For example: mounting seat board, 3; housing frame, 4. By labeling a certain amount of entity corpus and repeatedly training entity extraction sub-models, the embodiments of the present application cover more entities and greatly improve the accuracy of identifying entities in text.
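The entity dictionary described above can be built by counting, for each entity, the number of patents from which it was extracted, so that repeated mentions within one patent count once. The input format and the function name are assumptions. The same function would also build the relation dictionary described below if given extracted relation words instead of entities.

```python
from collections import Counter

def build_dictionary(extracted_per_patent):
    """extracted_per_patent: {patent_id: [entries extracted from that patent]}.
    Returns (entry, frequency) pairs, frequency = number of patents containing the entry,
    sorted by descending frequency and then alphabetically."""
    counts = Counter()
    for entries in extracted_per_patent.values():
        counts.update(set(entries))  # count each patent at most once per entry
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```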

Similarly, the relationship extraction model in the embodiment of the present application comprises at least two relationship extraction sub-models, including a first relationship extraction sub-model and a second relationship extraction sub-model, and training the second corpus set to obtain the relationship extraction model may specifically include:

training the second corpus set to obtain the first relation extraction sub-model;

using a fourth text set as input of the first relation extraction sub-model, and identifying a target relation set in the fourth text set through the first relation extraction sub-model;

training on the target relation set to obtain the second relation extraction sub-model.

In this embodiment, no entity relationship dictionary needs to be prepared in advance. Only a certain amount of relationship corpus (such as the second corpus set) needs to be labeled to train the first relationship extraction sub-model. The first relationship extraction sub-model then identifies a target relationship set in the fourth text set; this set serves as a new labeled relationship corpus, and training on it yields the second relationship extraction sub-model, which can cover more relationships. A relationship dictionary is generated in this way, and through identification by a plurality of relationship extraction sub-models it can contain more and more relationships. For example, the relationship vocabulary extracted from all patents is aggregated to form the relationship dictionary, which has two columns: relationship and frequency, where frequency is the number of patents that contain the relationship. For example: includes, 10; is provided with, 20. By labeling a certain amount of relationship corpus and training relationship extraction sub-models iteratively, the embodiment of the present application covers more relationships and greatly improves the accuracy of identifying relationships in text.

Text structured representation is then carried out.

After steps 101-105 in the above example are executed to obtain an entity extraction model and a relationship extraction model, the target text can be represented structurally by means of these two models. As shown in fig. 2, an embodiment of the present application provides a text structuring method, which may include the following steps:

Step 201, obtaining a target text to be structured.

The target text to be structured is obtained; it may be, for example, a patent.

Step 202, inputting the target text into an entity extraction model, and identifying a target entity set in the target text through the entity extraction model. The entity extraction model is obtained by training on the first corpus set, and the first corpus set is obtained by labeling the entity corpus of each text in the first text set.

First, the target text is input into the entity extraction model, which identifies the target entity set in the target text. For example, the target text includes the following: "A high-mounted stop lamp for a car, characterized in that: it comprises a rectangular mounting seat board (1); an outer shell frame (2) matched with the mounting seat board is arranged on the mounting seat board (1); and a plurality of partition boards (3) are arranged in the outer shell frame (2)." The target entity set output by the entity extraction model for this text is: the mounting seat board, the outer shell frame and the partition boards.

Step 203, inputting the target text in which the target entity set has been identified into a relation extraction model, and extracting the relations between the target entities through the relation extraction model.

The target text in which the target entities have been identified is input into the relationship extraction model, which outputs the relationships between the target entities. For example: the mounting seat board is provided with the outer shell frame; the outer shell frame is provided with the partition boards.

Step 204, carrying out structural representation on the target text according to the target entities and the relations between them, and generating a target structure.

Referring to fig. 3, fig. 3 is a schematic diagram of a target structure. The generated target structure comprises nodes and edges. Nodes represent entities, which include components, attributes or attribute values; edges represent relationships between entities, including relationships between components, between components and attributes, and between attributes and attribute values.

For example, the entities extracted from one patent and their relationships are as follows:

The brake lamp comprises a mounting base plate

The brake lamp comprises a grating plate

The brake lamp comprises an LED lamp

The mounting seat board is provided with a shell frame

The shell frame is provided with a baffle plate

The shell frame is provided with a mounting cavity

The entity extraction results and the entity relation extraction results for the target text are fused to obtain a structure diagram of the whole target text (the target structure shown in fig. 3).
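The fusion step can be sketched as follows, using the relations listed above. This is a minimal illustration, assuming the relation extraction output is available as (head entity, relation, tail entity) triples; the function names and the indented-tree rendering are illustrative choices, not the application's own implementation.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (head entity, relation, tail entity)


def build_structure(triples: list[Triple]):
    """Fuse relation triples into a node list and labelled adjacency lists."""
    nodes: list[str] = []
    edges: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in triples:
        for entity in (head, tail):
            if entity not in nodes:
                nodes.append(entity)  # keep first-seen order
        edges[head].append((relation, tail))
    return nodes, dict(edges)


def render(node: str, edges, depth: int = 0, seen=None) -> list[str]:
    """Render the structure reachable from `node` as an indented tree."""
    seen = set() if seen is None else seen
    seen.add(node)
    lines = []
    for relation, tail in edges.get(node, []):
        lines.append("  " * depth + f"{node} --{relation}--> {tail}")
        if tail not in seen:  # guard against cycles
            lines.extend(render(tail, edges, depth + 1, seen))
    return lines


triples = [
    ("brake lamp", "comprises", "mounting seat board"),
    ("brake lamp", "comprises", "grating plate"),
    ("brake lamp", "comprises", "LED lamp"),
    ("mounting seat board", "is provided with", "shell frame"),
    ("shell frame", "is provided with", "baffle plate"),
    ("shell frame", "is provided with", "mounting cavity"),
]
nodes, edges = build_structure(triples)
print("\n".join(render("brake lamp", edges)))
```

Nodes correspond to entities and each (relation, tail) pair to a labelled edge, mirroring the node/edge description of the target structure.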

In the embodiment of the present application, the entities in the target text are extracted by the trained entity extraction model, the relationships among them are extracted by the trained relationship extraction model, and a structured representation of the text is generated automatically from these relationships. Since both the target text and the candidate texts consist of entities and the relationships among them, extracting those relationships makes the text content easier to understand; the conversion is fast and saves labor cost.

In one application scenario, a user encounters a long or logically dense text (such as a patent) whose content would take a lot of time to understand. The user can convert the patent into a structure diagram with an electronic device (such as a mobile phone): the mobile phone receives the patent and inputs it into the entity extraction model, which identifies the target entity set in the patent; the patent with the identified target entity set is then input into the relation extraction model, which extracts the relations between the target entities; the target text is structurally represented according to the target entities and the relations between them to generate a target structure, and the terminal displays the target structure. Alternatively, the user may send the patent to a server through a terminal (e.g., a mobile phone); the server converts the patent into a target structure and sends it back to the terminal, which displays it. Converting the target text into a target structure makes its content easier for the user to understand and greatly saves labor cost.

On the basis of the above embodiment, in another embodiment, the relation extraction model may fail to extract the relation between entities when the related entities appear in two different sentences. For example, the text to be identified is "the battery pack is connected to the monitoring module; and the CPU processor and the display are also connected". The relation extraction model can identify that the battery pack is connected to the monitoring module, i.e. the relation between the battery pack and the monitoring module, but because the CPU processor and the display are in another sentence, their connection to the battery pack may not be identified.

For the above case, in which related entities appear in different sentences and the relation extraction model may fail to identify the relation between them, the present application provides another embodiment:

the target text comprises a first entity, and after step 203, before step 204, the method may further comprise the steps of:

acquiring an entity relation data set, wherein the entity relation data set is obtained by extracting the entities in a text set and the relations among them, and is represented as an entity relation matrix comprising N entities and the relations among the N entities, N being greater than or equal to 2;

querying the entity relation data set to obtain M second entities that have a relation with the first entity, wherein M is less than or equal to N;

searching for the second entities within a preset range of the target text;

if at least one target second entity among the M second entities is found, establishing a relation between the first entity and the target second entity.

Specifically, an entity relation data set is first acquired. It is obtained by extracting the entities in a text set and the relations among them, and the corresponding entity relation matrix comprises N entities and the relations among the N entities, where N is greater than or equal to 2.

The specific method for acquiring the entity relation data set comprises the following steps:

inputting the text set into an entity extraction model, and identifying entity information in the text set through the entity extraction model; the text set may be understood to include a set of multiple texts, for example, a set of hundreds of thousands of patents. It should be noted that the number of texts included in the text set is illustrative, and not a limitation of the embodiments of the present application.

The text set in which the entity information has been identified is then input into the relation extraction model, which extracts the relations between the entities in each text of the text set. The entity relation data set comprises these relations for every text in the text set.

The entity relationship dataset is shown in matrix a as follows:

              Brake lamp   Base               ……   LED lamp   ……   Lamp shell
Brake lamp    0            Is provided with   ……              ……   0
Base          0                               ……   Includes   ……   Connection
……
LED lamp      0                               ……              ……   Connection
……
Lamp shell    0            Connection         ……              ……   0

Then, the entity relation data set is queried to obtain M second entities that have a relation with the first entity, where M is less than or equal to N.

For example, suppose that in the target text the first entity is "base" and no relation between it and any other component has been extracted. One possible reason is that "base" and the components related to it are in different sentences. In that case, the entity relation data set is used to determine which entities the first entity is related to, and hence which entities it may also be related to in the target text.

For example, the first entity is a "base". The method for searching the second entity related to the base in the matrix A can be as follows:

Locate the "base" row in matrix A and take the set S_a of all components related to "base"; here S_a contains the LED lamp and the lamp shell. Locate the "base" column in matrix A and take the set S_b of all components related to "base"; here S_b contains the brake lamp and the lamp shell.

Let S = S_a ∪ S_b, so that S contains (s_0, s_1, …, s_k, …, s_n);

in the above example, S contains (LED lamp, lamp shell, brake lamp).

Next, the second entities are searched for within a preset range of the target text;

the preset range may be determined by a size of an entity matching window, and the preset range in the target text is determined according to the size of the entity matching window. The size of the entity matching window may be preset.

Starting from the position where the first entity appears, the target second entities are looked up within g positions before it and g positions after it. For example, the entity matching window searches for second entities within 10 characters before and 10 characters after the position of "base".

Finally, if at least one target second entity among the M second entities is found, a relation between the first entity and the target second entity is established.

For example, suppose 3 entities are found within the window: the brake lamp, a folding piece and the LED lamp. The LED lamp and the brake lamp match entities in set S, so they are the target second entities, and a relation is established between the base and each target second entity; the type of each relation is the one recorded for that entity pair in the entity relation data set.

In this embodiment, an entity relationship data set is obtained, and a query is performed in the entity relationship data set to obtain M second entities having a relationship with the first entity, where M is less than or equal to N; then searching the second entity in a preset range in the target text; if at least one target second entity in the M second entities is found, establishing a relation between the first entity and the target second entity so as to solve the problem that the second entities related to the first entity are respectively in different sentences and the relation extraction model possibly cannot identify.
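The query-and-window procedure above can be sketched as follows. This is a minimal illustration, assuming matrix A is held as a nested list with 0 meaning "no relation" and matrix[r][c] holding the relation from entity r to entity c, and assuming the window is measured in characters from the start and end of each mention of the first entity. All function names and the sample text are hypothetical; the relation label for each new edge is read from matrix A in whichever direction it is recorded.

```python
def related_entities(matrix, entities, first):
    """S = S_a ∪ S_b: entities related to `first` in its row or column of matrix A."""
    idx = entities.index(first)
    s_a = {entities[c] for c, rel in enumerate(matrix[idx]) if rel != 0}   # row
    s_b = {entities[r] for r, row in enumerate(matrix) if row[idx] != 0}  # column
    return s_a | s_b


def find_in_window(text, first, candidates, g=10):
    """Candidates lying fully within g characters before/after any mention of `first`."""
    found = set()
    start = text.find(first)
    while start != -1:
        window = text[max(0, start - g): start + len(first) + g]
        found.update(e for e in candidates if e in window)
        start = text.find(first, start + 1)
    return found


def complete_relations(text, first, matrix, entities, g=10):
    """Establish relation triples between `first` and the target second entities."""
    idx = entities.index(first)
    candidates = related_entities(matrix, entities, first)
    triples = []
    for target in sorted(find_in_window(text, first, candidates, g)):
        t = entities.index(target)
        if matrix[idx][t] != 0:      # recorded as first -> target
            triples.append((first, matrix[idx][t], target))
        else:                        # recorded as target -> first
            triples.append((target, matrix[t][idx], first))
    return triples


# Hypothetical matrix A over four entities, following the example above
entities = ["brake lamp", "base", "LED lamp", "lamp shell"]
matrix = [
    [0, "is provided with", 0, 0],
    [0, 0, "includes", "connection"],
    [0, 0, 0, "connection"],
    [0, "connection", 0, 0],
]
text = "A brake lamp has a base and an LED lamp."
print(complete_relations(text, "base", matrix, entities, g=20))
```

A larger g recovers more cross-sentence relations at the cost of more false matches; the window size is left as a preset parameter, as in the text.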

Optionally, the target structure in the embodiment of the present application may be a text structure or an image structure, and a specific manner of generating the image structure includes:

First, target image information representing the entities is acquired;

Specifically, an image set can be obtained from internet data (such as related forums, patent databases and paper databases) and from local databases;

text in each image of the image set is identified, and if a target entity matches the text in an image, image information representing that target entity is selected from the image set. For example, text in each image of the image set is identified; if the text in a first image (e.g., "engine") matches a first target entity (e.g., "engine"), the text in a second image (e.g., "connecting rod") matches a second target entity (e.g., "connecting rod"), and the text in a third image (e.g., "pressing mechanism") matches a third target entity (e.g., "pressing mechanism"), then the first, second and third images are selected as the image information representing the first, second and third target entities.

Then, a target structure represented by image information is generated based on the target entity and a relationship between the target entities.

Referring to fig. 4, fig. 4 is a schematic diagram of an image structure. For example, the relationships among "engine", "connecting rod" and "pressing mechanism" are: the engine is connected to the connecting rod, and the engine is connected to the pressing mechanism. The image structure shown in fig. 4 is generated from these three entities and the connection relationships between them. In this example, image information representing the target entities is obtained, an image structure is generated from the target entities and the relationships between them, and the image structure is displayed. This represents the entities in the text and their relationships more vividly, so the user can understand the text content more easily.

The above describes in detail how the entity extraction model and the relation extraction model are trained and how they are applied to represent text structurally.

It should be noted that steps 101 to 105 and steps 201 to 204 may be executed by the same electronic device or by different electronic devices. Steps 101-105 are performed before step 201; once training of the entity extraction model and the relationship extraction model has been completed, step 201 may be performed directly without performing steps 101-105 again.

Example 2

Referring to fig. 5, the embodiment of the present application further provides a method for determining text similarity, where the method in this example is applied to an electronic device, and the electronic device may be a server or a terminal, and the method may include the following steps:

Step 301, obtaining a target text and a candidate data set, wherein the candidate data set comprises a plurality of arrays, and each array represents the semantic vector of an entity contained in a candidate text.

The server may receive the target text sent by the terminal, for example, the target text may be a patent.

The specific method for the server to acquire the candidate data set includes at least two modes:

in a first possible implementation:

first, a text set is obtained, where the text set includes n candidate texts, where n is an integer greater than or equal to 2, and it is understood that the text set may be all patents in one technical field in the patent library, or the text set may be a subset of all patents in one technical field in the patent library, for example, where n may be one hundred thousand or millions.

Then, the entities in each of the n candidate texts are extracted to obtain m entities. In this step, the entities may be extracted with the entity extraction model described in embodiment 1: each candidate text is input into the entity extraction model, which outputs the entities in that text, giving m entities in total, where m is an integer greater than or equal to 2; for example, m may be ten million, twenty million, and so on.

A target matrix is then determined from the n candidate texts and the entities contained in each of them; for example, target matrix B is as follows:

            Entity 1   ……   Entity j   ……   Entity m
Patent 1    1          ……   0          ……   0
Patent 2    0          ……   3          ……   4
……          ……         ……   ……         ……   ……
Patent i    0          ……   1          ……   1
……          ……         ……   0          ……   0
Patent n    6          ……   1          ……   1

Matrix B has n rows and m columns: each row represents a candidate text and each column an entity, and B[i][j] is the number of times entity j appears in patent i. For example, entity j appears 3 times in patent 2, entity m appears once in patent i, and so on.
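Building target matrix B can be sketched as follows. This is a minimal illustration, assuming the entity extraction model's output for each candidate text is available as a list of entity mentions (one item per occurrence); the function name and data are hypothetical. Counting extracted mentions rather than raw substrings avoids, for instance, counting "lamp" inside "LED lamp".

```python
from collections import Counter


def build_target_matrix(extracted, vocabulary):
    """B[i][j] = number of times vocabulary[j] was extracted from text i.

    extracted[i] lists the entity mentions output by the entity extraction
    model for candidate text i; vocabulary fixes the column order (m entities).
    """
    counts = [Counter(mentions) for mentions in extracted]
    # Counter returns 0 for entities that never occur in a text
    return [[c[entity] for entity in vocabulary] for c in counts]


vocabulary = ["base", "LED lamp", "lamp shell"]
extracted = [
    ["base", "base", "LED lamp"],  # patent 1
    ["lamp shell"],                # patent 2
]
for row in build_target_matrix(extracted, vocabulary):
    print(row)
```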

And finally, carrying out singular value decomposition on the target matrix B to obtain a candidate data set.

Specifically, singular value decomposition is performed on the target matrix B as follows:

$$B \approx U \Sigma V^{\mathsf{T}}$$

This yields matrix U, with n rows and k columns, in which each row is the vector of one text (e.g. a patent).

Matrix Σ is a k × k diagonal matrix holding the k largest singular values of matrix B, where k is a specified value, for example 300. Keeping only k singular values makes the decomposition a truncated (rank-k) approximation of B.

Matrix V^T has k rows and m columns, and each column is the vector of one entity. In this example the candidate data set is this k × m matrix, referred to below simply as matrix V or as the "candidate matrix".

An example of this matrix V is as follows:

              Entity 1   ……   Entity j   ……   Entity m
Dimension 1   0.12       ……   -0.1       ……   0.2
Dimension 2   -0.5       ……   -0.3       ……   0.07
……            ……         ……   ……         ……   ……
Dimension i   0.01       ……   0.6        ……   0.02
……            ……         ……   -0.08      ……   -0.3
Dimension k   0.34       ……   0.1        ……   -0.11

Each column of matrix V is the k-dimensional vector of one entity, where each value V[i][j] is the projection of entity j onto the i-th dimension.

In this example, the target matrix B and the matrix V are merely exemplary representations for convenience of description, and do not constitute a limiting description of the present application.
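The truncated SVD step can be sketched without external dependencies as follows. In practice a library routine such as `numpy.linalg.svd` would normally be used; to keep this sketch self-contained it instead computes the top-k right singular vectors by power iteration on BᵀB with Gram-Schmidt deflation, a swapped-in technique that yields the same rows of Vᵀ up to sign. It assumes B is small enough to hold as nested lists; the function name is hypothetical.

```python
import math
import random


def truncated_svd_entity_vectors(B, k, iters=1000, seed=0):
    """Rank-k truncated SVD of B (n x m) via power iteration on G = B^T B.

    Returns (sigmas, V_t): the k largest singular values in descending order,
    and V_t with k rows and m columns, whose column j is the vector of entity j
    (signs are arbitrary, as in any SVD).
    """
    n, m = len(B), len(B[0])
    if k > m:
        raise ValueError("k must not exceed the number of entities m")
    # Gram matrix: its eigenvectors are the right singular vectors of B
    G = [[sum(B[r][i] * B[r][j] for r in range(n)) for j in range(m)] for i in range(m)]

    def mat_vec(v):
        return [sum(G[i][j] * v[j] for j in range(m)) for i in range(m)]

    def orthonormalize(w, basis):
        # Gram-Schmidt: remove components along singular vectors already found
        for u in basis:
            dot = sum(a * b for a, b in zip(w, u))
            w = [a - dot * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        return [a / norm for a in w] if norm > 1e-12 else None

    rng = random.Random(seed)
    sigmas, V_t = [], []
    for _ in range(k):
        v = orthonormalize([rng.uniform(-1.0, 1.0) for _ in range(m)], V_t)
        if v is None:
            raise ValueError("degenerate random start vector")
        for _ in range(iters):
            w = orthonormalize(mat_vec(v), V_t)
            if w is None:  # remaining spectrum is (numerically) zero
                break
            v = w
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(v)))  # = sigma^2
        sigmas.append(math.sqrt(max(eigenvalue, 0.0)))
        V_t.append(v)
    return sigmas, V_t


# Rows taken from the sample entries of target matrix B above
B = [[1, 0, 0], [0, 3, 4], [0, 1, 1], [6, 1, 1]]
sigmas, V_t = truncated_svd_entity_vectors(B, k=2)
entity_0_vector = [row[0] for row in V_t]  # k-dimensional vector of entity 0
```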

In a second possible implementation:

the candidate data set may also be obtained with a trained Word2vec model, in which case it contains the vectors of a plurality of entities. The Word2vec model is trained on an entity corpus, which may be obtained with the method described in step 101 of embodiment 1, or by performing entity extraction on each text in a text set with the entity extraction model. Each word in the entity corpus is numbered sequentially from 1 to W, where W is an integer greater than 1. The entity corpus is input to the Word2vec model, and the maximum distance between the current word and a predicted word within a sentence may be set to l, for example 5 or 10; l = 5 is used as the example here. Referring to fig. 6, fig. 6 is a schematic diagram of the Word2vec model training process.

The Word2vec model includes an input layer, an intermediate layer, and an output layer.

The input layer has d nodes and corresponds to d entities.

The middle layer has 300 nodes in total, and each input-layer node has an edge to every one of the 300 middle-layer nodes.

And the output layer is provided with d nodes and corresponds to d entities.

For each entity t in the entity corpus, its sequence number i is obtained; input-layer node i is set to 1 and all remaining input-layer nodes are set to 0.

The sequence numbers a1, a2, a3, a4 and a5 of the other words within distance 5 of t are obtained; output-layer positions a1, a2, a3, a4 and a5 are set to 1 and all remaining positions to 0.

A gradient descent algorithm is called to calculate the weight of each edge.

After training is completed, the weights on the 300 edges from input-layer node i to the middle-layer nodes form the vector representing the i-th entity. The vectors of all entities constitute the candidate data set.

The candidate data set in this example therefore consists of the vectors of a plurality of entities: the entities extracted from each candidate text are input into the Word2vec model, which outputs the vector of each entity, and the vectors of all the entities together form the candidate data set.

Step 302, extracting a target entity set in the target text, wherein the entity set represented by the multiple arrays of the candidate data set comprises the target entity set.

The following explanation takes the candidate data set obtained with the first implementation as an example. As in matrix V, each column represents one array; each array comprises a plurality of elements, and each element is the projection of an entity onto one dimension.

The entity extraction model described in embodiment 1 is used to extract the target entity set from the target text; the set includes all target entities in the target text. For example, the target text includes 2 target entities: entity 1 (e.g. a seat board) and entity j (e.g. an LED lamp). The entity set represented by the arrays of the candidate data set contains the target entity set; for example, the entity set represented by the vectors in matrix V (seat board, …, LED lamp, …, connector) contains entity 1 and entity j of the target text. The entities and their numbers in the target text and in the candidate data set are examples for convenience of description and do not limit the present application.

Step 303, determining, according to the candidate data set, the angle between the vector of each target entity in the target entity set and the vector of each entity in each candidate text, so as to obtain entity similarities.

The angle between the vector of each target entity and the vector of each entity in each candidate text is calculated from the entity vectors in the candidate data set. For example, the target entities in the target text are entity 1 and entity j, and the entities in one candidate text c are entity 2 and entity x. For candidate text c, the similarities of entity 1 with entity 2, entity 1 with entity x, entity j with entity 2, and entity j with entity x are calculated.

The similarity between entity 1 and entity 2 is taken as an example below:

in a first possible implementation:

the entity similarity (Rela) is the cosine of the angle between the two entity vectors.

For example, Rela(entity 1, entity 2) = cosine of the angle between the entity 1 vector (V1) and the entity 2 vector (V2).

In a second possible implementation, the target distance (denoted "Distance2") between the end point of the vector of each target entity and the end point of the vector of each entity in each candidate text is determined;

the cosine of the angle (denoted "Distance1") between the semantic vector of each target entity in the target entity set and the semantic vector of each entity in each candidate text is determined according to the candidate data set, and the entity similarity is obtained from Distance1 and the target distance Distance2.

$$\mathrm{Distance1} = \cos(V_1, V_2) = \frac{V_1 \cdot V_2}{\lVert V_1 \rVert \, \lVert V_2 \rVert}$$

where Distance1 is the cosine of the angle between V1 and V2.

The similarity of entity 1 and entity 2 is:

$$\mathrm{Rela}(\text{entity 1}, \text{entity 2}) = \mathrm{Distance1} \times \mathrm{Weight1} + \mathrm{Distance2} \times \mathrm{Weight2}$$

where Weight1 is the weight of Distance1 and Weight2 is the weight of Distance2. Both default to 0.5, or they may be specified by the user for the actual usage scenario, for example Weight1 = 0.6 and Weight2 = 0.4.
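The weighted entity similarity can be sketched as follows. The text does not say how the target distance is normalized before being weighted; since a raw Euclidean distance grows as vectors become less similar, this sketch assumes (hypothetically) that the end-point distance d is mapped to 1 / (1 + d) so that both terms increase with similarity. Function names are hypothetical.

```python
import math


def cosine(v1, v2):
    """Distance1: cosine of the angle between two entity vectors."""
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return sum(a * b for a, b in zip(v1, v2)) / (n1 * n2)


def rela(v1, v2, weight1=0.5, weight2=0.5):
    """Rela = Distance1 * Weight1 + Distance2 * Weight2.

    Assumption: Distance2 is the Euclidean distance between the vectors' end
    points, mapped into (0, 1] via 1 / (1 + d) so that closer means more similar.
    """
    distance1 = cosine(v1, v2)
    distance2 = 1.0 / (1.0 + math.dist(v1, v2))
    return distance1 * weight1 + distance2 * weight2


print(rela([0.12, -0.5, 0.01], [-0.1, -0.3, 0.6], weight1=0.6, weight2=0.4))
```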

In this example, the similarity between two entities is obtained from both the cosine of the angle between their vectors and the target distance between the vectors' end points, so both the angle and the end-point positions are taken into account. The user can set the weights of the two terms for the actual application scenario, which improves the accuracy of the entity similarity.

Step 304, determining the target similarity between the target text and each candidate text according to the entity similarity.

In a first implementation manner, for each candidate text, accumulating the entity similarity of each target entity in the target text to obtain a first accumulated similarity;

And determining the target similarity between the target text and each candidate text according to the first accumulated similarity.

For example, as above, the target entities are entity 1 and entity j, and the entities in one candidate text c are entity 2 and entity x. For candidate text c, the similarity between entity 1 and entity 2 (denoted "Re1"), between entity 1 and entity x ("Re2"), between entity j and entity 2 ("Re3") and between entity j and entity x ("Re4") are calculated, and Re1, Re2, Re3 and Re4 are accumulated to obtain the first accumulated similarity for that candidate text. Optionally, during the calculation, any similarity strictly below 50% is scored as 0. In one implementation, the first accumulated similarity may be used as the similarity between the target text and the candidate text.

Optionally, for each candidate text, a similarity Sim1 between the target text and that candidate text is calculated:

$$\mathrm{Sim1} = \frac{\text{first accumulated similarity}}{\lvert E_{\text{target}} \cup E_{\text{candidate}} \rvert}$$

where the denominator is the number of distinct entities in the union of the target text's entities and the candidate text's entities. Sim1 may be used as the target similarity of the target text to the candidate text.
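Step 304 can be sketched as follows. This is a minimal illustration, assuming an entity-similarity function (such as Rela above) is passed in; the function name is hypothetical, and the optional 50% threshold is a parameter that can be disabled with None.

```python
def target_similarity(target_entities, candidate_entities, entity_similarity,
                      threshold=0.5):
    """Return (first accumulated similarity, Sim1) for one candidate text.

    Every target entity is compared with every candidate entity; scores
    strictly below `threshold` are optionally counted as 0.
    """
    accumulated = 0.0
    for t in target_entities:
        for c in candidate_entities:
            score = entity_similarity(t, c)
            if threshold is not None and score < threshold:
                score = 0.0
            accumulated += score
    # Denominator: distinct entities in the union of the two entity sets
    union_size = len(set(target_entities) | set(candidate_entities))
    return accumulated, accumulated / union_size


def exact_match(a, b):
    """Toy entity similarity for illustration: 1 for identical entities, else 0."""
    return 1.0 if a == b else 0.0


print(target_similarity(["base", "LED lamp"], ["base", "lamp shell"], exact_match))
```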

In this embodiment, the electronic device obtains a target text and a candidate data set, where the candidate data set includes a plurality of arrays, and each array in the plurality of arrays represents a semantic vector of an entity; the entity is included in a candidate text; further, extracting a target entity set in the target text, wherein the entity set represented by the plurality of arrays of the candidate data set comprises the target entity set; determining an included angle cosine value of a semantic vector of each target entity in the target entity set and a semantic vector of each candidate entity in each candidate text according to the candidate data set to obtain entity similarity; in this embodiment, the similarity between each target entity and each candidate entity in the candidate texts may be calculated, and the target similarity between the target text and each candidate text may be determined according to the entity similarity. In this embodiment, determining the similarity between the target text and the candidate text considers the similarity between each entity in the target text and the candidate text, and the similarity determination can more truly represent the similarity between the content of the target text and the content of the candidate text.

Based on the above example, prior to step 304, the method further comprises the steps of:

extracting the relations between target entities in the target text to obtain a target relation set;

acquiring a candidate relation set in each candidate text; determining the relationship similarity of each relationship in the target relationship set and each candidate relationship in the candidate relationship set according to the entity similarity;

in step 304, the target similarity between the target text and each candidate text is determined according to the entity similarity and the relationship similarity.

The relationships in the embodiment of the present application include binary relationships, or binary through X-ary relationships, where X is an integer greater than or equal to 3. A binary relationship comprises two entities and the relationship between them. An X-ary relationship comprises X entities and at least (X-1) binary relationships; each of these binary relationships contains an associated entity, and the (X-1) binary relationships are connected to one another through the associated entities.

For example, when X is equal to 3, then the relationship includes a binary relationship and a ternary relationship; when X is equal to 4, the relationship includes a binary relationship, a ternary relationship, and a quaternary relationship, and in the embodiment of the present application, the relationship may be described by taking the binary relationship and the ternary relationship as examples for convenience of description.

Binary and ternary relationships are illustrated below:

Binary relationship: two entities and the relationship between them, i.e. entity 1 + entity 2 + the relationship between entity 1 and entity 2, for example: the brake light (entity 1) comprises (relationship) a base (entity 2).

Ternary relationship: two binary relationships, e.g. binary relationship 1 and binary relationship 2, that share an entity; the shared entity is the associated entity that connects the two binary relationships. For example, (brake light-mounting plate, mounting plate-housing frame), where the mounting plate is the associated entity.

The following further describes the method for determining the similarity of the binary relationship and the similarity of the ternary relationship:

Optionally, the binary relationship between every two target entities in the target text is extracted to obtain a target binary relationship set of the target text. For example, the target binary relationship set is: (brake light-mounting plate, mounting plate-housing frame, housing frame-spacer, housing frame-mounting cavity, brake light-grating, brake light-LED light).

A candidate binary relationship set is acquired from each candidate text. For example, the candidate binary relationship set is: (brake light-base, base-housing frame, housing frame-dustproof coating, brake light-grating, brake light-LED light, LED light-lamp housing).

The binary relationship similarity between each binary relationship in the target binary relationship set and each candidate binary relationship in the candidate binary relationship set is determined according to the entity similarity. The binary relationship similarity is the sum of three terms: the similarity between the first target entity in the target binary relationship and the first candidate entity in the candidate binary relationship; the similarity between the second target entity in the target binary relationship and the second candidate entity in the candidate binary relationship; and the similarity between the relationship in the target binary relationship and the relationship in the candidate binary relationship. The formula is:
Rela2(target binary relationship, candidate binary relationship) = Rela1(target entity 1, candidate entity 1) + Rela1(target entity 2, candidate entity 2) + R(target relationship, candidate relationship),
where R(relationship 1, relationship 2) = 1 if relationship 1 equals relationship 2, and R(relationship 1, relationship 2) = 0 otherwise. For example, if the target binary relationship is brake light-mounting plate and the candidate binary relationship is brake light-base, the binary relationship similarity is Rela2(brake light-mounting plate, brake light-base) = Rela1(brake light, brake light) + Rela1(mounting plate, base) + R(connection, connection).

Further, the binary relationship similarities of all binary relationships in the target text are accumulated to obtain a second accumulated similarity. That is, each binary relationship in the target text is traversed against the binary relationships of the candidate text, the similarity Rela2 is calculated for each pair, any score for which the entity similarity is below 50% (exclusive) is recorded as 0, and all the similarities are summed.

Further, for each candidate text, the similarity Sim2 between the binary relationships in the target text and those in the candidate text is calculated. Specifically, the union of the total number of binary relationships in the target text and the total number of binary relationships in the candidate text is calculated. For example, if the target text contains 12 binary relationships and the candidate text contains 14, the union is 14. Sim2 is the ratio of the second accumulated similarity to this union:

Sim2 = second accumulated similarity / (total number of binary relationships in the target text ∪ total number of binary relationships in the candidate text).
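The binary relationship scoring described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the claimed method itself. The entity similarity Rela1 is passed in as a function, and the `toy_rela1` table is hypothetical. A relationship pair is assumed to score 0 whenever either of its entity similarities falls below 50%. Following the worked example in which 12 ∪ 14 = 14, the "union" of the two counts is taken to be the larger count.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical representation of a binary relationship:
# (entity 1, relationship label, entity 2).
Binary = Tuple[str, str, str]
EntitySim = Callable[[str, str], float]

THRESHOLD = 0.5  # entity similarity below 50% (exclusive) scores 0


def rela2(target: Binary, candidate: Binary, rela1: EntitySim) -> float:
    """Rela2 = Rela1(entity 1) + Rela1(entity 2) + R(relationship), R in {0, 1}."""
    s1 = rela1(target[0], candidate[0])
    s2 = rela1(target[2], candidate[2])
    # Assumption: the whole pair scores 0 if either entity is under the threshold.
    if s1 < THRESHOLD or s2 < THRESHOLD:
        return 0.0
    r = 1.0 if target[1] == candidate[1] else 0.0
    return s1 + s2 + r


def second_accumulated(targets: List[Binary], candidates: List[Binary],
                       rela1: EntitySim) -> float:
    """Traverse each target relationship against every candidate one and sum."""
    return sum(rela2(t, c, rela1) for t in targets for c in candidates)


def sim2(targets: List[Binary], candidates: List[Binary], rela1: EntitySim) -> float:
    """Sim2 = second accumulated similarity / (union of the two counts)."""
    union = max(len(targets), len(candidates))  # assumption: 12 U 14 = 14
    if union == 0:
        return 0.0
    return second_accumulated(targets, candidates, rela1) / union


# Hypothetical entity-similarity table standing in for Rela1.
TOY_SCORES: Dict[FrozenSet[str], float] = {frozenset({"mounting plate", "base"}): 0.6}


def toy_rela1(a: str, b: str) -> float:
    return 1.0 if a == b else TOY_SCORES.get(frozenset({a, b}), 0.0)


if __name__ == "__main__":
    target = [("brake light", "connection", "mounting plate"),
              ("brake light", "connection", "grating")]
    candidate = [("brake light", "connection", "base")]
    print(sim2(target, candidate, toy_rela1))
```

On the toy data, the first pair scores 1.0 + 0.6 + 1.0. The second pair scores 0 because "grating" and "base" fall under the threshold. Sim2 is therefore 2.6 / 2.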

Further, on the basis of the above embodiment, the method may further include the following steps:

determining a target ternary relationship set according to the target binary relationship set. The target ternary relationship set includes a plurality of ternary relationships. Each ternary relationship includes two binary relationships, and the two binary relationships share the same entity. For example, the target ternary relationship set is: (brake light-mounting plate, mounting plate-housing frame), (mounting plate-housing frame, housing frame-spacer), (mounting plate-housing frame, housing frame-mounting cavity).
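One way to derive the ternary relationship set from the binary relationship set is a chain join, sketched below. The sketch assumes that the shared entity must be the second entity of one binary relationship and the first entity of the other. Under that reading, the six target binary relationships listed earlier yield exactly the three target ternary relationships listed above. The text itself only requires the two binary relationships to share an entity, so a looser join is also possible. Relationship labels are omitted from the hypothetical tuple representation for brevity.

```python
from typing import List, Tuple

# Hypothetical representation: (entity 1, entity 2); relationship labels omitted.
Binary = Tuple[str, str]
Ternary = Tuple[Binary, Binary]


def build_ternary_set(binaries: List[Binary]) -> List[Ternary]:
    """Join two binary relationships when the second entity of the first is
    the first entity of the second (that shared entity is the associated entity)."""
    ternaries: List[Ternary] = []
    for i, first in enumerate(binaries):
        for j, second in enumerate(binaries):
            if i != j and first[1] == second[0]:
                ternaries.append((first, second))
    return ternaries


if __name__ == "__main__":
    target_binaries = [
        ("brake light", "mounting plate"), ("mounting plate", "housing frame"),
        ("housing frame", "spacer"), ("housing frame", "mounting cavity"),
        ("brake light", "grating"), ("brake light", "LED light"),
    ]
    for ternary in build_ternary_set(target_binaries):
        print(ternary)
```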

A candidate ternary relationship set is acquired from each candidate text. For example, the candidate ternary relationship set is: (brake light-base, base-housing frame), (base-housing frame, housing frame-dustproof coating).

The ternary relationship similarity between each ternary relationship in the target ternary relationship set and each candidate ternary relationship in the candidate ternary relationship set is determined according to the binary relationship similarity. The ternary relationship similarity is the sum of two terms: the binary relationship similarity between the first target binary relationship in the target ternary relationship and the first candidate binary relationship in the candidate ternary relationship, and the binary relationship similarity between the second target binary relationship in the target ternary relationship and the second candidate binary relationship in the candidate ternary relationship. It can be expressed as:

Rela3(target ternary relationship, candidate ternary relationship) = Rela2(first target binary relationship, first candidate binary relationship) + Rela2(second target binary relationship, second candidate binary relationship).

For example, the target ternary relationship is: (brake light-mounting plate, mounting plate-housing frame);

the candidate ternary relationship is: (brake light-base, base-housing frame);

Rela3[(brake light-mounting plate, mounting plate-housing frame), (brake light-base, base-housing frame)]
= Rela2(brake light-mounting plate, brake light-base) + Rela2(mounting plate-housing frame, base-housing frame)
= Rela1(brake light, brake light) + Rela1(mounting plate, base) + R(connection, connection) + Rela1(mounting plate, base) + Rela1(housing frame, housing frame) + R(connection, connection).

The ternary relationship similarities of all ternary relationships in the target text are accumulated to obtain a third accumulated similarity. That is, each ternary relationship in the target text is traversed once against the ternary relationships of the candidate text, the similarity Rela3 with each candidate ternary relationship is calculated, any score for which the entity similarity is below 50% (exclusive) is recorded as 0, and all the similarities are summed.

The similarity Sim3 between the ternary relationships in the target text and those in the candidate text is then calculated. Specifically, the union of the total number of ternary relationships in the target text and the total number of ternary relationships in the candidate text is calculated. For example, if the target text contains 10 ternary relationships and the candidate text contains 8, the union is 10. Sim3 is the ratio of the third accumulated similarity to this union:

Sim3 = third accumulated similarity / (total number of ternary relationships in the target text ∪ total number of ternary relationships in the candidate text).
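Rela3 and Sim3 build directly on Rela2, as the following self-contained sketch shows. It repeats the assumptions of the binary case: a hypothetical toy Rela1 table, a pair score of 0 when either entity similarity is under 50%, and a "union" of counts taken as the larger count, so that 10 ∪ 8 = 10. The usage example reproduces the worked Rela3 expansion given above.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Binary = Tuple[str, str, str]          # (entity 1, relationship label, entity 2)
Ternary = Tuple[Binary, Binary]        # (first binary, second binary)
EntitySim = Callable[[str, str], float]

THRESHOLD = 0.5  # entity similarity below 50% (exclusive) scores 0


def rela2(t: Binary, c: Binary, rela1: EntitySim) -> float:
    s1, s2 = rela1(t[0], c[0]), rela1(t[2], c[2])
    if s1 < THRESHOLD or s2 < THRESHOLD:  # assumed thresholding rule
        return 0.0
    return s1 + s2 + (1.0 if t[1] == c[1] else 0.0)


def rela3(t: Ternary, c: Ternary, rela1: EntitySim) -> float:
    """Rela3 = Rela2(first binaries) + Rela2(second binaries)."""
    return rela2(t[0], c[0], rela1) + rela2(t[1], c[1], rela1)


def sim3(targets: List[Ternary], candidates: List[Ternary], rela1: EntitySim) -> float:
    """Sim3 = third accumulated similarity / (union of the two counts)."""
    third_accumulated = sum(rela3(t, c, rela1) for t in targets for c in candidates)
    union = max(len(targets), len(candidates))  # assumption: 10 U 8 = 10
    return third_accumulated / union if union else 0.0


# Hypothetical entity-similarity table standing in for Rela1.
TOY_SCORES: Dict[FrozenSet[str], float] = {frozenset({"mounting plate", "base"}): 0.6}


def toy_rela1(a: str, b: str) -> float:
    return 1.0 if a == b else TOY_SCORES.get(frozenset({a, b}), 0.0)


if __name__ == "__main__":
    target = (("brake light", "connection", "mounting plate"),
              ("mounting plate", "connection", "housing frame"))
    candidate = (("brake light", "connection", "base"),
                 ("base", "connection", "housing frame"))
    print(rela3(target, candidate, toy_rela1))
```

With the toy table, each of the two Rela2 terms is 1.0 + 0.6 + 1.0, so Rela3 is their sum.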

Further, the target entities include a specific entity, and the method further includes:

determining the entity similarity of the specific entity. The specific entity may be an entity designated by the user, and the number of specific entities is not limited. For example, the specific entity may be "brake light", or the specific entities may be "brake light" and "mounting base plate". A specific entity is typically an entity that is important in the actual technical solution; in this example, "brake light" is used as the specific entity. If the candidate entities in the candidate text are "brake light", "base", and "lamp housing", the entity similarity of the specific entity includes three values: the similarity between "brake light" and "brake light" (denoted R11), the similarity between "brake light" and "base" (denoted R12), and the similarity between "brake light" and "lamp housing" (denoted R13).

The entity similarities of the specific entity are accumulated for each candidate text to obtain a fourth accumulated similarity. In this example, the fourth accumulated similarity is R11 + R12 + R13.
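The fourth accumulated similarity is a plain double sum over specific entities and candidate entities, as sketched below. The exact-match scorer is a hypothetical stand-in for the entity similarity. No 50% threshold is applied because none is stated for this step.

```python
from typing import Callable, List

EntitySim = Callable[[str, str], float]


def fourth_accumulated(specific: List[str], candidate_entities: List[str],
                       rela1: EntitySim) -> float:
    """Sum the entity similarity over every (specific entity, candidate entity) pair."""
    return sum(rela1(s, c) for s in specific for c in candidate_entities)


def exact_match(a: str, b: str) -> float:
    """Hypothetical stand-in for Rela1: 1.0 for identical names, else 0.0."""
    return 1.0 if a == b else 0.0


if __name__ == "__main__":
    # R11 + R12 + R13 for "brake light" against the three candidate entities.
    candidates = ["brake light", "base", "lamp housing"]
    print(fourth_accumulated(["brake light"], candidates, exact_match))
```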

In step 304, the similarity SIM between the target text and the candidate text is calculated according to the first, second, third, and fourth accumulated similarities and the weight corresponding to each of them.

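Equation 1: SIM = Sim1 × weight1 + Sim2 × weight2 + Sim3 × weight3 + Sim4 × weight4, where Sim1, Sim2, Sim3, and Sim4 are the similarity terms obtained from the first, second, third, and fourth accumulated similarities respectively. weight1 is the weight of the entity similarity, weight2 is the weight of the binary relationship similarity, weight3 is the weight of the ternary relationship similarity, and weight4 is the weight of the specific entity similarity.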

weight1, weight2, weight3, and weight4 may be set according to the specific application scenario. For example, if the user considers the specific entity similarity and the binary relationship similarity to be more important, weight4 and weight2 may be set to higher values, such as weight4 = 0.4, weight2 = 0.3, weight1 = 0.2, and weight3 = 0.1. In general, weight1, weight2, weight3, and weight4 can each be set to 0.25.

As can be seen from Equation 1, in a first possible implementation, the similarity between the target text and the candidate text may be determined according to the first accumulated similarity and the second accumulated similarity, that is, weight3 = 0 and weight4 = 0.

In a second possible implementation, the similarity between the target text and the candidate text may be determined according to the first accumulated similarity and the third accumulated similarity, that is, weight2 = 0 and weight4 = 0.

In a third possible implementation, the similarity between the target text and the candidate text may be determined according to the first accumulated similarity and the fourth accumulated similarity, that is, weight2 = 0 and weight3 = 0.

In a fourth possible implementation, the similarity between the target text and the candidate text may be determined according to the first, second, and third accumulated similarities, that is, weight4 = 0.

In a fifth possible implementation, the similarity between the target text and the candidate text may be determined according to the first, second, and fourth accumulated similarities, that is, weight3 = 0.
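Equation 1 and its reduced variants amount to one weighted sum in which some weights are set to 0. The sketch below illustrates this. The Sim1 to Sim4 values are hypothetical. The (0.5, 0.5, 0, 0) weighting for the first implementation is also an illustrative choice; the text only requires weight3 and weight4 to be 0 in that case.

```python
from typing import Sequence


def weighted_similarity(sims: Sequence[float], weights: Sequence[float]) -> float:
    """Equation 1: SIM = sum of Sim_i x weight_i."""
    if len(sims) != len(weights):
        raise ValueError("one weight is needed per similarity term")
    return sum(s * w for s, w in zip(sims, weights))


if __name__ == "__main__":
    sims = (0.8, 0.5, 0.4, 0.9)  # hypothetical Sim1..Sim4 values
    print(weighted_similarity(sims, (0.25, 0.25, 0.25, 0.25)))  # default weights
    print(weighted_similarity(sims, (0.2, 0.3, 0.1, 0.4)))      # scenario weights
    print(weighted_similarity(sims, (0.5, 0.5, 0.0, 0.0)))      # first implementation
```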

Further, in this embodiment of the present application, the candidate texts in the candidate text set may be ranked by their similarity SIM to the target text, in descending or ascending order, and a preset number of candidate texts, for example 3, may be displayed in that order.
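Ranking and truncating to a preset number of candidates can be sketched as follows. The document identifiers and scores are hypothetical. The same helper would work for the novelty ranking described at the end of this section.

```python
from typing import Dict, List, Tuple


def top_candidates(scores: Dict[str, float], n: int = 3,
                   descending: bool = True) -> List[Tuple[str, float]]:
    """Rank candidate texts by score and keep the first n for display."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=descending)
    return ranked[:n]


if __name__ == "__main__":
    sim_scores = {"doc-A": 0.2, "doc-B": 0.9, "doc-C": 0.5, "doc-D": 0.7}
    print(top_candidates(sim_scores))
```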

In this embodiment, the similarity between the target text and the candidate text is determined by calculating two kinds of similarity. The first is the similarity between each target entity in the target text and the candidate entities in the candidate text. The second is the similarity between the relationships among target entities in the target text and the relationships among candidate entities in the candidate text. Because entities together with the relationships between them better reflect what a text actually expresses, considering both entity similarity and relationship similarity gives a more faithful comparison. Further, the relationships may include binary relationships up to N-ary relationships, for example binary and ternary relationships. A binary relationship includes two entities and the relationship between them, and a ternary relationship includes two binary relationships connected by an associated entity. Since a ternary relationship involves three entities and the relationships among them, calculating both the similarity between target and candidate binary relationships and the similarity between target and candidate ternary relationships better represents the actual content of the text. Furthermore, the similarity of specific target entities can be determined, so that the similarity between the target text and the candidate text reflects the user's specific application scenario and better meets the user's actual needs.

Optionally, the novelty of the target text relative to each candidate text is determined according to the target similarity, with the novelty inversely related to the target similarity: the higher the similarity between the target text and a candidate text, the lower the novelty of the target text relative to that candidate text. For example, if the target similarity is 70%, the novelty may be 1 - 70% = 30%, or 1 - k × 70%, where k is a correction coefficient. This embodiment does not limit the specific method for determining the novelty, provided that the novelty is inversely related to the target similarity.
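The two example mappings from similarity to novelty reduce to one formula, shown below. Here k = 1 gives the plain 1 - similarity form. Any k > 0 keeps novelty decreasing as similarity grows, which is the only property the text requires.

```python
def novelty_from_similarity(similarity: float, k: float = 1.0) -> float:
    """Novelty = 1 - k x similarity, where k is the correction coefficient."""
    return 1.0 - k * similarity


if __name__ == "__main__":
    print(novelty_from_similarity(0.7))          # 1 - 70%
    print(novelty_from_similarity(0.7, k=0.5))   # 1 - k x 70% with k = 0.5
```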

Alternatively, on the basis of the above embodiment, the target text in this embodiment may be a target structure and the candidate text may be a candidate structure. That is, using the method described in Embodiment 1, the target text is converted into the target structure by the entity extraction model and the relationship extraction model, and the candidate text is likewise converted into the candidate structure.

Specifically, the target text is a structured text, and in step 201, obtaining the target text may further include the following steps:

acquiring the target text;

inputting the target text into an entity extraction model, and identifying the entities in the target text through the entity extraction model;

inputting the target text, in which the entities have been identified, into a relationship extraction model, and extracting the relationships between the entities through the relationship extraction model;

and representing the target text in structured form according to the entities and the relationships between them, to generate the structured text.
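The structuring pipeline above can be sketched as two pluggable model calls followed by packaging the results. The trained entity and relationship extraction models are not specified here. `toy_entity_model` and `toy_relation_model` are therefore hypothetical rule-based stand-ins used only to make the sketch runnable. The `Structure` container is likewise an assumed representation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Relation = Tuple[str, str, str]  # (entity 1, relationship label, entity 2)


@dataclass
class Structure:
    """Structured representation of a text: its entities and relationships."""
    entities: List[str]
    relations: List[Relation]


def to_structure(text: str,
                 entity_model: Callable[[str], List[str]],
                 relation_model: Callable[[str, List[str]], List[Relation]]) -> Structure:
    entities = entity_model(text)               # identify the entities
    relations = relation_model(text, entities)  # extract relationships between them
    return Structure(entities, relations)       # structured representation


# Toy stand-ins for the trained models (hypothetical, for illustration only).
VOCAB = ["brake light", "base"]


def toy_entity_model(text: str) -> List[str]:
    return [e for e in VOCAB if e in text]


def toy_relation_model(text: str, entities: List[str]) -> List[Relation]:
    # Toy rule: the phrase "<A> comprises a <B>" yields (A, "comprises", B).
    return [(a, "comprises", b) for a in entities for b in entities
            if a != b and f"{a} comprises a {b}" in text]


if __name__ == "__main__":
    print(to_structure("The brake light comprises a base.",
                       toy_entity_model, toy_relation_model))
```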

Optionally, in step 202, extracting the target entity set from the target text may specifically include the following step:

taking the target text as the input of the entity extraction model and extracting the target entity set from the target text through the entity extraction model, where the entity extraction model is obtained by training on a first corpus set, and the first corpus set is obtained by annotating the entities in each text of a first text set.

Optionally, extracting the binary relationship between every two entities in the target text may specifically include the following step:

inputting the target text, in which the target entity set has been identified, into the relationship extraction model, and extracting the relationships between the target entities through the relationship extraction model. The relationship extraction model is obtained by training on a second corpus set, and the second corpus set is obtained by annotating the relationships and entities in each text of a second text set.

Embodiment 3

Referring to FIG. 7, an embodiment of the present application further provides a method for determining the novelty of a text. The method is applied to an electronic device, which may be a server or a terminal.

Step 401, determining target text.

For example, the target text may be a patent or a paper; in this embodiment, a patent is used as the example.

Step 402, extracting a plurality of target entities in the target text to obtain a target entity set.

In this example, the target entities are extracted with the entity extraction model of Embodiment 1. The target text is input into the entity extraction model, which identifies the plurality of target entities in the target text, and these target entities form the target entity set.

Step 403, obtaining a candidate entity set of each candidate text in the candidate text set.

The candidate text set may be a patent set containing a plurality of candidate texts (for example, patents). The server acquires the candidate text set from a patent database, and the candidate entities of each candidate text in the set are extracted offline in advance to obtain the candidate entity sets. Alternatively, the server may extract the candidate entities of each candidate text online to obtain the candidate entity sets, specifically by using the entity extraction model described in Embodiment 1.

Step 404, determining a first entity intersection of the target entity set and the candidate entity set, where the first entity intersection consists of the entities matched between the target entity set and the candidate entity set.

For example, the target entity set is (brake light, base, lamp housing), and the candidate entity set is (brake light, mounting base, housing frame). The first entity intersection is (brake light).

Step 405, determining the novelty of the target text and the candidate text according to the difference parameter between the first entity intersection and the target entity set.

The first entity novelty is determined according to the difference parameter between the first entity intersection and the target entity set, namely:

first entity novelty = [|target entity set| - |intersection(target entity set, candidate entity set)|] / |target entity set| = 1 - |first entity intersection| / |target entity set|, where |·| denotes the number of elements in a set.

In this embodiment, the difference parameter between the first entity intersection and the target entity set is the ratio of the first entity intersection to the target entity set, or that ratio multiplied by a coefficient; other variants of the difference parameter are not described here.
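The first entity novelty can be computed with ordinary set operations, as sketched below. The sketch assumes that entities "match" when their names are identical; a fuzzier matcher could replace the set intersection. The optional `coefficient` parameter is a hypothetical name for the coefficient variant mentioned above. With the example sets from step 404, one of the three target entities matches, so the novelty is 1 - 1/3.

```python
from typing import Set


def entity_novelty(target: Set[str], candidate: Set[str],
                   coefficient: float = 1.0) -> float:
    """Novelty = 1 - coefficient x |target ∩ candidate| / |target|."""
    if not target:
        raise ValueError("the target entity set must not be empty")
    return 1.0 - coefficient * len(target & candidate) / len(target)


if __name__ == "__main__":
    target_entities = {"brake light", "base", "lamp housing"}
    candidate_entities = {"brake light", "mounting base", "housing frame"}
    print(entity_novelty(target_entities, candidate_entities))
```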

In this embodiment, the method proceeds as follows. First, a target text whose novelty is to be determined, such as a patent, is determined. A plurality of target entities are extracted from the target text to obtain a target entity set, and a candidate entity set is acquired for each candidate text in the candidate text set. Each candidate text is then traversed to determine the first entity intersection of the target entity set and its candidate entity set, that is, the entities matched between the two sets. Finally, the novelty of the target text relative to the candidate text is determined according to the difference parameter between the first entity intersection and the target entity set. Because this embodiment considers all target entities in the target text and all candidate entities in each candidate text, the result reflects the actual content of both texts. In the prior art, novelty is determined only from keywords chosen subjectively by the user, so the result depends on the user's subjective understanding. The method provided in this embodiment of the present application is more objective, which makes the novelty computation more accurate.

Optionally, on the basis of the foregoing embodiment, the embodiment of the present application may further include the following step before step 405:

extracting a plurality of binary relations in the target text to obtain a target binary relation set, wherein the binary relations comprise two entities and relations between the two entities;

acquiring a candidate binary relation set comprising a plurality of binary relations in the candidate text;

determining a first binary relation intersection of the target binary relation set and the candidate binary relation set, wherein the first binary relation intersection comprises a binary relation matched in the target binary relation set and the candidate binary relation set;

then, in step 405, determining the novelty of the target text and the candidate text according to the difference parameter between the first entity intersection and the target entity set may specifically include:

determining a first entity novelty according to the difference parameter between the first entity intersection and the target entity set, namely: first entity novelty R1_1 = [|target entity set| - |intersection(target entity set, candidate entity set)|] / |target entity set| = 1 - |first entity intersection| / |target entity set|; and determining a first binary relationship novelty according to the difference parameter between the first binary relationship intersection and the target binary relationship set:

R2_1 = [|target binary relationship set| - |intersection(target binary relationship set, candidate binary relationship set)|] / |target binary relationship set| = 1 - |first binary relationship intersection| / |target binary relationship set|.

The difference parameter between the first binary relation intersection and the target binary relation set may be a ratio of the first binary relation intersection to the target binary relation set, or may be the ratio multiplied by a coefficient or other variants, which is not specifically limited.

Alternatively, in another implementation, the novelty of the target text and the candidate text may be determined based on the first entity novelty, the first binary relationship novelty, and their respective weights. In this implementation, the novelty between the target binary relationships in the target text and the candidate binary relationships in the candidate text is also calculated. The novelty of the target text relative to the candidate text therefore combines entity-level and binary-relationship-level novelty, which improves its accuracy.

On the basis of the above embodiment, the method may further include the following steps:

extracting a target ternary relation set in the target text, wherein the target ternary relation set comprises a plurality of ternary relations, the ternary relation comprises two binary relations, and the two binary relations have the same entity;

Acquiring a candidate ternary relation set comprising a plurality of ternary relations in the candidate text;

determining a first ternary relationship intersection of the target ternary relationship set and the candidate ternary relationship set, where the first ternary relationship intersection includes the ternary relationships matched between the target ternary relationship set and the candidate ternary relationship set;

wherein the determining the novelty of the target text and the candidate text according to the first entity novelty and the first binary relation novelty may further specifically include:

and determining a first ternary relationship novelty according to the difference parameter between the first ternary relationship intersection and the target ternary relationship set, that is, R3_1 = [|target ternary relationship set| - |intersection(target ternary relationship set, candidate ternary relationship set)|] / |target ternary relationship set| = 1 - |first ternary relationship intersection| / |target ternary relationship set|.

And determining the novelty of the target text and the candidate text according to the first entity novelty, the first binary relation novelty and the first ternary relation novelty and the weights corresponding to the first entity novelty, the first binary relation novelty and the first ternary relation novelty.

Novelty = R1_1 × weight1 + R2_1 × weight2 + R3_1 × weight3. In this example, weight1 is the weight of the first entity novelty, weight2 is the weight of the first binary relationship novelty, and weight3 is the weight of the first ternary relationship novelty. In this implementation, the novelty between the target ternary relationships in the target text and the candidate ternary relationships in the candidate text is also calculated. The novelty of the target text relative to the candidate text therefore combines entity, binary-relationship, and ternary-relationship novelty, which improves its accuracy.
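R1_1, R2_1, and R3_1 share the same "1 - |intersection| / |target|" form, so their weighted combination can be sketched with one helper. The sketch makes several assumptions. A relationship "matches" only if it is identical, which is compared as a hashable tuple. Relationship labels are omitted. The weights (0.5, 0.3, 0.2) and the example sets are hypothetical.

```python
from typing import Hashable, Sequence, Set


def set_novelty(target: Set[Hashable], candidate: Set[Hashable]) -> float:
    """1 - |target ∩ candidate| / |target| (exact matching assumed)."""
    if not target:
        raise ValueError("the target set must not be empty")
    return 1.0 - len(target & candidate) / len(target)


def combined_novelty(targets: Sequence[Set[Hashable]],
                     candidates: Sequence[Set[Hashable]],
                     weights: Sequence[float]) -> float:
    """Novelty = R1_1 x weight1 + R2_1 x weight2 + R3_1 x weight3."""
    if not len(targets) == len(candidates) == len(weights):
        raise ValueError("targets, candidates and weights must align")
    return sum(w * set_novelty(t, c) for t, c, w in zip(targets, candidates, weights))


if __name__ == "__main__":
    entities = ({"brake light", "base", "lamp housing"},
                {"brake light", "mounting base", "housing frame"})
    binaries = ({("brake light", "base"), ("base", "lamp housing")},
                {("brake light", "base")})
    ternaries = ({(("brake light", "base"), ("base", "lamp housing"))}, set())
    levels = (entities, binaries, ternaries)
    print(combined_novelty([lv[0] for lv in levels], [lv[1] for lv in levels],
                           (0.5, 0.3, 0.2)))
```

On this data, R1_1 = 2/3, R2_1 = 1/2, and R3_1 = 1.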

In this embodiment, the relationships may also include 4-ary relationships, 5-ary relationships, and so on. Binary and ternary relationships are described here only as examples and do not limit the present application.

Optionally, in this embodiment, the target text is a structured text, that is, a target structure, and each candidate text in the candidate text set is a structured candidate structure. In this example, a candidate map may be obtained from the candidate structures. The candidate map includes at least one candidate structure; when it includes exactly one candidate structure, the candidate map is identical to that candidate structure. When the candidate map includes two or more candidate structures, the candidate map may be determined through the following steps. FIG. 8 is a schematic structural diagram of such a candidate map.

Determining the associated entity of a first candidate structure and a second candidate structure. For example, the first candidate structure includes the entities base, lamp housing, and lamp shade, with the relationships base-lamp shade and base-lamp housing. The second candidate structure includes the entities lamp housing, wick, and electric door, with the relationships lamp housing-wick and lamp housing-electric door. The associated entity of the first candidate structure and the second candidate structure is "lamp housing".

Associating the first candidate structure with the second candidate structure through the associated entity to obtain the candidate map, as shown in FIG. 8.
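Merging candidate structures through their associated entities can be sketched with an undirected adjacency map. The sketch assumes that entities are identified by exact name, so adding both structures' edges to one map automatically joins them at the shared node. The edge lists reproduce the example structures described above and are not taken from FIG. 8 itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # undirected relationship between two entities


def entities_of(edges: Iterable[Edge]) -> Set[str]:
    return {entity for edge in edges for entity in edge}


def associated_entities(first: List[Edge], second: List[Edge]) -> Set[str]:
    """Entities that appear in both candidate structures."""
    return entities_of(first) & entities_of(second)


def build_candidate_map(*structures: List[Edge]) -> Dict[str, Set[str]]:
    """Merge structures into one adjacency map; same-named nodes (the
    associated entities) are where the structures are joined."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for edges in structures:
        for a, b in edges:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


if __name__ == "__main__":
    first = [("base", "lamp shade"), ("base", "lamp housing")]
    second = [("lamp housing", "wick"), ("lamp housing", "electric door")]
    print(associated_entities(first, second))
    print(build_candidate_map(first, second))
```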

On the basis of the above embodiment, optionally, when the target text and the candidate text are both structured texts, the novelty of the target structure relative to the candidate map may be calculated. The number of candidate structures in the candidate map is not limited in this embodiment of the present application. The candidate map may include 3 or 4 candidate structures, or, if all candidate structures share associated entities, every candidate structure may be connected through associated entities. For ease of explanation, a candidate map containing 2 candidate structures is used as the example. The method in this embodiment may further include the following steps:

Extracting the candidate entity set of the candidate map. In the candidate map, each node represents an entity and each edge represents a relationship. Taking binary and ternary relationships as examples, the binary relationship set is the set of relationships between all pairs of adjacent nodes in the candidate map. The ternary relationship set is the set of relationships among all runs of three adjacent nodes in the candidate map.

Determining a second entity intersection of the target entity set and the candidate entity set of the candidate map; this step may be understood with reference to step 404 in this embodiment;

determining a second entity novelty according to the difference parameter between the second entity intersection and the target entity set, namely: second entity novelty R1_2 = [|target entity set| - |intersection(target entity set, candidate entity set of the candidate map)|] / |target entity set| = 1 - |second entity intersection| / |target entity set|. This step may be understood with reference to step 405 in this embodiment.

Optionally, the method may further comprise the steps of:

extracting a plurality of binary relationships from the target structure to obtain a target binary relationship set; for example, one target binary relationship in the set is "lamp housing-wick".

Locating the two target entities of each target binary relationship in the target binary relationship set at the corresponding two entity positions in the candidate map. For example, the target binary relationship lamp housing-wick is located in the candidate map by finding the two nodes "lamp housing" and "wick".

Calculating the distance between the two entity positions corresponding to each target binary relationship; for example, the distance from the lamp housing to the wick in the candidate map is calculated. The distance between two adjacent nodes in the candidate map is taken as a. The distance between two nodes is the length of the path from the first entity position (such as the lamp housing) to the second entity position (such as the wick). Taking FIG. 8 as an example, the distance from the lamp housing to the wick is a. The path from the base to the wick runs from the base to the lamp housing (distance a) and then from the lamp housing to the wick (distance a), so the distance L from the base to the wick is 2a.

Determining a second binary relationship novelty of each target binary relationship relative to the candidate map according to the distance. The novelty score R2_2 of the second binary relationship is proportional to L: the shorter L is, the lower the novelty, and the longer L is, the higher the novelty.
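The distance L can be sketched as a breadth-first shortest path in which every hop between adjacent nodes costs a. Several choices here are assumptions. Using the shortest path is one of them; the text states "shortest" explicitly only for the ternary case. Returning `None` for an entity that is absent or unreachable is another, and `scale` is a hypothetical name for the proportionality factor between L and R2_2. `EXAMPLE_MAP` reconstructs the example structures described above rather than reproducing the actual figure.

```python
from collections import deque
from typing import Dict, Optional, Set

Graph = Dict[str, Set[str]]


def shortest_distance(graph: Graph, start: str, goal: str,
                      a: float = 1.0) -> Optional[float]:
    """Breadth-first search; each hop between adjacent nodes costs a."""
    if start not in graph or goal not in graph:
        return None  # assumption: an entity missing from the map has no distance
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops * a
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None  # unreachable


def second_binary_novelty(graph: Graph, e1: str, e2: str,
                          a: float = 1.0, scale: float = 1.0) -> Optional[float]:
    """R2_2 proportional to L; `scale` is a hypothetical proportionality factor."""
    distance = shortest_distance(graph, e1, e2, a)
    return None if distance is None else scale * distance


EXAMPLE_MAP: Graph = {
    "base": {"lamp shade", "lamp housing"},
    "lamp shade": {"base"},
    "lamp housing": {"base", "wick", "electric door"},
    "wick": {"lamp housing"},
    "electric door": {"lamp housing"},
}

if __name__ == "__main__":
    print(shortest_distance(EXAMPLE_MAP, "lamp housing", "wick"))  # a
    print(shortest_distance(EXAMPLE_MAP, "base", "wick"))          # 2a
```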

In a first implementation, the novelty of the target structure relative to the candidate map may be determined according to the second entity novelty R1_2, the second binary relationship novelty R2_2, and their respective weights. In this implementation, after the second entity novelty is determined, the novelty between the target binary relationships in the target text and the binary relationships in the candidate map is also calculated. The novelty of the target structure relative to the candidate map therefore combines entity-level and binary-relationship-level novelty, which improves its accuracy.

In a second implementation manner, first, a candidate binary relation set including a plurality of binary relations in the candidate map is obtained; determining a second binary relation intersection of the target binary relation set and the candidate binary relation set; determining a first binary relationship novelty according to a difference parameter of the second binary relationship intersection and the target binary relationship set;

then, the binary relationship novelty may be calculated from the first binary relationship novelty, the second binary relationship novelty, and their respective weights, namely: binary relationship novelty R2 = R2_1 × weight1 + R2_2 × weight2. In this example, weight1 is the weight of R2_1 and weight2 is the weight of R2_2; the weights can be set differently for different application scenarios.

Then, the novelty of the target structure relative to the candidate map is determined according to the second entity novelty R1_2, the binary relationship novelty R2, and their respective weights. In this implementation, the binary relationship novelty is determined jointly by the first binary relationship novelty, the second binary relationship novelty, and their weights, which broadens the scenarios in which the binary relationship novelty can be determined.

On the basis of the above embodiment, optionally, the method may further include the steps of:

extracting the ternary relations in the target structure to obtain a target ternary relation set; for example, the target ternary relation set includes the target ternary relation "lamp housing-lamp wick-electric switch";

locating the three target entities of each target ternary relation in the target ternary relation set at the three corresponding entity positions in the candidate map; for example, the lamp housing, the lamp wick and the electric switch are each located at their positions in the candidate map;

calculating the shortest distance between the positions of each pair of adjacent entities among the three entity positions; for example, the shortest distance L1 between the lamp housing and the lamp wick in the candidate map and the shortest distance L2 between the lamp wick and the electric switch in the candidate map are calculated. The sum of the two shortest distances is then calculated, and the second ternary relation novelty r3_2 is proportional to L1 + L2: the shorter L1 + L2, the lower the novelty; the longer L1 + L2, the higher the novelty.
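The distance-based score above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the candidate map is treated as an unweighted, undirected graph, shortest distances are found by breadth-first search, and the proportionality constant `k` and the `unreachable` penalty for disconnected entities are hypothetical parameters. The node "lamp holder" is a toy intermediate node.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets for the candidate map (assumed unweighted)."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_distance(graph, start, goal):
    """Number of edges on a shortest path (breadth-first search), or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def second_ternary_novelty(graph, triple, k=1.0, unreachable=10):
    """r3_2 = k * (L1 + L2); k and the unreachable penalty are hypothetical."""
    first, middle, last = triple
    l1 = shortest_distance(graph, first, middle)
    l2 = shortest_distance(graph, middle, last)
    l1 = unreachable if l1 is None else l1
    l2 = unreachable if l2 is None else l2
    return k * (l1 + l2)


candidate_map = build_graph([
    ("lamp housing", "lamp holder"),
    ("lamp holder", "lamp wick"),
    ("lamp wick", "electric switch"),
])
print(second_ternary_novelty(candidate_map, ("lamp housing", "lamp wick", "electric switch")))  # prints 3.0
```

With L1 = 2 and L2 = 1 the score is 3.0; entities that are far apart (or disconnected) in the candidate map yield a larger, i.e. more novel, score.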

In a third possible implementation, the novelty between the target structure and the candidate map may be determined from the second entity novelty r1_2, the second binary relation novelty r2_2 and the second ternary relation novelty r3_2, together with their respective weights.

For example, novelty = r1_2 x weight1 + r2_2 x weight2 + r3_2 x weight3, where in this implementation weight1 is the weight of the second entity novelty, weight2 is the weight of the second binary relation novelty, and weight3 is the weight of the second ternary relation novelty.

Further, in the embodiments of the present application, the candidate texts in the candidate text set may be ranked by the magnitude of their novelty relative to the target text, in descending or ascending order, and a preset number of candidate texts may be displayed in that order; for example, the first 3 candidate texts are displayed.
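The weighted novelty of the third implementation and the ranking step can be combined in a short Python sketch. It assumes each candidate text already has its (r1_2, r2_2, r3_2) scores; the text identifiers, the default weights and the function name are hypothetical.

```python
def rank_candidates(scores, weights=(0.4, 0.3, 0.3), top_n=3, descending=True):
    """scores maps a candidate text id to (r1_2, r2_2, r3_2).

    novelty = r1_2 x weight1 + r2_2 x weight2 + r3_2 x weight3; returns the
    top_n (text id, novelty) pairs in the requested order."""
    novelty = {
        text_id: sum(score * weight for score, weight in zip(parts, weights))
        for text_id, parts in scores.items()
    }
    ranked = sorted(novelty.items(), key=lambda item: item[1], reverse=descending)
    return ranked[:top_n]


scores = {
    "A": (1.0, 1.0, 1.0),
    "B": (0.0, 0.0, 0.0),
    "C": (0.5, 0.5, 0.5),
    "D": (0.2, 0.2, 0.2),
}
for text_id, value in rank_candidates(scores):
    print(text_id, round(value, 3))
```

Because r3_2 is distance-based rather than a ratio, a real system might normalize the three scores before weighting them; the source does not specify this.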

In this implementation, the second ternary relation novelty between the target ternary relations in the target structure and the candidate map is also calculated. When the novelty between the target structure and the candidate structure is determined, the novelty between entities is combined with the novelty between binary relations and the novelty between ternary relations, which improves the accuracy of the novelty.

Further, on the basis of the third implementation, a fourth possible implementation is provided, in which the method may further include the following steps:

determining a second ternary relation intersection of the target ternary relation set of the target structure and the candidate ternary relation set of the candidate map;

determining a first ternary relation novelty r3_1 from a difference parameter between the second ternary relation intersection and the target ternary relation set, namely: r3_1 = (|T3| - |T3 ∩ C3|) / |T3|, where T3 is the target ternary relation set, C3 is the candidate ternary relation set, and |·| denotes the number of relations in a set.

In the fourth possible implementation, a ternary relation novelty is first determined from the first ternary relation novelty, the second ternary relation novelty and their respective weights, namely: ternary relation novelty r3 = r3_1 x weight1 + r3_2 x weight2, where in this implementation weight1 is the weight of r3_1 and weight2 is the weight of r3_2.

Then, the novelty between the target structure and the candidate map is determined from the second entity novelty r1_2, the binary relation novelty r2, the ternary relation novelty r3 and their respective weights. In this implementation, the ternary relation novelty is jointly determined by the first and second ternary relation novelties and their weights, which broadens the scenarios in which the ternary relation novelty can be determined.
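The fourth implementation can be sketched in Python as below, assuming ternary relations are 3-tuples of entities such as ("lamp housing", "lamp wick", "electric switch"), and that r1_2, r2 and r3_2 are produced by earlier steps. Function names, weights and example values are hypothetical.

```python
def first_ternary_novelty(target_triples: set, candidate_triples: set) -> float:
    """r3_1 = (|T3| - |T3 ∩ C3|) / |T3|, computed as |T3 - C3| / |T3| (equivalent)."""
    if not target_triples:
        return 0.0  # assumption: no target relations means no novelty
    return len(target_triples - candidate_triples) / len(target_triples)


def ternary_relation_novelty(r3_1, r3_2, weight1=0.5, weight2=0.5):
    """r3 = r3_1 x weight1 + r3_2 x weight2."""
    return r3_1 * weight1 + r3_2 * weight2


def overall_novelty(r1_2, r2, r3, weights=(0.4, 0.3, 0.3)):
    """Novelty of the target structure vs. the candidate map from r1_2, r2 and r3."""
    w1, w2, w3 = weights
    return r1_2 * w1 + r2 * w2 + r3 * w3


target_triples = {
    ("lamp housing", "lamp wick", "electric switch"),
    ("base", "LED lamp", "lamp housing"),
}
candidate_triples = {("lamp housing", "lamp wick", "electric switch")}

r3_1 = first_ternary_novelty(target_triples, candidate_triples)
r3 = ternary_relation_novelty(r3_1, r3_2=0.3)
print(overall_novelty(r1_2=0.4, r2=0.35, r3=r3))
```

Splitting the computation into small functions mirrors the order of the steps above: intersection-based r3_1, weighted r3, then the final weighted novelty.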

In the embodiments of the present application, related content in Embodiment 1, Embodiment 2 and Embodiment 3 may be cross-referenced. For example, the step of extracting the binary relations in the target text may further include the following steps:

acquiring an entity relation data set, where the entity relation data set is obtained from the entities in the text set and the relations between them; the entity relation data set (an entity relation matrix) includes N entities and the relations among the N entities, where N is greater than or equal to 2;

querying the entity relation data set to obtain M second entities having a relation with the first entity, where M is less than or equal to N;

searching for the second entities within a preset range in the target text.

Before the step of searching for the second entities within the preset range in the target text, the method may further include the following steps:

creating an entity matching window;

determining the preset range in the target text according to the size of the entity matching window.
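The query-then-window search can be sketched in Python as follows. The entity relation data set is assumed to be a mapping from an entity to the set of entities related to it, and the window is assumed to be measured in characters before the start and after the end of each occurrence of the first entity; the function name and example text are hypothetical.

```python
def find_related_in_window(text, first_entity, relation_data, window):
    """Return the second entities related to first_entity (per relation_data) that
    appear entirely within `window` characters before or after an occurrence of it."""
    related = relation_data.get(first_entity, set())  # the M second entities
    found = set()
    start = text.find(first_entity)
    while start != -1:
        lo = max(0, start - window)
        hi = start + len(first_entity) + window
        span = text[lo:hi]  # the preset range defined by the matching window
        found.update(entity for entity in related if entity in span)
        start = text.find(first_entity, start + 1)  # next occurrence, if any
    return found


relation_data = {"motor": {"control circuit", "cup body", "crushing blade"}}
text = "The control circuit drives the motor. " + "x" * 50 + " cup body."
print(find_related_in_window(text, "motor", relation_data, window=30))
```

A small window finds only nearby related entities ("control circuit" here), while a larger window also picks up "cup body"; the window size trades recall against spurious matches.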

The step of extracting the target entities in the target text may specifically include the following step:

inputting the target text into an entity extraction model, and identifying the target entities in the target text through the entity extraction model.

The step of extracting the binary relations in the target text may specifically include the following steps:

inputting the target text, in which the target entities have been identified, into a relation extraction model, and extracting the binary relations between the target entities through the relation extraction model;

representing the target text in structured form according to the relations between the target entities to generate a target structure. The target structure includes nodes representing the target entities and edges representing the relations between the target entities.

Example 4

Referring to fig. 9, an embodiment of the present application provides a method for acquiring image information. The method is applied to an electronic device, which may be a server or a terminal; the execution subject is not specifically limited in this embodiment. The method may include the following steps:

Step 501, receiving target text information to be matched; wherein the target text information includes a target entity.

If the execution subject is a terminal, the terminal receives target text information to be matched that is input by a user. If the execution subject is a server, the server receives the target text information to be matched sent by the terminal; for example, the target text information is "engine". In one application scenario, taking a server as the execution subject, a user wants to find image information corresponding to "engine": the terminal receives "engine" input by the user and sends the target entity to the server, and the server receives the target text information. It should be noted that the number of target entities is not limited in the embodiments of the present application; the target entity "engine" in this example is merely illustrative and does not limit the present application.

Step 502, matching the target entity with the candidate entity associated with each candidate image in the image dataset.

The server matches the target entity with the candidate entities associated with each candidate image in the image dataset. The image dataset may be stored on the server or obtained from another device; this is not specifically limited. The image dataset contains a large number of candidate images, each associated with a candidate entity. For example, candidate image 1 is associated with "connecting rod", candidate image 2 is associated with "engine", and so on.

Step 503, if the target entity matches the candidate entity associated with a first candidate image in the image dataset, determining that the first candidate image is a candidate image matching the target entity.

For example, if a target entity (e.g., an "engine") matches a candidate entity (e.g., an "engine") associated with a first candidate image in the image dataset, the first candidate image is determined to be a candidate image that matches the target entity.

Specifically, the target entity may be matched with the candidate entity associated with the first candidate image in the image dataset as follows:

First, the semantic vector of the target entity and the semantic vectors of the candidate entities associated with the candidate images are obtained. In one possible implementation, they may be obtained through the "candidate matrix" in step 301 of Embodiment 2; for details, refer to step 301 of Embodiment 2, which is not repeated here. In a second possible implementation, the semantic vector of the target entity and the semantic vector of the candidate entity may be obtained through the trained Word2vec model as described in step 301 of Embodiment 2, which is likewise not repeated here.

Then, the cosine of the angle between the semantic vector of the target entity and the semantic vector of the candidate entity is calculated.

The similarity between the target entity and the candidate entity is obtained from this cosine value; the higher the similarity, the higher the matching degree between the target entity and the candidate entity.

U candidate entities associated with the target entity are determined in descending order of matching degree, where U is an integer greater than or equal to 1, and the candidate images associated with these U candidate entities are determined as first candidate images; the number of first candidate images is not limited.
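The cosine matching and top-U selection can be sketched in Python as below. The vectors are hypothetical toy values standing in for semantic vectors from the candidate matrix or Word2vec model; the optional `threshold` parameter reflects the similarity threshold used in the application scenario that follows, and all names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two semantic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_u_images(target_vector, candidates, u=1, threshold=None):
    """candidates: (candidate entity, vector, image id) triples.

    Returns the images of the U most similar candidate entities, optionally
    keeping only those whose similarity exceeds `threshold`."""
    scored = [(cosine_similarity(target_vector, vec), image_id) for _, vec, image_id in candidates]
    if threshold is not None:
        scored = [pair for pair in scored if pair[0] > threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:u]]


candidates = [
    ("engine", [0.9, 0.1], "img_engine"),
    ("connecting rod", [0.0, 1.0], "img_rod"),
    ("motor", [0.7, 0.7], "img_motor"),
]
print(top_u_images([1.0, 0.0], candidates, u=2))
```

Sorting by similarity implements "descending order of matching degree"; in practice the vectors would come from the trained model rather than hand-written lists.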

Step 504, outputting a first candidate image.

If the execution subject is a terminal, the terminal displays the first candidate image. If the execution subject is a server, the server sends the first candidate image to a terminal so that the terminal displays it.

In one application scenario, a user inputs "engine"; the terminal receives "engine" and sends it to the server, which matches "engine" with each candidate entity in the image dataset. The server finds that the similarity between the target entity "engine" and the candidate entity "engine" is above a threshold, and that the similarity between the target entity and another candidate entity that also means "engine" is above the threshold as well. The candidate image Aa associated with the first of these candidate entities and the candidate image Ab associated with the second are therefore both determined to be first candidate images. The server sends candidate images Aa and Ab to the terminal, which displays them.

In this embodiment of the application, target text information to be matched, which includes a target entity, is first received; the target entity is then matched with the candidate entities associated with each candidate image in the image dataset; if the target entity matches the candidate entity associated with a first candidate image in the image dataset, the first candidate image is determined to be a candidate image matching the target entity; and the first candidate image is output. The output first candidate image matches the target entity in the target text information and can represent the target entity more vividly. Unlike the prior art, the method does not require the drawings in texts to be consulted manually one by one to select an image matching the target entity, which greatly saves labor cost.

On the basis of the above embodiment, the image dataset may be established in advance; how it is established is described in detail below.

In a first possible implementation, the image dataset includes a first image dataset, and before the target entity is matched with the candidate entities associated with the candidate images in the image dataset, the method may further include the following steps:

Acquiring a candidate text set. The candidate text set may be a patent text set containing a plurality of candidate texts, each of which includes candidate entities. If the execution subject is a terminal, the terminal may obtain the candidate text set from the server; if the execution subject is a server, the candidate text set may be stored on the server or obtained by the server from another device; this is not specifically limited.

Counting the occurrence frequency of each candidate entity in the candidate text set. For example, in the candidate text set, "engine" occurs 10000 times, "connecting rod" 9900 times, "pressing mechanism" 9800 times, and so on; these entities and frequencies are only examples and do not limit the embodiments of the present application.

Determining high-frequency entities according to the frequencies. The high-frequency entities may be entities whose frequency in the candidate text set is above a threshold, for example above 9000. Alternatively, they may be the entities ranked before a preset position when sorted by frequency; for example, all entities appearing in the candidate texts are sorted in descending order of frequency and the top 10000 are selected as high-frequency entities.

Associating each high-frequency entity with at least one corresponding candidate image to obtain the first image dataset. The entities in the first image dataset are therefore those that occur most frequently.
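Building the first image dataset can be sketched in Python with `collections.Counter`. The sketch assumes each candidate text has already been reduced to a list of extracted entity mentions and that a mapping from entity to candidate images is available; both inputs and the function names are hypothetical. Either selection rule from the text (frequency threshold or top-k ranking) can be used.

```python
from collections import Counter


def high_frequency_entities(entities_per_text, threshold=None, top_k=None):
    """Count entity mentions across the candidate text set and select high-frequency entities,
    either by frequency threshold or by rank (top_k; None keeps all entities)."""
    counts = Counter()
    for entities in entities_per_text:
        counts.update(entities)  # every mention counts once
    if threshold is not None:
        return {entity for entity, n in counts.items() if n > threshold}
    return {entity for entity, _ in counts.most_common(top_k)}


def first_image_dataset(high_freq, images_by_entity):
    """Associate each high-frequency entity with its candidate images (skip entities without images)."""
    return {e: images_by_entity[e] for e in high_freq if images_by_entity.get(e)}


entities_per_text = [
    ["engine", "connecting rod", "engine"],
    ["engine", "pressing mechanism"],
    ["connecting rod"],
]
images_by_entity = {
    "engine": ["img_1"],
    "connecting rod": ["img_2"],
    "pressing mechanism": ["img_3"],
}
high = high_frequency_entities(entities_per_text, threshold=1)
print(first_image_dataset(high, images_by_entity))
```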

Optionally, the image dataset further includes a second image dataset, and before the target entity is matched with the candidate entities associated with the candidate images in the image dataset, the method may further include the following steps:

acquiring a candidate text set, where each candidate text includes a drawing description and drawings; the drawing description includes candidate entities and their reference labels, and the drawings include candidate images and the same labels. For example, each candidate text (e.g., a patent) includes a drawing description and drawings, as shown in fig. 10, which is a schematic diagram of a drawing description and a drawing. In fig. 10, the drawing description lists several candidate entities and the reference number of each in the drawing: for example, "soymilk machine body" corresponds to number "1", so the part labeled "1" in the drawing is the candidate image of the "soymilk machine body"; "machine head" corresponds to number "2", so the part labeled "2" in the drawing is the candidate image of the "machine head";

establishing an association between the candidate entities and the candidate images according to the labels to obtain the second image dataset. The labels (such as numbers) in the drawing are recognized, the numbers in the drawing description are matched with the numbers in the drawing, and the candidate entity and the candidate image that share the same number are associated.
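The label-based association can be sketched in Python. This is a minimal illustration under two hypothetical assumptions: the drawing description has entries like "1 - soymilk machine body" separated by semicolons, and an upstream step (not shown, e.g. recognition of reference numbers in the drawing) has already mapped each label to an image region identifier.

```python
import re


def parse_description(description):
    """Map reference numbers to entity names, assuming entries like '1 - soymilk machine body'."""
    pattern = re.compile(r"(\d+)\s*[-.:]\s*([^;,\n]+)")
    return {number: name.strip() for number, name in pattern.findall(description)}


def second_image_dataset(description, drawing_regions):
    """Associate each candidate entity with the image region that carries the same number.

    drawing_regions: {label: image region id}, assumed to come from label recognition."""
    labels = parse_description(description)
    return {labels[number]: region for number, region in drawing_regions.items() if number in labels}


description = "1 - soymilk machine body; 2 - machine head; 3 - cup body"
drawing_regions = {"1": "fig10_region_1", "2": "fig10_region_2", "7": "fig10_region_7"}
print(second_image_dataset(description, drawing_regions))
```

Labels that appear only in the drawing (such as "7" here) or only in the description are simply left unassociated.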

Optionally, the image dataset further includes a third image dataset, and before the target entity is matched with the candidate entities associated with the candidate images in the image dataset, the method may further include the following steps:

acquiring a candidate text set, where each candidate text includes a title and an abstract drawing. Again taking patents as the candidate texts, each patent includes a title and an abstract drawing, which is the main drawing representing the patent; for example, a patent titled "A soymilk machine";

extracting the abstract drawings from the candidate texts;

identifying the candidate entities in the title; for example, the candidate entity "soymilk machine" is extracted from the title by the entity extraction model;

establishing an association between the candidate entity and the abstract drawing to obtain the third image dataset; for example, "soymilk machine" is associated with the abstract drawing of that patent.

It should be noted that the image dataset may include at least one of the first image dataset, the second image dataset and the third image dataset; in this embodiment of the present application, the image dataset includes all three.

Optionally, in step 502, matching the target entity with the candidate entities associated with the candidate images in the image dataset may specifically include the following steps:

first, matching the target entity with the candidate entities associated with the candidate images in the first image dataset. Because the candidate entities in the first image dataset occur frequently, matching the target entity against these high-frequency entities first improves the matching rate;

if the target entity does not match any candidate entity in the first image dataset, matching the target entity with the candidate entities associated with the candidate images in the image datasets other than the first image dataset, that is, the second image dataset and/or the third image dataset. If the target entity matches a candidate entity in the first image dataset, the candidate image associated with that candidate entity is sent directly to the terminal, which displays the candidate image. In this embodiment, matching the target entity against the first image dataset first improves the matching rate.
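The tiered lookup can be sketched in Python as follows. Each image dataset is assumed to be a mapping from candidate entity to a list of candidate image identifiers, with the first (high-frequency) dataset listed first; exact key lookup stands in for the semantic similarity matching described earlier, and all names are hypothetical.

```python
def tiered_match(target, datasets):
    """datasets: ordered list of {candidate entity: [candidate image ids]}.

    The first dataset (high-frequency entities) is tried first; only if it has no
    match are the remaining datasets (second and/or third) searched."""
    first, *others = datasets
    if target in first:
        return list(first[target])
    images = []
    for dataset in others:
        images.extend(dataset.get(target, []))
    return images


first = {"engine": ["img_a"]}
second = {"machine head": ["img_h2"]}
third = {"machine head": ["img_h3"], "soymilk machine": ["img_abstract"]}
print(tiered_match("machine head", [first, second, third]))
```

Short-circuiting on the first dataset is what keeps frequent queries cheap; rarer entities fall through to the larger label-based and title-based datasets.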

Optionally, on the basis of the above embodiment, the image dataset further includes candidate image relations, each of which includes at least two candidate images and the relation between them, for example (candidate image 1 connected to candidate image 2), such as (soymilk machine body image connected to machine head image). The candidate image relations are derived from the relations between candidate entities: if the relation extraction model identifies that "soymilk machine body" is connected to "machine head", the relation between the images associated with these candidate entities is determined accordingly, yielding the candidate image relation.

Optionally, in the foregoing embodiment, when the first candidate image is included in a target candidate image relation (for example, the target candidate image relation in the image dataset is (soymilk machine body image connected to machine head image), which includes the first candidate image, the soymilk machine body image), the method may further include the following steps:

first, determining a second candidate image that is included in the target candidate image relation and has a relation with the first candidate image, for example, the machine head image;

The first candidate image and the second candidate image are then output.

In one application scenario, a user who wants to understand the structure of a "soymilk machine" more vividly through image information inputs the target entity "soymilk machine". The terminal sends the target entity to the server, and the server matches the target entity (soymilk machine) with the candidate entities associated with the candidate images in the image dataset; the matched candidate entity is "soymilk machine body". Further, the first candidate image associated with the soymilk machine body (the soymilk machine body image) has a connection relation with a second candidate image (the machine head image), so both the first candidate image (the soymilk machine body image) and the second candidate image (the machine head image) are output. It should be noted that neither the number of first candidate images nor the number of second candidate images is limited in the embodiments of the present application. For example, if there are 2 first candidate images and each has two associated second candidate images, 4 second candidate images are finally output. The output first and second candidate images may form a topology, as shown in fig. 11, which is a schematic diagram of the topology of the first and second candidate images. The terminal thus displays not only the image information of the soymilk machine but also other image information related to it.

In this embodiment, the second candidate images related to the first candidate image can be output according to the candidate image relations, so other images related to the first candidate image do not need to be analyzed and searched for manually; this saves labor cost and broadens the application scenarios.
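Expanding the output with related images can be sketched in Python. Candidate image relations are assumed to be a list of (image, relation, image) tuples; each edge is recorded from the first candidate image's side so the result can be drawn as a topology like fig. 11. Image identifiers and the function name are hypothetical.

```python
def expand_with_related(first_images, image_relations):
    """Return the first candidate images plus every image related to them, and the
    (first image, relation, related image) edges linking them."""
    output = list(first_images)
    edges = []
    for a, relation, b in image_relations:
        for image in first_images:
            if image == a or image == b:
                other = b if image == a else a  # the second candidate image
                edges.append((image, relation, other))
                if other not in output:
                    output.append(other)
    return output, edges


relations = [
    ("body_img", "connected to", "head_img"),
    ("cup_img", "connected to", "lid_img"),
]
images, edges = expand_with_related(["body_img"], relations)
print(images)
print(edges)
```

Unrelated relations (the cup and lid here) are ignored, so only the neighbourhood of the matched images is output.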

On the basis of the above embodiment, optionally, the target entity includes at least a first target entity and a second target entity, and the target text information further includes a first relation between the first target entity and the second target entity; the method may then specifically include the following steps:

if the first target entity matches a first candidate entity associated with a first candidate image in the image dataset, and the second target entity matches a second candidate entity associated with a second candidate image in the image dataset, matching the first relation between the first and second target entities with a second relation between the first and second candidate entities;

if the first relation matches the second relation, the method further includes:

outputting the second candidate image.

In one application scenario, the user inputs a first target entity "soymilk machine", a second target entity "machine head", and a first relation "connection" between them. If the first target entity (soymilk machine) matches the first candidate entity (soymilk machine body) associated with the first candidate image in the image dataset, and the second target entity (machine head) matches the second candidate entity associated with the second candidate image (machine head image), the relations are then matched: the first relation is "connection", and if it matches the second relation, the second candidate image is output.
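The relation check can be sketched in Python. The sketch assumes entity matching has already produced a mapping from target entity to candidate entity, that candidate relations are stored as (first candidate entity, relation, second candidate entity) tuples, and that relations match only when their labels are identical; these simplifications and all names are hypothetical.

```python
def match_with_relation(query, entity_match, candidate_relations, image_of):
    """query: (first target entity, first relation, second target entity).

    Returns the second candidate image if both entities match candidate entities and
    the first relation equals the relation between those candidates; otherwise None."""
    first_target, first_relation, second_target = query
    first_candidate = entity_match.get(first_target)
    second_candidate = entity_match.get(second_target)
    if first_candidate is None or second_candidate is None:
        return None
    if (first_candidate, first_relation, second_candidate) in candidate_relations:
        return image_of.get(second_candidate)
    return None


entity_match = {"soymilk machine": "soymilk machine body", "machine head": "machine head"}
candidate_relations = {("soymilk machine body", "connection", "machine head")}
image_of = {"soymilk machine body": "body_img", "machine head": "head_img"}
print(match_with_relation(("soymilk machine", "connection", "machine head"),
                          entity_match, candidate_relations, image_of))
```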

Optionally, the relations between the candidate images may specifically be established as follows:

extracting the relations between the candidate entities in the candidate text;

establishing the relations between the candidate images associated with the candidate entities according to the relations between the candidate entities. For example, if the relation between the candidate entity "soymilk machine body" and the candidate entity "machine head" is extracted as "connection", a connection relation is established between the soymilk machine body image and the machine head image.

Optionally, extracting the candidate entities and the relations between them in the candidate text may specifically include the following steps:

inputting the candidate text into an entity extraction model, and identifying candidate entities in the candidate text through the entity extraction model;

inputting the candidate text with the identified candidate entities into a relation extraction model, and outputting the relations between the candidate entities through the relation extraction model. For details on extracting candidate entities with the entity extraction model and relations with the relation extraction model, refer to step 202 and step 203 in Embodiment 1; they are not repeated here.

Optionally, the target text information is a target structure of the structured representation.

Example 5

Referring to fig. 12, an embodiment of the present application further provides a method for obtaining entity information. The method is applied to an electronic device, which may be a server or a terminal; the execution subject is not specifically limited in this embodiment. For a better understanding of this embodiment, its terms are first explained:

The "association relation" between entities in this embodiment is the same as the "relation" between entities in Embodiments 1 to 4, and the explanation of the association relation given here also applies to the "relation" in Embodiments 1 to 4.

The attributes of an association relation include its relation type; relation types include, but are not limited to, conceptual relations, affiliation relations, positional relations, sequential relations and logical relations.

Conceptual relation: a generic-specific (hypernym-hyponym) relation. For example, "vehicle" is a generic concept relative to "car", and "car" is a generic concept relative to "bus".

Conceptual relations may be identified by the relation extraction model of the foregoing embodiments. Optionally, the relation extraction model in this embodiment is further trained on a large number of claims from patent texts, which contain many generic and specific concepts; for example, in "the connection component includes a screw and a nut", the connection component is the generic concept and the screw and the nut are specific concepts. By learning from many claims, the relation extraction model can identify the generic-specific relations between entities in a text.

Affiliation relations include, but are not limited to, inclusion relations, connection relations and parallel relations.

1) Inclusion relation: the upper-level entity includes the lower-level entity, that is, the upper-level component includes the lower-level component; for example, a car includes wheels, so the car and the wheels have an upper-lower level relation.

2) Connection relation: the entities are connected; for example, a base is connected to an LED lamp, so the relation between the base and the LED lamp is a connection relation.

3) Parallel relation: the entities are parallel; for example, a soymilk machine includes an upper cover and a lower cover, between which there is neither an inclusion relation nor a connection relation, so the relation between the upper cover and the lower cover is a parallel relation.

Sequential relation: the entities have an order of precedence. For example, step 1: receiving a first signal; step 2: processing the signal to obtain a second signal. The first signal precedes the second signal both in the order of the steps and in time, so "first signal" and "second signal" have a sequential relation.

Positional relation: a spatial relation such as inside, outside, left, right, front or back. For example, if the "LED lamp" is disposed on the "base", the "LED lamp" has a positional relation with the "base".

Logical relation: in a natural-language expression, taking one entity as the reference position, at least one other entity is searched for within a preset range around it; the reference entity and each entity found within that range have a logical relation. For example, consider the natural-language text: "A soymilk machine with a double-layer lower cover comprises a cup body and a machine head; the machine head is arranged on the cup body and comprises an upper cover and a lower cover fitted to the upper cover; a motor and a control circuit are fixedly mounted in the machine head; the motor shaft extends downward into the cup body below the motor chamber; and a crushing blade is arranged at the end of the motor shaft." Taking "motor" as the reference position and searching g characters forward and backward (counted in the original-language text), with g = 10 for example, the entity "machine head" is found within the 10 characters before "motor", and "control circuit" and "motor shaft" are found within the 10 characters after it; "machine head", "control circuit" and "motor shaft" therefore have a logical relation with "motor".

Referring to fig. 12, a method for obtaining entity information provided in an embodiment of the present application may include the following steps:

step 601, receiving target text information; wherein the target text information comprises a first target entity.

If the execution subject is a terminal, the terminal receives target text information input by a user; if the execution subject is a server, the server receives target text information sent by the terminal. For example, the target text information is "engine". In this embodiment, a server is taken as the execution subject: in one application scenario, the terminal receives "engine" input by the user and sends the target entity to the server, which receives the target text information. It should be noted that the number of first target entities is not limited in the embodiments of the present application; the target entity "engine" in this example is merely illustrative and does not limit the present application.

Step 602: retrieve, in a data set, a first candidate entity that matches the first target entity. The data set includes candidate entities and the relationships among them, and the candidate entities include at least the first candidate entity and a second candidate entity that has an association relationship with the first candidate entity.

The data set may be established in advance and stored, or it may be acquired from another device. The data set may be created as follows:

A candidate text set is acquired. The candidate text set may be a patent text set and includes a plurality of candidate texts, each of which contains candidate entities. A relation extraction model extracts the candidate entities in each candidate text as well as the relationships expressed in that text, yielding the candidate entities and the association relationships among them. The data set is then obtained from the candidate entities and their association relationships.
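Assuming the relation extraction model has already produced (head entity, relation type, tail entity) triples for each candidate text, the data set can be sketched as an entity set plus an adjacency map. The model itself is not shown; the triple format, the function name `build_dataset`, and the choice to store each relation under both entities are assumptions made for illustration only.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (head entity, relation type, tail entity)


def build_dataset(triples_per_text: dict[str, list[Triple]]):
    """Return (entities, relations): the set of candidate entities and a map
    entity -> list of (relation type, neighbour entity, source text id)."""
    entities: set[str] = set()
    relations: dict[str, list[tuple[str, str, str]]] = defaultdict(list)
    for text_id, triples in triples_per_text.items():
        for head, rel_type, tail in triples:
            entities.update((head, tail))
            # Index the relation under both entities so either can be looked up.
            relations[head].append((rel_type, tail, text_id))
            relations[tail].append((rel_type, head, text_id))
    return entities, dict(relations)
```

Keeping the source text id on each edge preserves the link between a candidate entity and the text it came from, which later steps (relevant dates, candidate texts) rely on.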

For example, if the first target entity is "soymilk machine", the first candidate entity that matches it in the data set is "soymilk machine body", and a second candidate entity that has an association relationship with "soymilk machine body" in the data set is "upper cover". It should be noted that the association relationships in the embodiments of the present application include the belonging, conceptual, sequential, and logical relationships described above.

For example, the second candidate entity may be "top cover", in which case the first and second candidate entities are in a belonging (inclusion) relationship; the second candidate entity may also be a candidate entity having a conceptual, sequential, or logical relationship with the first candidate entity. These cases are not enumerated here.

It should be noted that the specific method for matching the first target entity with the first candidate entity in this step can be understood with reference to step 503 in the foregoing embodiment 4 and is not repeated here.

Step 603: select, in the data set, a second candidate entity that has an association relationship with the first candidate entity.

Second candidate entities that have an association relationship with the first candidate entity are selected in the data set. For example, "upper cover" is in an inclusion relationship with the first candidate entity, "motor" is in a logical relationship with it, and "lid component" is in a conceptual relationship with it. Other cases are not enumerated here.

Step 604: output the second candidate entity.

The server sends the second candidate entity to the terminal, and the terminal displays it. In this embodiment, neither the number of second candidate entities nor the type of association relationship between the second candidate entities and the first candidate entity is limited.

In one application scenario, a user who needs to improve a structure of a soymilk machine inputs "soymilk machine". The terminal receives this input and sends it to the server. The server matches "soymilk machine" against the candidate entities in the data set, finds the matching candidate entity "soymilk machine body", determines the second candidate entities that have an association relationship with "soymilk machine body", and sends them to the terminal. The terminal then displays the plurality of second candidate entities, for example, as a list.
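Steps 602 to 604 in this scenario can be sketched on a toy in-memory data set. The matching rule of step 503 is not reproduced here; as a stand-in, this sketch assumes that a candidate entity matches when it contains the target text ("soymilk machine" matches "soymilk machine body"). The function name `recommend` and the data layout are hypothetical.

```python
def recommend(target: str, relations: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Steps 602-604 on a toy data set: entity -> [(relation type, neighbour)]."""
    # Step 602 (assumed rule): a candidate matches if it contains the target text.
    first_candidates = [c for c in relations if target in c]
    # Step 603: collect neighbours of every matched first candidate, without duplicates.
    seconds: list[str] = []
    for first in first_candidates:
        for _rel_type, neighbour in relations[first]:
            if neighbour not in seconds:
                seconds.append(neighbour)
    # Step 604: the list the terminal would display.
    return seconds
```

With `relations = {"soymilk machine body": [("belonging", "upper cover"), ("logical", "motor")]}`, calling `recommend("soymilk machine", relations)` returns `["upper cover", "motor"]`.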

In this embodiment of the application, target text information including a first target entity is received; a first candidate entity matching the first target entity is retrieved in the data set, where the candidate entities include at least the first candidate entity and a second candidate entity having an association relationship with it; a second candidate entity having an association relationship with the first candidate entity is then selected in the data set; and the second candidate entity is output. In this way, second candidate entities related to the first target entity can be recommended automatically, so the user no longer needs to retrieve and analyze texts one by one, which greatly reduces labor cost.

Optionally, on the basis of the foregoing embodiment, the attributes of the association relationship include a relationship type, and the target text information further includes a target relationship condition indicating the relationship type between the target entity and the candidate entity to be acquired. The target relationship condition may be a literal expression, for example, "include", "connected", or "lower level": "include" indicates that the relationship type between the target entity and the candidate entity to be acquired is a belonging relationship; "connected" likewise indicates a belonging relationship; and "lower level" indicates a conceptual relationship. Alternatively, the target relationship condition may be represented by an identifier, for example, "bh" for "include" and "lj" for "connected".

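In step 603, selecting, in the data set, the second candidate entity having an association relationship with the first candidate entity may specifically be: selecting, in the data set and according to the first candidate entity, a second candidate entity whose relationship type satisfies the target relationship condition.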

For example, if the target text information includes the first target entity "soymilk machine" and the target relationship condition "include", second candidate entities satisfying the "include" relationship, such as "motor", "upper cover", and "lower cover", are selected in the data set according to the first candidate entity "soymilk machine body".

In this embodiment, the target text information may further include a target relationship condition, so that second candidate entities whose relationship type satisfies the condition can be selected in the data set according to the first candidate entity, which broadens the applicable scenarios.
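Filtering by a target relationship condition can be sketched as a lookup table from condition words or identifiers to relationship types, followed by a filter over the first candidate entity's relations. The table mirrors the examples given above ("include"/"bh", "connected"/"lj", "lower level"); the name `select_by_condition` and the toy data layout are assumptions for illustration.

```python
# Condition words/identifiers -> relationship type, following the examples in the text.
CONDITION_TO_TYPE = {
    "include": "belonging", "bh": "belonging",
    "connected": "belonging", "lj": "belonging",
    "lower level": "conceptual",
}


def select_by_condition(first: str, condition: str,
                        relations: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Second candidate entities of `first` whose relation type satisfies the condition."""
    wanted = CONDITION_TO_TYPE[condition]
    return [entity for rel_type, entity in relations.get(first, []) if rel_type == wanted]
```

A literal word and its identifier map to the same type, so "include" and "bh" select the same second candidate entities.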

Optionally, selecting the second candidate entity having the association relationship with the first candidate entity in the data set may specifically further include:

In one implementation, the frequency with which each of a plurality of second candidate entities occurs in the data set is determined. For example, the plurality of second candidate entities are "motor", "upper cover", "lower cover", and so on, where the frequency of "motor" in the data set is greater than a threshold, or ranks first among all the second candidate entities.

A target second candidate entity is then selected from the plurality of second candidate entities according to the frequency and is taken as the second candidate entity; for example, "motor" may be selected as the target second candidate entity.
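Both frequency rules described above (above a threshold, or ranked first) can be sketched with `collections.Counter`. Representing the data set as a flat list of entity mentions, and the name `select_by_frequency`, are assumptions for illustration.

```python
from collections import Counter
from typing import Optional


def select_by_frequency(candidates: list[str], mentions: list[str],
                        threshold: Optional[int] = None) -> list[str]:
    """Pick target second candidate entities by how often they occur in the data set."""
    counts = Counter(m for m in mentions if m in candidates)
    if threshold is not None:
        # Variant 1: every candidate whose frequency exceeds the threshold.
        return [c for c in candidates if counts[c] > threshold]
    # Variant 2: the candidate ranked first by frequency.
    return [max(candidates, key=lambda c: counts[c])]
```

With "motor" mentioned 5 times, "upper cover" twice, and "lower cover" once, the ranking variant returns `["motor"]` and a threshold of 1 returns `["motor", "upper cover"]`.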

In another implementation, the relevant date of the candidate text to which each of the plurality of second candidate entities belongs is determined, where the plurality of second candidate entities belong to different texts and the relevant date includes, but is not limited to, the application date, the submission date, and the publication date;

a target second candidate entity is then selected from the plurality of second candidate entities according to the relevant date and is taken as the second candidate entity. Taking the publication date as the relevant date, the target second candidate entity is selected in order of how close the publication date is to the current date. For example, if the patent document containing "motor" was published on 2018.6.3, the one containing "upper cover" on 2017.5.4, and the one containing "lower cover" on 2017.1.4, the second candidate entity whose publication date is closest to the current date, namely "motor", may be selected as the target second candidate entity. It should be noted that the second candidate entities in this embodiment are merely examples for ease of description and do not limit the present application.
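Ordering by closeness of the publication date to the current date can be sketched with the standard `datetime` module, using the dates from the example above. The function name `rank_by_recency` and the dictionary input are assumptions; the first element of the result is the target second candidate entity.

```python
from datetime import date


def rank_by_recency(pub_dates: dict[str, date], today: date) -> list[str]:
    """Order candidates by how close their text's publication date is to `today`."""
    return sorted(pub_dates, key=lambda entity: abs((today - pub_dates[entity]).days))


dates = {
    "motor": date(2018, 6, 3),
    "upper cover": date(2017, 5, 4),
    "lower cover": date(2017, 1, 4),
}
ranking = rank_by_recency(dates, today=date(2018, 11, 13))
target_second_candidate = ranking[0]  # "motor"
```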

Optionally, on the basis of the foregoing embodiment, the attributes of the association relationship further include a relationship dimension, which is either a binary relationship or an X-element relationship, where X is an integer greater than or equal to 3. A binary relationship includes two entities and the relationship between them; an X-element relationship includes X entities and at least (X-1) binary relationships, which are connected through association entities.
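The relationship between binary and X-element relationships can be sketched by chaining binary relations through shared association entities and counting the resulting entities. This is an illustrative check only; the name `chain_entities`, the triple format, and the choice to process relations in the given order are assumptions.

```python
from typing import Optional

Binary = tuple[str, str, str]  # (entity, relation, entity)


def chain_entities(binaries: list[Binary]) -> Optional[list[str]]:
    """Entities of an X-element relation built from binary relations, where each
    binary relation after the first must share an association entity with the
    entities collected so far; returns None if they are not connected.
    Expects at least one binary relation."""
    entities = [binaries[0][0], binaries[0][2]]
    for a, _rel, b in binaries[1:]:
        if a not in entities and b not in entities:
            return None  # no shared association entity
        for e in (a, b):
            if e not in entities:
                entities.append(e)
    return entities
```

Two binary relations sharing "upper cover", such as (cover body, include, upper cover) and (upper cover, provided with, button), yield three entities, that is, a ternary relationship with X = 3 and X - 1 = 2 binary relationships.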

Optionally, on the basis of the foregoing embodiment, there are a plurality of second candidate entities, the target text information further includes a second target entity and a target relationship condition, and selecting, in the data set, the second candidate entity having an association relationship with the first candidate entity may further specifically include:

selecting, from the plurality of second candidate entities, target second candidate entities that satisfy the target relationship condition; and

outputting an R-element relation group, where R is an integer greater than or equal to 2 and less than or equal to N; the R-element relation group includes a plurality of R-element relations, each of which includes the first candidate entity, a target second candidate entity, and the relationship between them.

For example, the first target entity is "engine", the second target entity is "link", and the target relationship condition is "connection". The first candidate entities matching "engine" may be "engine" and its synonyms, and the second candidate entities retrieved in the data set as matching "link" may be "upper link", "lower link", "link assembly", and so on. The R-element relation group may be a binary relation group and/or a ternary relation group; taking a binary relation group as an example, it includes binary relation 1 (engine connected with upper link), binary relation 2 (engine connected with lower link), binary relation 3 (engine connected with link assembly), and so on. In this embodiment, the R-element relation group can be retrieved and output automatically according to the first target entity, the second target entity, and the relationship between them.
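For the binary case, the relation group in this example can be sketched as a filter over candidate triples. The substring matching of candidates to targets stands in for the application's matching method and is an assumption, as is the name `r_element_group`.

```python
Triple = tuple[str, str, str]


def r_element_group(first_target: str, second_target: str, condition: str,
                    triples: list[Triple]) -> list[Triple]:
    """Binary relation group: (first candidate, condition, second candidate) triples
    whose entities match the targets by substring (assumed matching rule)."""
    return [(a, rel, b) for a, rel, b in triples
            if first_target in a and second_target in b and rel == condition]


triples = [
    ("engine", "connection", "upper link"),
    ("engine", "connection", "lower link"),
    ("engine", "connection", "link assembly"),
    ("engine", "include", "piston"),
    ("pump", "connection", "link rod"),
]
group = r_element_group("engine", "link", "connection", triples)
```

Here `group` holds the three "engine connected with ... link" relations; the "include" relation and the "pump" relation are filtered out.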

Optionally, the entity includes a component, an attribute, and/or an attribute value.

The target entity includes a target component, a target attribute, and/or a target attribute value; the candidate entity includes a candidate component, a candidate attribute, and/or a candidate attribute value, and each candidate entity is associated with the candidate text to which it belongs. For example, if the candidate texts are patent texts, each patent text has a patent number through which its candidate entities are associated with it. The method may further include:

matching the target component with each candidate component, the target attribute with each candidate attribute, and/or the target attribute value with each candidate attribute value; for example, the target component is "motor", the target attribute is "voltage", and the target attribute value is "220V";

acquiring a first candidate text associated with the target candidate component, a second candidate text associated with the target candidate attribute, and/or a third candidate text associated with the target candidate attribute value. The numbers of first, second, and third candidate texts are not limited; for example, there may be 100 first candidate texts containing "motor", 80 second candidate texts containing "voltage", and 80 third candidate texts containing "220V". These sets may overlap: for example, a candidate text XX may contain "motor", "voltage", and "220V", so the first, second, and third candidate texts may be the same or different. The numbers above are merely examples for ease of explanation and do not limit the present application.

Specifically, the first, second, and/or third candidate texts are output as a list, so that the user can browse the candidate texts containing "motor", "voltage", and "220V" and read in detail how the target component, target attribute, and/or target attribute value are described in them.
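Associating candidate entities with their texts through the patent number can be sketched as an inverted index keyed by (entity kind, entity). The index layout, the kind labels "component", "attribute", and "value", the function names, and the patent numbers "CN001"/"CN002" are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Optional


def build_index(texts: dict[str, dict[str, set[str]]]) -> dict[tuple[str, str], set[str]]:
    """texts: patent number -> {"component" / "attribute" / "value": entities}.
    Returns (kind, entity) -> patent numbers whose text contains that entity."""
    index: dict[tuple[str, str], set[str]] = defaultdict(set)
    for patent_no, fields in texts.items():
        for kind, values in fields.items():
            for v in values:
                index[(kind, v)].add(patent_no)
    return index


def lookup(index, component: Optional[str] = None, attribute: Optional[str] = None,
           value: Optional[str] = None) -> dict[str, list[str]]:
    """First/second/third candidate texts for a target component/attribute/value."""
    result = {}
    for kind, target in (("component", component), ("attribute", attribute), ("value", value)):
        if target is not None:
            result[kind] = sorted(index.get((kind, target), set()))
    return result
```

As in the text, the three result lists may overlap: a single patent can appear under "component", "attribute", and "value" at once.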

Optionally, on the basis of the foregoing embodiment, the data set includes candidate relationships, each including at least two candidate entities and the relationship between them; the target text information includes a target relationship, which includes at least two target entities and the relationship between them, the at least two target entities including the first target entity and the second target entity;

the step of selecting a second candidate entity having an association relationship with the first candidate entity in the dataset may further specifically include:

retrieving, in the data set, target candidate entities that match the second target entity, where each target candidate entity has an association relationship with the first candidate entity. For example, the target relationship includes the first target entity "cover body", the second target entity "upper cover", and the relationship between them ("include"). Target candidate entities matching the second target entity "upper cover", such as "upper end cover" or "top cover" (their number is not limited), are retrieved in the data set, each having an association relationship (for example, inclusion) with the first target entity "cover body";

searching, according to the candidate relationships, for a first candidate relationship containing the target candidate entity, where the first candidate relationship further includes a third candidate entity and the relationship between the target candidate entity and the third candidate entity. The data set includes a plurality of candidate relationships, each including at least two candidate entities and the relationships among them, and a first candidate relationship containing a target candidate entity (such as "upper end cover" or "top cover") is searched for among them. Taking "upper end cover" as the target candidate entity, the first candidate relationship includes the target candidate entity and a third candidate entity (such as a button or a display screen); for example, the first candidate relationship may be (upper end cover is provided with a button) or (upper end cover is provided with a display screen). It should be noted that the relationship between the target candidate entity and the third candidate entity within the first candidate relationship is not limited and may be, for example, "provided with", "connected with", or "include".

Further, in a first possible implementation, the first candidate relationship is output as the second candidate entity: for example, the server sends (upper end cover is provided with a button) to the terminal, and the terminal displays it. In an application scenario, if a technician inputs (the cover body includes an upper cover), the server may automatically recommend components associated with this target relationship, namely that a "button" or a "display screen" may be provided on the "upper cover", which is of great reference value to the technician in making technical improvements. In a second possible implementation, the third candidate entity (that is, the button or the display screen) is output directly.
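The first two implementations can be sketched over a list of candidate relationship triples: the first returns the whole first candidate relationship, the second returns only the third candidate entity. The function names and triple format are illustrative assumptions.

```python
Triple = tuple[str, str, str]


def first_candidate_relations(target_candidate: str, relations: list[Triple]) -> list[Triple]:
    """Candidate relationships that contain the target candidate entity (first implementation)."""
    return [r for r in relations if target_candidate in (r[0], r[2])]


def third_candidate_entities(target_candidate: str, relations: list[Triple]) -> list[str]:
    """The other entity of each such relationship (second implementation)."""
    return [b if a == target_candidate else a
            for a, _rel, b in first_candidate_relations(target_candidate, relations)]


relations = [
    ("upper end cover", "provided with", "button"),
    ("upper end cover", "provided with", "display screen"),
    ("base", "connected with", "soymilk machine body"),
]
```

For "upper end cover", the first function returns the two "provided with" relationships and the second returns `["button", "display screen"]`.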

In a third possible implementation, candidate entities similar to the third candidate entity may further be searched for. The similarity between two entities is determined from their semantic vectors, as described in step 303 of embodiment 1, which is not repeated here, and candidate entities whose similarity exceeds a threshold are selected. For example, if "key" is a candidate entity similar to the third candidate entity, "key" is output directly.
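The similarity computation of step 303 is not reproduced in this section. As an assumed stand-in, the following sketch uses cosine similarity between semantic vectors with a threshold; the toy two-dimensional vectors, the threshold of 0.9, and the function names are hypothetical, and non-zero vectors are assumed.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similar_entities(entity: str, vectors: dict[str, list[float]],
                     threshold: float = 0.9) -> list[str]:
    """Candidate entities whose similarity to `entity` exceeds the threshold."""
    base = vectors[entity]
    return [e for e, vec in vectors.items() if e != entity and cosine(base, vec) > threshold]


vectors = {"button": [1.0, 0.0], "key": [0.9, 0.1], "base": [0.0, 1.0]}
```

With these toy vectors, "key" (cosine about 0.99) is returned for "button", while "base" (cosine 0) is not.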

Optionally, in a fourth possible implementation, the third candidate entity may further be matched, according to the candidate relationships, with the candidate entities included in each candidate relationship to determine a fourth candidate entity that matches the third candidate entity; for example, if the third candidate entity is "button", the matching fourth candidate entity may be "key".

A second candidate relationship including the fourth candidate entity, such as (the key is arranged on the operation panel), is taken as the second candidate entity and output. The terminal may then display: the cover body includes an upper cover, the upper cover is provided with keys, and the keys are arranged on the operation panel. Optionally, the displayed content may be structured text or a structured image. In an application scenario, if a technician inputs (the cover body includes an upper cover), the server may automatically recommend components associated with the target relationship: a "button" or a "key" may be provided on the "upper cover", and the "key" is arranged on the "operation panel". Such entity recommendations are of great reference value to the technician in making technical improvements.

Optionally, in a fifth possible implementation, the target text information includes a target relationship, which includes at least two target entities and the relationship between them, the at least two target entities including the first target entity and the second target entity; selecting, in the data set, the second candidate entity having an association relationship with the first candidate entity may further specifically include:

searching, according to the candidate relationships, for a fifth candidate entity that has an association relationship with the target candidate entity, where the fifth candidate entity is included in a third candidate relationship that includes the fifth candidate entity, a sixth candidate entity, and the relationship between them. For example, if the fifth candidate entity "soymilk machine body", which is associated with the target candidate entity "upper end cover", is found according to the candidate relationships, the third candidate relationship containing it may be (the upper end cover is connected with the soymilk machine body) or (the soymilk machine body includes a lower end cover); the sixth candidate entity may be the same as or different from the target candidate entity.

Further, the third candidate relationship is output as the second candidate entity. In an application scenario, if a technician inputs (the cover body includes an upper cover), the server may automatically recommend candidate relationships associated with the target relationship; for example, the terminal may display "the cover body includes an upper cover, the upper cover is connected with the soymilk machine body, and the soymilk machine body includes a lower cover", or "the cover body includes an upper cover, the soymilk machine body is connected with the base, and the upper cover is connected with the soymilk machine body". In this example, the server recommends relationships associated with the target relationship, which broadens the applicable scenarios; such recommendations are of great reference value for technical improvement.

Optionally, in a sixth possible implementation, a fourth candidate relationship that contains the third candidate relationship is determined according to the candidate relationships; for example, the fourth candidate relationship is (the upper end cover is connected with the soymilk machine body, and the soymilk machine body is connected with the base). The fourth candidate relationship is then output as the second candidate entity. In an application scenario, if a technician inputs (the cover body includes an upper cover), the server may automatically recommend candidate relationships associated with the target relationship; for example, the terminal may display: the cover body includes an upper cover, the upper end cover is connected with the soymilk machine body, and the soymilk machine body includes a lower end cover. As in the fifth implementation, recommending relationships associated with the target relationship broadens the applicable scenarios and is of great reference value for technical improvement.
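The fifth and sixth implementations both walk outward from the target candidate entity through candidate relationships. One way to sketch this is a breadth-first search that collects every relationship reachable within a hop limit: one hop yields third candidate relationships, two hops yield larger relationships that contain them. The hop limit, the name `related_relations`, and the "foot pad" entity in the test data are hypothetical.

```python
from collections import deque

Triple = tuple[str, str, str]


def related_relations(start: str, triples: list[Triple], max_hops: int = 2) -> list[Triple]:
    """Candidate relationships reachable from `start` within `max_hops` entity hops."""
    seen = {start}
    found: list[Triple] = []
    frontier = deque([(start, 0)])
    while frontier:
        entity, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for t in triples:
            a, _rel, b = t
            if entity in (a, b) and t not in found:
                found.append(t)
                other = b if entity == a else a
                if other not in seen:
                    seen.add(other)
                    frontier.append((other, hops + 1))
    return found
```

Starting from "upper end cover", one hop returns only (upper end cover, connected with, soymilk machine body); two hops add the relationships of "soymilk machine body", such as its connection with the base.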

In this embodiment, the candidate relationships and target relationships are all exemplary and do not limit the present application.

Optionally, on the basis of the foregoing embodiment, the data set further includes an image data set containing a plurality of candidate images, each of which has an associated candidate entity. After the second candidate entity having an association relationship with the first candidate entity is selected in the data set, the method further includes: searching the image data set according to the second candidate entity, determining the candidate image associated with the second candidate entity, and taking that candidate image as the second candidate entity.

For example, in one application scenario, the second candidate entities are "upper link" and "lower link"; the image data set is searched according to these second candidate entities to determine the candidate images associated with "upper link" and "lower link", and these images are output as the second candidate entities.

In this embodiment, the candidate image of the second candidate entity can be obtained and output directly, which presents the second candidate entity more intuitively, since image information makes it easier for the user to understand the second candidate entity.

Optionally, on the basis of the above embodiment, a description is given below of how to create an image dataset:

In a first implementation, the image data set includes a first image data set, and before the image data set is searched according to the second candidate entity to determine the associated candidate image, the method further includes:

determining high-frequency entities according to the frequency, where a high-frequency entity is an entity whose frequency of occurrence is higher than a threshold, or an entity ranked before a preset position after sorting by frequency;

and associating each high-frequency entity with at least one corresponding candidate image to obtain a first image data set.
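Both definitions of a high-frequency entity (above a threshold, or before a preset position after sorting) can be sketched with `Counter.most_common`. The mention list, the image file names, and the function names are assumptions; entities without any candidate image are simply skipped in this sketch.

```python
from collections import Counter
from typing import Optional


def high_frequency_entities(mentions: list[str], threshold: Optional[int] = None,
                            top_k: Optional[int] = None) -> list[str]:
    """Entities above `threshold`, or the first `top_k` after sorting by frequency."""
    counts = Counter(mentions)
    if threshold is not None:
        return [e for e, c in counts.most_common() if c > threshold]
    return [e for e, _ in counts.most_common(top_k)]


def build_first_image_dataset(mentions: list[str], images: dict[str, list[str]],
                              threshold: Optional[int] = None,
                              top_k: Optional[int] = None) -> dict[str, list[str]]:
    """Associate each high-frequency entity with its candidate images."""
    return {e: images[e] for e in high_frequency_entities(mentions, threshold, top_k)
            if e in images}
```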

In a second implementation, the image data set includes a second image data set, and before the image data set is searched according to the second candidate entity to determine the associated candidate image, the method further includes:

acquiring a candidate text set, where each candidate text includes a drawing description and drawings; the drawing description includes candidate entities and their reference identifiers, and the drawings include candidate images marked with the same identifiers;

establishing association relationships between the candidate entities and the candidate images according to the identifiers to obtain the second image data set.
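Linking entities to images through shared reference identifiers can be sketched as follows. The drawing-description format "1-motor; 2-upper cover", the representation of each drawing as the set of identifiers it shows, and the function name are assumptions; real patent drawing descriptions vary in format and would need a more robust parser.

```python
import re


def build_second_image_dataset(description: str,
                               figure_marks: dict[str, set[str]]) -> dict[str, list[str]]:
    """description: reference list such as '1-motor; 2-upper cover' (assumed format);
    figure_marks: figure id -> reference identifiers shown in that figure."""
    # Parse "identifier-entity" pairs separated by semicolons.
    mark_to_entity = {m: e.strip() for m, e in re.findall(r"(\d+)\s*-\s*([^;]+)", description)}
    dataset: dict[str, list[str]] = {}
    for fig, marks in figure_marks.items():
        for m in sorted(marks):
            entity = mark_to_entity.get(m)
            if entity:  # identifiers without a described entity are ignored
                dataset.setdefault(entity, []).append(fig)
    return dataset
```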

In a third implementation, the image data set includes a third image data set, and before the image data set is searched according to the second candidate entity to determine the associated candidate image, the method further includes:

extracting abstract drawings in the candidate text;

identifying candidate entities in the title of the candidate text;

and establishing an association relation between the candidate entity and the abstract drawing to obtain a third image data set.
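The third implementation can be sketched by recognizing candidate entities in each title and attaching the text's abstract drawing to them. Identifying entities by case-insensitive matching against a known vocabulary is an assumed simplification, and the field names "title" and "abstract_drawing" are hypothetical.

```python
def build_third_image_dataset(texts: list[dict[str, str]],
                              vocabulary: list[str]) -> dict[str, list[str]]:
    """Associate candidate entities found in each title with that text's abstract drawing."""
    dataset: dict[str, list[str]] = {}
    for t in texts:
        title = t["title"].lower()
        for entity in vocabulary:
            if entity.lower() in title:  # assumed entity-recognition rule
                dataset.setdefault(entity, []).append(t["abstract_drawing"])
    return dataset
```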

In this embodiment, the image data set includes the first, second, and/or third image data sets; the specific methods for creating them can be understood with reference to the method for creating the image data set in embodiment 4.

Optionally, the image data set may be searched as follows:

the image data set comprises a first image data set, wherein the first image data set comprises candidate images of high-frequency entities, and the high-frequency entities are candidate entities with the use frequency higher than a threshold;

Searching the first image data set according to the second candidate entity;

if no candidate image associated with the second candidate entity is found in the first image dataset, searching for other image datasets (e.g., the second image dataset and/or the third image dataset) other than the first image dataset based on the second candidate entity.

That is, the target entity is first matched against the candidate entities associated with the candidate images in the first image data set. Because the first image data set contains entities with a high frequency of occurrence, matching against these high-frequency entities first improves the matching rate.
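The search order described above can be sketched as a tiered lookup: the high-frequency (first) image data set is consulted first, and the remaining data sets are tried in order only if it has no entry. Modelling each data set as a dictionary from entity to image identifiers, and the name `find_images`, are assumptions.

```python
def find_images(entity: str, first: dict[str, list[str]],
                *others: dict[str, list[str]]) -> list[str]:
    """Search the first image data set, then the other image data sets in order."""
    for dataset in (first, *others):
        if entity in dataset:
            return dataset[entity]
    return []  # no candidate image found in any data set


first = {"motor": ["m.png"]}
second = {"upper link": ["fig3"]}
third = {"upper link": ["abs.png"], "base": ["b.png"]}
```

Because the search stops at the first hit, "upper link" resolves to the second data set's `["fig3"]` even though the third data set also has an entry.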

Example 6

This embodiment provides an apparatus for obtaining entity information, configured to perform the method steps performed by the electronic device in the foregoing embodiment 5. The apparatus 1300 includes:

a receiving module 1301, configured to receive target text information, where the target text information includes a first target entity;

a matching module 1302, configured to retrieve a first candidate entity matching the first target entity received by the receiving module 1301 in a dataset, where the dataset includes candidate entities and relationships between the candidate entities, the candidate entities at least include a first candidate entity and a second candidate entity, and the first candidate entity has a relationship with the second candidate entity;

a selection module 1303 for selecting, in the dataset, a second candidate entity having a relationship with the first candidate entity determined by the matching module 1302;

and an output module 1304, configured to output the second candidate entity selected by the selection module 1303.

Optionally, the attribute of the association relationship includes a relationship type, and the target text information further includes a target relationship condition, where the target relationship condition is used to represent the relationship type between the target entity and the candidate entity to be acquired; the selecting module 1303 is further configured to select, according to the first candidate entity, a second candidate entity of a type that meets the target relationship condition in the dataset.

Optionally, the relationship type includes at least one of a conceptual relationship, a belonging relationship, a positional relationship, a sequential relationship, and a logical relationship.

Optionally, the selecting module 1303 is further specifically configured to select, in the data set and according to the first candidate entity, a plurality of second candidate entities that have an association relationship with the first candidate entity.

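Optionally, the selecting module 1303 is further specifically configured to: determine the frequency with which each of the plurality of second candidate entities occurs in the data set; and select a target second candidate entity from the plurality of second candidate entities according to the frequency, taking it as the second candidate entity.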

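Optionally, the selecting module 1303 is further specifically configured to: determine the relevant date of the candidate text to which each of the plurality of second candidate entities belongs; and select a target second candidate entity from the plurality of second candidate entities according to the relevant date, taking it as the second candidate entity.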

Optionally, the attributes of the association relationship further include a relationship dimension, which is either a binary relationship or an X-element relationship, where X is an integer greater than or equal to 3. A binary relationship includes two entities and the relationship between them; an X-element relationship includes X entities and at least (X-1) binary relationships, which are connected through association entities.

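Optionally, there are a plurality of second candidate entities, and the target text information further includes a second target entity and a target relationship condition; the selecting module 1303 is further specifically configured to: select, from the plurality of second candidate entities, target second candidate entities that satisfy the target relationship condition, and output an R-element relation group, each R-element relation of which includes the first candidate entity, a target second candidate entity, and the relationship between them.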

Optionally, the entity includes a component, an attribute, and/or an attribute value.

Referring to fig. 14, on the basis of the embodiment corresponding to fig. 13, an embodiment of the present application further provides an apparatus 1400 for obtaining entity information, in which:

the target entity comprises a target component, a target attribute, and/or a target attribute value; the candidate entity comprises a candidate component, a candidate attribute and/or a candidate attribute value, and the candidate entity is associated with a candidate text to which the candidate entity belongs;

the apparatus further includes an acquisition module 1306;

the matching module 1302 is further specifically configured to match the target component with each candidate component, the target attribute with each candidate attribute, and/or the target attribute value with each candidate attribute value;

the acquisition module 1306 is configured to acquire a first candidate text associated with the target candidate component determined by the matching module 1302, a second candidate text associated with the target candidate attribute, and/or a third candidate text associated with the target candidate attribute value; and

the output module 1304 is further configured to output the first candidate text, the second candidate text, and/or the third candidate text acquired by the acquisition module 1306.

Optionally, the data set further comprises an image data set comprising a plurality of candidate images, each of which has an associated candidate entity; the apparatus further comprises a search module 1305;

the search module 1305 is configured to search the image data set according to the second candidate entity selected by the selecting module 1303, determine the candidate image associated with the second candidate entity, and take that candidate image as the second candidate entity.

Optionally, the image data set includes a first image data set containing candidate images of high-frequency entities, the high-frequency entities being candidate entities whose frequency of use is above a threshold;

the search module 1305 is further specifically configured to:

search the first image data set according to the second candidate entity selected by the selecting module 1303;
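The Python sketch below combines the image lookup with the high-frequency first image data set. It rests on several assumptions: images are plain identifiers grouped by entity, usage frequency is a simple count compared strictly against the threshold, and a miss in the first image data set falls back to the full image data set. The text above does not state that fallback; it is an illustrative choice.

```python
from collections import Counter


def build_image_index(images: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group hypothetical (image_id, entity) pairs by candidate entity."""
    index: dict[str, list[str]] = {}
    for image_id, entity in images:
        index.setdefault(entity, []).append(image_id)
    return index


def build_first_image_dataset(
    index: dict[str, list[str]], usage: Counter, threshold: int
) -> dict[str, list[str]]:
    """Keep only images of high-frequency entities (usage above threshold)."""
    return {e: ids for e, ids in index.items() if usage[e] > threshold}


def search_images(
    entity: str, first: dict[str, list[str]], full: dict[str, list[str]]
) -> tuple[list[str], str]:
    """Return the images of the second candidate entity and which set answered."""
    if entity in first:
        return list(first[entity]), "first"
    # Assumed fallback: consult the complete image data set on a miss.
    return list(full.get(entity, [])), "full"
```

With a separate high-frequency set, the most frequent lookups are served from a smaller collection.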

Optionally, the image data set comprises a first image data set, and the apparatus further comprises a data set creation module 1307;

the data set creation module 1307 is configured to create the image data set.

Optionally, the data set creation module 1307 is further specifically configured to:

Optionally, the image data set includes a second image data set, and the data set creation module 1307 is further specifically configured to:

Optionally, the image data set comprises a third image data set;

the data set creation module 1307 is further specifically configured to:

extract the abstract drawing from the candidate text;

identify the candidate entities in the title;
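The two steps listed for the third image data set can be combined into a small Python sketch. None of the following details is given above, so each is an assumption: "the title" is the title of the candidate text, candidate entities are identified by case-insensitive dictionary matching against a known vocabulary, and the abstract drawing is referenced by a hypothetical storage key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CandidateText:
    text_id: str
    title: str
    abstract_drawing: Optional[str]  # hypothetical storage key of the abstract drawing


def build_third_image_dataset(
    texts: list[CandidateText], vocabulary: list[str]
) -> dict[str, list[str]]:
    """Map each candidate entity found in a title to the abstract drawings
    extracted from the candidate texts that carry that title."""
    dataset: dict[str, list[str]] = {}
    for text in texts:
        drawing = text.abstract_drawing  # step 1: extract the abstract drawing
        if drawing is None:
            continue  # nothing to associate
        title = text.title.casefold()
        for entity in vocabulary:  # step 2: identify candidate entities in the title
            if entity.casefold() in title:
                dataset.setdefault(entity, []).append(drawing)
    return dataset
```

Substring matching also fires on entities embedded in longer words; a tokenizer-based match would be stricter.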

Optionally, the target text information is a target structure expressed in a structured representation.

Referring to fig. 15, an embodiment of the present application further provides an electronic device 70, where the electronic device 70 includes: memory 710, transceiver 720, and processor 730. Those skilled in the art will appreciate that an electronic device may also include other components, such as various components common in computers. Memory 710, transceiver 720, and processor 730 are in communication with each other, and memory 710 is configured to store computer instructions that, when executed by processor 730, cause electronic device 70 to perform the methods described in the method embodiments above.

The present application also provides a computer storage medium for storing computer software instructions, including instructions for performing the methods in the method embodiments above.

It will be appreciated by those skilled in the art that all or part of the procedures of the method embodiments above may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the method embodiments above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage medium may also comprise a combination of the above kinds of memory.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.

Claims

1. A method of obtaining information about an entity, wherein the entity is a word representing a technical feature, the entity comprising a component, an attribute or an attribute value, the method comprising:

outputting the second candidate entity;

the attribute of the association relationship further comprises a relationship dimension, the relationship dimension ranging from a ternary relationship to an X-element relationship, wherein X is an integer greater than or equal to 3, the X-element relationship comprises X entities and at least (X-1) binary relationships, the (X-1) binary relationships are connected through association entities, and each binary relationship comprises two entities and a relationship between the two entities;

the data set comprises candidate relations, the candidate relations comprise at least two candidate entities and relations between the at least two candidate entities, the target text information comprises target relations, the target relations comprise at least two target entities and relations between the target entities, and the two target entities comprise the first target entity and the second target entity; the selecting a second candidate entity with an association relation with the first candidate entity in the dataset comprises: retrieving a target candidate entity matched with the second target entity in the data set, wherein the target candidate entity has an association relation with the first candidate entity; searching a first candidate relation containing the target candidate entity according to the candidate relation, wherein the first candidate relation also comprises a third candidate entity and a relation between the target candidate entity and the third candidate entity; and taking the first candidate relation as the second candidate entity.

2. The method according to claim 1, wherein the attribute of the association relationship includes a relationship type, and the target text information further includes a target relationship condition for representing the relationship type between the target entity and a candidate entity to be acquired;

3. The method of claim 2, wherein the relationship type comprises at least one of a conceptual relationship, a belonging relationship, a positional relationship, a sequential relationship, and a logical relationship.

4. The method of claim 1, wherein the selecting a second candidate entity in the dataset that has an association with the first candidate entity comprises:

5. The method of claim 4, wherein selecting a target second candidate entity from the plurality of second candidate entities according to a preset rule, the target second candidate entity being the second candidate entity, comprises:

6. The method of claim 4, wherein selecting a target second candidate entity from the plurality of second candidate entities according to a preset rule, the target second candidate entity being the second candidate entity, comprises:

7. The method of claim 1, wherein there are a plurality of second candidate entities, the target text information further comprises a second target entity and a target relationship condition, and the selecting a second candidate entity in the dataset that has an association with the first candidate entity comprises:

Retrieving a plurality of second candidate entities in the dataset that match the second target entity, the plurality of second candidate entities having an association with the first candidate entity;

outputting an R-element relationship group, wherein R is an integer greater than or equal to 2 and less than or equal to X, the R-element relationship group includes at least one R-element relationship, and each R-element relationship includes the first candidate entity, the target second candidate entity, and the relationship between the first candidate entity and the target second candidate entity.

8. The method of claim 7, wherein the method further comprises:

and outputting the third candidate entity.

9. The method of claim 8, wherein the method further comprises:

according to the candidate relations, matching the third candidate entity with candidate entities contained in each candidate relation, and determining a fourth candidate entity matched with the third candidate entity;

and taking a second candidate relation containing the fourth candidate entity as the second candidate entity.

10. The method of claim 1, wherein the target text information comprises a target relationship comprising at least two target entities and a relationship between the target entities, the two target entities comprising the first target entity and the second target entity;

retrieving a target candidate entity matched with the second target entity in the data set, wherein the target candidate entity has an association relation with the first candidate entity;

the method further comprises the steps of:

searching a fifth candidate entity with an association relation with the target candidate entity according to a candidate relation, wherein the fifth candidate entity is contained in a third candidate relation, and the third candidate relation contains the fifth candidate entity, a sixth candidate entity and a relation between the fifth candidate entity and the sixth candidate entity;

and taking the third candidate relation as the second candidate entity.

11. The method according to claim 10, wherein the method further comprises:

determining a fourth candidate relation containing the third candidate relation according to the candidate relation;

and taking the fourth candidate relation as the second candidate entity.

12. The method according to any of claims 1-11, wherein the target entity comprises a target component, a target attribute, and/or a target attribute value; the candidate entity includes a candidate component, a candidate attribute, and/or a candidate attribute value, the candidate entity being associated with a candidate text to which it belongs, the method further comprising:

13. An apparatus for obtaining information about an entity, wherein the entity is a word representing a technical feature, the entity comprising a component, an attribute, or an attribute value, the apparatus comprising:

the receiving module is used for receiving target text information, wherein the target text information comprises a first target entity;

the matching module is used for searching a first candidate entity matched with the first target entity received by the receiving module in a data set, wherein the data set comprises candidate entities and relations among the candidate entities, the candidate entities at least comprise a first candidate entity and a second candidate entity, and the first candidate entity and the second candidate entity have an association relation;

the output module is used for outputting the second candidate entity;

14. An electronic device, comprising:

a memory and a processor;

the memory and the processor are communicatively coupled to each other, the memory having stored therein computer instructions that, when executed, cause the processor to perform the method of any of claims 1-12.

15. A computer storage medium, characterized in that the computer storage medium stores computer instructions for causing a computer to perform the method of any one of claims 1-12.
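A short Python sketch over binary candidate relations can illustrate one reading of the relation search in claim 1 and the chained matching in claim 9. Three elements are assumptions made for the example: exact matching for the second target entity, case-insensitive matching for the third candidate entity, and the `Relation` type itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A binary candidate relation: two candidate entities and their relationship."""
    head: str
    relation: str
    tail: str

    def has(self, entity: str) -> bool:
        return entity in (self.head, self.tail)

    def other(self, entity: str) -> str:
        # Caller guarantees `entity` is one of the two ends.
        return self.tail if entity == self.head else self.head


def first_candidate_relations(
    relations: list[Relation], first_candidate: str, second_target: str
) -> list[tuple[Relation, str]]:
    """Claim 1 (sketch): return (first candidate relation, third candidate entity) pairs."""
    # Target candidate entity: assumed to match the second target entity exactly
    # and to be associated with the first candidate entity via some relation.
    if not any(r.has(first_candidate) and r.has(second_target) for r in relations):
        return []
    target = second_target
    # First candidate relations contain the target candidate entity and a third entity.
    return [
        (r, r.other(target))
        for r in relations
        if r.has(target) and not r.has(first_candidate)
    ]


def second_candidate_relations(
    relations: list[Relation], first_relation: Relation, third: str
) -> list[Relation]:
    """Claim 9 (sketch): relations holding a fourth candidate entity that matches
    the third candidate entity (assumed: case-insensitive equality)."""
    key = third.casefold()
    return [
        r
        for r in relations
        if r != first_relation and key in (r.head.casefold(), r.tail.casefold())
    ]
```

Each returned relation would then be output as the second candidate entity.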