WO2023130688A1 - Natural language processing method and apparatus, device, and readable storage medium - Google Patents

Natural language processing method and apparatus, device, and readable storage medium Download PDF

Info

Publication number
WO2023130688A1
WO2023130688A1 (PCT/CN2022/102862, CN2022102862W)
Authority
WO
WIPO (PCT)
Prior art keywords
entity
relationship
sentence
entities
target
Prior art date
Application number
PCT/CN2022/102862
Other languages
English (en)
French (fr)
Inventor
王立
赵雅倩
范宝余
李仁刚
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023130688A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Definitions

  • The present application relates to the field of computer technology, and in particular to a natural language processing method, apparatus, device, and readable storage medium.
  • Relevant information can be expanded for the input data of the BERT (Bidirectional Encoder Representations from Transformer) model, but the expanded information affects the model's ability to discriminate the input data, reducing the accuracy of the processing results.
  • Various strategies are currently used to improve the selection accuracy of the extended information, but they still cannot effectively reduce the negative impact of the extended information on the original input data.
  • the embodiment of the present application provides a natural language processing method, including:
  • The attention scores between the extended information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero.
  • determining extended information for the entity includes:
  • An entity whose relationship probability value is greater than a first threshold is selected from the entity group, and extended information of the target object is generated based on the selected entity.
  • determining an entity group that has a relationship with the target object in the preset entity set includes:
  • N is the number of entities included in the preset entity set,
  • and M is the number of relationships between different entities in the preset entity set;
  • generating the N×N×M tensor used to represent the relationships between entities in the preset entity set and the relationship probability values includes:
  • Two adjacent entities in the sentence to be recognized are used as entity groups to obtain multiple entity groups;
  • A relationship recognition model is used to identify the relationship between the two entities in each entity group to obtain a plurality of M-dimensional relationship vectors;
  • the element corresponding to the maximum value in the initial tensor is updated from 0 to 1 to update the initial tensor;
  • a relationship between two entities in each entity group is identified using a relationship recognition model to obtain a plurality of M-dimensional relationship vectors, including:
  • optimizing the currently obtained tensor to obtain an N ⁇ N ⁇ M dimensional tensor includes:
  • The initial three-dimensional matrix and the new three-dimensional matrix are compared position by position, and the maximum value at each position is retained to obtain the N×N×M tensor.
  • The relationship recognition model includes a sub-model with a converter (Transformer) structure and a relationship classification neural network; inputting the new sentence obtained by the replacement into the relationship recognition model so that the relationship recognition model outputs the M-dimensional relationship vector corresponding to the two entities includes:
  • determining extended information for the entity includes:
  • An object entity is an entity in the preset entity set other than the entity itself;
  • the extended information of the target object is generated based on the object entity having the greatest correlation with the target object.
  • determining the object entity having the greatest association with the target object in the preset entity set includes:
  • N-1 is the number of object entities, and N is the number of entities included in the preset entity set;
  • For each object entity, the product of the correlation corresponding to the object entity and the maximum relationship probability value corresponding to the object entity is calculated to obtain the association score corresponding to the object entity, yielding N-1 association scores;
  • the object entity corresponding to the maximum association score among the N-1 association scores is taken as the object entity having the maximum association with the target object.
  • determining the relevance of each object entity to the target sentence includes:
  • the sum of the degree of correlation between each entity in the target sentence and the object entity is determined as the correlation between the object entity and the target sentence.
  • The degree of correlation between any entity in the target sentence and any object entity is: the maximum relationship probability value between the entity and the object entity plus the maximum relationship probability value between the object entity and the entity.
  • the natural language processing method also includes:
  • The attention score between the extended information of any entity in the target sentence and that entity is adjusted to the target value;
  • the target value is the sum of the attention score between the extended information and the entity and ln Y;
  • Y is the value at the corresponding position in the N×N×M tensor;
  • the N×N×M tensor is used to represent the relationships between entities in the preset entity set and the relationship probability values.
  • The attention scores between the extended information of any entity in the target sentence and the other entities include: the attention scores between each word in the extended information of the entity and each word in the other entities.
  • the embodiment of the present application provides a natural language processing device, including:
  • An acquisition module configured to acquire the target sentence to be processed and determine each entity in the target sentence;
  • an expansion module configured to, for each entity in the target sentence, in response to the entity existing in the preset entity set, determine extended information for the entity and add the determined extended information after the position of the entity in the target sentence to obtain the updated target sentence; and
  • a processing module configured to input the updated target sentence into the BERT model so that the BERT model performs a natural language processing task; wherein, while the BERT model performs the natural language processing task, the attention scores between the extended information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero.
  • an electronic device including:
  • a processor configured to execute computer-readable instructions to implement the natural language processing method in any of the foregoing embodiments.
  • The embodiment of the present application provides one or more non-volatile computer-readable storage media storing computer-readable instructions, wherein, when the computer-readable instructions are executed by one or more processors, they cause the one or more processors to execute the steps of the natural language processing method in any of the foregoing embodiments.
  • Fig. 1 is a flow chart of a natural language processing method according to one or more embodiments.
  • Fig. 2 is a schematic diagram of a relationship recognition model according to one or more embodiments.
  • Fig. 3 is a schematic diagram of an entity recognition model according to one or more embodiments.
  • Fig. 4 is a schematic diagram of a natural language processing device according to one or more embodiments.
  • Fig. 5 is a schematic diagram of an electronic device according to one or more embodiments.
  • A knowledge graph is the best source for knowledge extraction because the information it stores is structured.
  • A knowledge graph is a knowledge system formed by structuring human knowledge; it contains basic facts, general rules, and other related structured information, and can be used for intelligent tasks such as information retrieval, reasoning, and decision-making.
  • The rich information contained in a knowledge graph can help the BERT model better perform natural language processing tasks.
  • By building a knowledge graph and adding its information to the BERT model's input as extra information, BERT can be assisted in natural language processing tasks.
  • The expanded information, however, affects the model's ability to discriminate the input data, reducing the accuracy of the processing results.
  • Various strategies are currently used to improve the selection accuracy of the extended information, but they still cannot effectively reduce the negative impact of the extended information on the original input data. For this reason, the present application provides a natural language processing solution that can effectively reduce the negative impact of the extended information on the original input data.
  • Referring to Fig. 1, the embodiment of the present application discloses a natural language processing method including step S101, step S102, and step S103.
  • Step S101 Obtain the target sentence to be processed, and determine each entity in the target sentence.
  • In this embodiment, nouns in the target sentence are referred to as entities, and the target sentence includes at least one entity.
  • For example, the sentence "Company A managed by Xiao Ming operates a clothing business" includes three entities: Xiao Ming, Company A, and the clothing business.
  • Step S102: for each entity in the target sentence, if the entity exists in the preset entity set, determine extended information for the entity and add the determined extended information after the position of the entity in the target sentence to obtain the updated target sentence.
  • Step S102 may include: for each entity in the target sentence, in response to the entity existing in the preset entity set, determining extended information for the entity and adding the determined extended information after the position of the entity in the target sentence to obtain the updated target sentence.
  • The preset entity set is simply a collection of many entities. If an entity in the target sentence exists in the preset entity set, other entities related to that entity can be found in the preset entity set; extended information can then be determined for the entity accordingly and added after the position of the entity in the target sentence. In this way, extended information can be added for every entity in the target sentence that exists in the preset entity set, yielding the updated target sentence. Entities in the target sentence that do not exist in the preset entity set receive no extended information.
  • Suppose the target sentence is: "Xiao Ming founded Company A". The extended information added for the entity "Xiao Ming" is "is Chinese", where "Chinese" is an entity in the preset entity set; the extended information added for the entity "Company A" is "operates a clothing business", where "clothing business" is an entity in the preset entity set. The final updated target sentence is then: "Xiao Ming is Chinese founded Company A operates a clothing business".
  • Here "is" is a relational word connecting "Xiao Ming" and "Chinese", and "operates" is a relational word connecting "Company A" and "clothing business". Such relational words can be determined from common language usage.
  • Step S103: input the updated target sentence into the BERT model so that the BERT model performs a natural language processing task; wherein, while the BERT model performs the natural language processing task, the attention scores between the extended information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero.
  • The BERT model includes a self-attention layer, which calculates attention scores between different words in a sentence.
  • Adjusting the attention scores between an entity's extended information and the other entities in the target sentence to zero prevents attention scores between unrelated information from feeding wrong signals into subsequent processing, thereby reducing the negative impact of the extended information on the original input data.
  • the updated target sentence is used as the input data of the BERT model, so that the BERT model can obtain as much information as possible when performing natural language processing tasks, thereby improving the processing accuracy and effect.
  • Suppose the BERT model handles a question-answering task.
  • After the BERT model receives a question sentence, extended information can be added for each entity in the question sentence according to this embodiment to update the question.
  • Once the new question sentence is obtained, it is processed to determine the best answer for it.
  • This embodiment can add extended information for each entity in the target sentence, so that effective information is expanded for the target sentence serving as the BERT model's input data.
  • Meanwhile, the attention scores between the extended information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero, so that the attention score between the added information and unrelated information in the original sentence is zero. This shields unrelated pieces of information from influencing each other, effectively reducing the negative impact of the expanded information on the original input data and allowing the BERT model to improve the accuracy, efficiency, and effect of natural language processing tasks.
  • For example, the original target sentence is "Xiao Ming founded Company A" and the updated target sentence is "Xiao Ming is Chinese founded Company A operates a clothing business", in which "is Chinese" is the extended information of "Xiao Ming". It has nothing to do with "founded Company A" in the original sentence, so the attention score between each word in "is Chinese" and each word in "founded Company A" is adjusted to zero, which reduces the negative impact of "is Chinese" on "founded Company A" in the original sentence.
  • Likewise, "operates a clothing business" is the extended information of "Company A" and has nothing to do with "Xiao Ming" in the original sentence, so the attention score between each word in "operates a clothing business" and each word in "Xiao Ming" is adjusted to zero, which reduces the negative impact of "operates a clothing business" on "Xiao Ming" in the original sentence.
  • Determining the extended information for the entity includes: taking the entity as the target object and determining an entity group that has a relationship with the target object in the preset entity set; selecting, from the entity group, entities whose relationship probability values are greater than a first threshold; and generating the extended information of the target object based on the selected entities.
  • Determining the entity group that has a relationship with the target object in the preset entity set includes: generating an N×N×M tensor used to represent the relationships between entities in the preset entity set and the relationship probability values, where N is the number of entities included in the preset entity set and M is the number of relationships between different entities in the preset entity set; generating a knowledge graph based on the N×N×M tensor; and querying the knowledge graph for the entity group that has a relationship with the target object.
  • The N×N×M tensor is equivalent to the knowledge graph, so generating the knowledge graph based on the tensor merely expresses the information in the tensor in knowledge-graph form.
  • The entity group that has a relationship with the target object can thus be queried in the knowledge graph.
  • Generating the N×N×M tensor used to represent the relationships between entities in the preset entity set and the relationship probability values includes: generating an initial all-zero N×N×M tensor; obtaining the sentence library used to generate the preset entity set and traversing each sentence in the library, taking the traversed sentence as the sentence to be recognized; taking every two adjacent entities in the sentence to be recognized as an entity group to obtain multiple entity groups; using a relationship recognition model to identify the relationship between the two entities in each entity group to obtain multiple M-dimensional relationship vectors; for each M-dimensional relationship vector, if the maximum value in the vector is greater than a second threshold, updating the element at the corresponding position in the initial tensor from 0 to 1; and traversing the next sentence in the library and continuing to update the current tensor until every sentence in the library has been traversed, at which point some elements of the initial tensor have become 1 while the rest remain 0, and the resulting tensor of 0s and 1s is optimized to obtain the N×N×M tensor.
  • For each M-dimensional relationship vector, the element at the position corresponding to the maximum value in the initial tensor is updated from 0 to 1, in response to that maximum value being greater than the second threshold, so as to update the initial tensor.
  • Using the relationship recognition model to identify the relationship between the two entities in each entity group to obtain a plurality of M-dimensional relationship vectors includes: for the two entities in any entity group, replacing the two entities in the sentence to be recognized with different identifiers, and inputting the resulting new sentence into the relationship recognition model so that the relationship recognition model outputs the M-dimensional relationship vector corresponding to the two entities.
  • The relationship recognition model, shown in Fig. 2, includes a sub-model with a Transformer (converter) structure and a relationship classification neural network;
  • the Transformer sub-model includes 12 multi-head attention modules.
  • Of course, the relationship recognition model can also have other structures.
  • Any new sentence is first input into the Transformer sub-model to obtain two feature vectors, m_α and m_β; m_α and m_β are then input into the relationship classification neural network to obtain the M relationships and M relationship probability values between α and β. These are represented by an M-dimensional relationship vector in which each element is a relationship probability value.
  • α and β are the different identifiers replacing two adjacent entities, so the M relationships and M relationship probability values between α and β are the M relationships and M relationship probability values between the two replaced adjacent entities.
  • The M-dimensional relationship vector obtained for new sentence ① represents the relationship between "Xiao Ming" and "Company A". If the clause "α has β" is determined from the sentence "Company A has Xiao Ming", then the M-dimensional relationship vector obtained for the clause "α has β" represents the relationship between "Company A" and "Xiao Ming". It follows that after the positions of two adjacent entities are exchanged, relationship recognition must be performed again with the relationship recognition model.
  • When the relationship between two entities appears as 0 in the tensor, two cases are possible: the two entities have no relationship, or they have a relationship that is missing.
  • To resolve such cases, the present application also optimizes the currently obtained tensor composed of 0s and 1s.
  • The initial three-dimensional matrix is decomposed into M N×N matrices X_i, and each X_i is decomposed as A O_i A^T, that is: X_1 = A O_1 A^T, X_2 = A O_2 A^T, ..., X_M = A O_M A^T.
  • The optimal A' and the M optimal O_i' can be solved by gradient descent, that is, the optimal A' and the optimal O_i' in the above M equations are computed respectively.
  • Then X'_1 = A' O_1' A'^T, X'_2 = A' O_2' A'^T, ..., X'_M = A' O_M' A'^T are calculated,
  • giving M results: X'_1, ..., X'_i, ..., X'_M.
  • A new three-dimensional matrix can be obtained by concatenating the M results.
  • A'^T is the transpose of A'.
  • Each element of the initial three-dimensional matrix and the new three-dimensional matrix is compared position by position, and the maximum value at each position is retained, yielding the N×N×M tensor.
  • For example, if the element at some position is 0 in the initial three-dimensional matrix and 0.02 in the new three-dimensional matrix, the element of X" at that position is recorded as 0.02.
  • If the element at some position is 1 in the initial three-dimensional matrix and 0.99 in the new three-dimensional matrix, the element of X" at that position is recorded as 1.
  • Determining the extended information for the entity includes: taking the entity as the target object and determining the object entity that has the greatest association with the target object in the preset entity set, where an object entity is an entity in the preset entity set other than the entity itself; and generating the extended information of the target object based on the object entity having the greatest association with the target object.
  • Determining the object entity having the greatest association with the target object in the preset entity set includes: determining the maximum relationship probability value between the target object and each object entity to obtain N-1 maximum relationship probability values, where N-1 is the number of object entities and N is the number of entities included in the preset entity set; determining the correlation between each object entity and the target sentence to obtain N-1 correlations; for each object entity, computing the product of the correlation corresponding to the object entity and the maximum relationship probability value corresponding to the object entity to obtain the association score corresponding to the object entity, yielding N-1 association scores; and taking the object entity corresponding to the maximum of the N-1 association scores as the object entity having the greatest association with the target object.
  • Assuming the preset entity set includes N entities (N a positive integer), if any entity W in the target sentence exists in the preset entity set, a maximum relationship probability value can be determined between entity W and each of the remaining N-1 entities in the preset entity set, giving N-1 maximum relationship probability values.
  • The relationship between different entities is represented by an M-dimensional vector,
  • so the M-dimensional vector contains M relationship probability values; in this embodiment, the maximum of these is taken as the maximum relationship probability value between entity W in the target sentence and the corresponding other entity in the preset entity set.
  • Each of the N-1 entities in the preset entity set other than entity W can likewise obtain a correlation with the target sentence, giving N-1 correlations.
  • N-1 association scores are thereby available.
  • The entity corresponding to the maximum of the N-1 association scores is the entity that both has the strongest relationship with entity W and is most relevant to the target sentence, so it is taken as the entity with the greatest association with entity W.
  • The extended information generated from entity W's maximally associated entity can be regarded as valid and accurate extended information for entity W, so the precision and accuracy of the expanded information are improved and invalid, inaccurate information is avoided.
  • Determining the correlation between each object entity and the target sentence includes: for each object entity, determining the sum of the degrees of correlation between each entity in the target sentence and the object entity as the correlation between the object entity and the target sentence.
  • the degree of correlation between any entity in the target sentence and any object entity is: the maximum relationship probability value between any entity in the target sentence and the object entity plus the maximum relationship probability value between the object entity and the entity.
  • Suppose the target sentence includes three entities A, B, and C. The correlation between the target sentence and any object entity F is the sum of the degree of correlation between A and F, the degree of correlation between B and F, and the degree of correlation between C and F, with the sum normalized.
  • The degree of correlation between A and F is the maximum relationship probability value between A and F plus the maximum relationship probability value between F and A.
  • Similarly, the degree of correlation between C and F is the maximum relationship probability value between C and F plus the maximum relationship probability value between F and C.
  • Each dimension of the M-dimensional relationship vector takes a value between 0 and 1, each value representing the probability that the corresponding relationship holds, and the values of all dimensions sum to 1.
  • The knowledge graph (tensor X") represents the possibility that a certain relationship of relationship table G exists between any two entities of entity table E.
  • X_jik corresponds to the j-th entity e_j and the i-th entity e_i in entity table E, and is the probability value that the k-th relationship r_k of relationship table G exists.
  • Suppose sentence T intersects entity table E in p entities: e_t1, ..., e_tp.
  • The relevance between sentence T and an entity e_i of entity table E is measured by the normalized sum of the degrees of correlation between the entities e_t1, ..., e_tp and entity e_i; that is, the relevance score between T and e_i
  • is the normalized value of the sum of those p degree-of-correlation scores, where
  • N is the number of entities contained in entity table E,
  • and p is the number of entities contained in both entity table E and sentence T.
  • The relevance score function Y and the knowledge graph (tensor X") are used to select the information (relationships between entities) to insert into sentence T.
  • A piece of information represents the knowledge that a certain relationship exists between two entities; for example, "relationship r_k exists between entity e_i and entity e_j" is one piece of information.
  • The value of a piece of information for a given sentence is positively related to the probability that the information holds and to the relevance between the information and the sentence.
  • supplementary information is selected for each entity.
  • The supplementary information is selected as follows: for an entity e_q (q ∈ {t1, ..., tp}), obtain the probabilities X_qi1, ..., X_qiM of the M relationships between entity e_q and entity e_i (i ≠ q), and take the maximum to obtain the maximum relationship probability between e_q and e_i.
  • The N-1 information evaluation values V_1 to V_N-1 are the values of the corresponding entities and information for sentence T.
  • The BERT model can achieve better processing results when handling sentences augmented with such information.
  • The value of the inserted information to the sentence is fully considered, ensuring the quality of the inserted information and reducing the introduction of noise.
  • Determining the maximum relationship probability value between the target object and each object entity includes: generating the N×N×M tensor used to represent the relationships between entities in the preset entity set and the relationship probability values, where M is the dimension of the relationship vector between different entities in the preset entity set; generating a knowledge graph based on the N×N×M tensor; and querying the knowledge graph for the maximum relationship probability value between the target object and each object entity. The maximum relationship probability value between each entity and each object entity can thus be queried in the knowledge graph.
  • The knowledge graph is generated based on the N×N×M tensor; for the generation process of the tensor, refer to the description above.
  • While the BERT model performs the natural language processing task, the attention score between the extended information of any entity in the target sentence and that entity is adjusted to a target value;
  • the target value is the sum of the attention score between the extended information and the entity and ln Y,
  • where Y is the value at the corresponding position in the N×N×M tensor;
  • the N×N×M tensor is used to represent the relationships between entities in the preset entity set and the relationship probability values.
  • Determining each entity in the target sentence includes: converting each word in the target sentence into a 1024-dimensional vector to obtain a vector set, and inputting the vector set into the entity recognition model so that the entity recognition model recognizes the entities in the target sentence.
  • The entity recognition model is shown in Fig. 3.
  • The entity recognition model in Fig. 3 is implemented with a Transformer that includes 6 multi-head self-attention modules; of course, the entity recognition model can also be implemented with other structures. The "positional encoding" in Fig. 2 and Fig. 3 records the position of each word of the sentence within the sentence.
  • The recognition result output by the entity recognition model may be: Beijing is the capital of China (AABAABBB),
  • where A denotes an entity word and B denotes a non-entity word.
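A tag sequence like this can be decoded into entity spans with a small scan; the sketch below assumes the A/B tagging scheme from the example above.

```python
def decode_entities(tags: str):
    """Convert a per-word tag string (A = entity word, B = non-entity word)
    into (start, end) spans covering each maximal run of A."""
    spans, start = [], None
    for pos, tag in enumerate(tags):
        if tag == "A" and start is None:
            start = pos                    # a new entity begins
        elif tag != "A" and start is not None:
            spans.append((start, pos))     # the current entity ends
            start = None
    if start is not None:
        spans.append((start, len(tags)))   # entity running to the end
    return spans

print(decode_entities("AABAABBB"))   # [(0, 2), (3, 5)]
```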
  • The following embodiment proceeds through the steps of data preparation, entity recognition, relationship recognition, knowledge graph construction, information embedding, and attention score adjustment, adding the expert knowledge in the knowledge graph to the BERT model's input data and adjusting the attention scores, so that the BERT model improves the accuracy of natural language processing tasks.
  • The word table W also contains two special symbols, α and β.
  • The word table W can be used to determine the relational word between two entities.
  • The specific entity recognition model is shown in Fig. 3. Sentences are randomly selected from the sentence library for manual labeling to obtain training text, and the model is trained with the labeled text to obtain the entity recognition model.
  • For the specific training process, refer to existing related technologies; it is not repeated here.
  • Convert each word contained in word table W into a Z-dimensional one-hot vector, and then map the Z-dimensional one-hot vector to a 1024-dimensional vector through the matrix J ∈ R^(1024×Z),
  • where R denotes the set of real numbers. In this way, each word can be represented by a 1024-dimensional vector.
  • With each word of a sentence represented by a 1024-dimensional vector, a set of 1024-dimensional vectors is obtained; inputting this set into the entity recognition model recognizes each entity in the sentence.
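As a concrete illustration of this word-to-vector mapping, the sketch below builds the one-hot vectors and applies a stand-in projection; the vocabulary size Z, the random matrix J, and the `word_table` dict are illustrative assumptions, since the text only fixes the 1024-dimensional output.

```python
import numpy as np

Z = 5000          # size of word table W (illustrative; not fixed by the text)
EMB_DIM = 1024    # output dimension stated above

rng = np.random.default_rng(0)
J = rng.standard_normal((EMB_DIM, Z))   # stand-in for the matrix J in R^(1024 x Z)

def one_hot(index: int, size: int = Z) -> np.ndarray:
    """Z-dimensional one-hot vector for a word's index in word table W."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

def embed_sentence(sentence, word_table):
    """Map each word to a 1024-dim vector via J @ one_hot(index); the stacked
    (len(sentence), 1024) array is the vector set fed to the entity
    recognition model, which tags each word as entity (A) or non-entity (B)."""
    return np.stack([J @ one_hot(word_table[w]) for w in sentence])
```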
  • The entities in each sentence of the sentence library can thus be identified using the entity recognition model; then, for each sentence, the relationships between its entities can be further identified.
  • A relationship table G contains all the relationships of interest. There are M different relationships in table G, one of which is defined as "no relationship", meaning that no relationship exists between two entities.
  • The relationship table G is customized according to requirements, and each relationship corresponds to an M-dimensional one-hot vector. By convention, the k-th relationship is denoted r_k and corresponds to the k-th dimension of the vector.
  • A method for replacing entities in a sentence includes: for a sentence, after determining all the entities in it, replacing adjacent entity pairs in turn.
  • Specifically, first traverse the sentence word by word, replace all words of the first entity with α and all words of the second entity with β, obtaining one clause (that is, a new sentence); then start from the second entity, replace all its words with α and all words of the third entity with β, obtaining a second clause; and so on, until all entities in the sentence have been replaced.
  • Suppose the sentence is: Company A managed by Xiao Ming operates a clothing business.
  • The vector h is then input into the relationship classification neural network for relationship classification.
  • The output vector is (p_1, p_2, ..., p_M),
  • whose first, second, ..., M-th dimensions correspond to r_1, r_2, ..., r_M;
  • the probability that relationship r_1 holds between α and β in the input clause is p_1,
  • the probability of relationship r_2 is p_2, ...,
  • and the probability of relationship r_M is p_M.
  • the relationship between ⁇ and ⁇ is equivalent to the relationship between the two entities replaced in the original sentence.
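A minimal sketch of the adjacent-pair replacement described above; the word-level tokenization, the `entity_spans` format, and the printable α/β symbols are illustrative assumptions (the original example works at the character level).

```python
def make_clauses(words, entity_spans, alpha="α", beta="β"):
    """For each pair of adjacent entities, replace the first with α and the
    second with β, yielding one clause (new sentence) per adjacent pair."""
    clauses = []
    for (s1, e1), (s2, e2) in zip(entity_spans, entity_spans[1:]):
        clauses.append(words[:s1] + [alpha] + words[e1:s2] + [beta] + words[e2:])
    return clauses

# Word-level stand-in for "Company A managed by Xiao Ming operates a clothing business":
words = ["XiaoMing", "managed", "CompanyA", "operates", "clothing_business"]
spans = [(0, 1), (2, 3), (4, 5)]            # (start, end) indices of each entity
for clause in make_clauses(words, spans):
    print(" ".join(clause))
# α managed β operates clothing_business
# XiaoMing managed α operates β
```

Each printed clause would then be fed to the relationship recognition model to obtain its (p_1, ..., p_M) vector.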
  • Entity table E contains N entities in total. By convention, e_i denotes the i-th entity in E, 1 ≤ i ≤ N.
  • The relationship recognition model is used to identify the relationship between two adjacent entities in each sentence and to build a knowledge graph.
  • The knowledge graph is expressed quantitatively by an N×N×M tensor. From the tensor it can be seen whether any relationship contained in relationship table G exists between any two entities contained in entity table E; N is the number of entities in entity table E and M is the number of relationships in relationship table G.
  • X_ijk denotes the element at one position of the tensor, corresponding to two entities in entity table E and one relationship in relationship table G.
  • The correspondence rule is: X_ijk corresponds to the i-th entity e_i and the j-th entity e_j in entity table E, and the k-th relationship r_k in relationship table G.
  • The knowledge graph construction process includes:
  • The predetermined value ε is the second threshold.
  • X is continuously updated accordingly until all M-dimensional vectors have been processed, and the final X is then output.
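A sketch of this construction loop; `relation_model` (returning an M-dimensional probability vector for a clause), `adjacent_pairs` (yielding the two entities of each group together with their α/β clause), and `entity_index` are assumed interfaces rather than anything defined by the text.

```python
import numpy as np

def build_initial_tensor(sentences, adjacent_pairs, relation_model,
                         entity_index, N, M, eps=0.5):
    """Start from an all-zero N x N x M tensor X; for each adjacent entity
    pair, if the maximum of the M-dim relation vector exceeds the
    predetermined value eps (the second threshold), set the element at the
    corresponding position to 1."""
    X = np.zeros((N, N, M))
    for sentence in sentences:
        for ent_a, ent_b, clause in adjacent_pairs(sentence):
            probs = relation_model(clause)      # M-dimensional relation vector
            k = int(np.argmax(probs))
            if probs[k] > eps:
                X[entity_index[ent_a], entity_index[ent_b], k] = 1.0
    return X
```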
  • Step (3): optimize the final X obtained in step (2) based on tensor decomposition, to supplement the knowledge graph with latent information.
  • The X obtained through step (2) is usually sparse.
  • Tensor decomposition is used to infer the latent relationships in X and mine them, thereby supplementing and improving the knowledge (relationships between entities) in the knowledge graph.
  • The values at the 0 positions of X are inferred from the elements equal to 1 in X; that is, the possibility of uncertain relationships is inferred from the existing definite relationships.
  • For example, from the relationships (Xiao Ming, manages, Company A), (Xiao Ma, manages, Company B), (Company A, operates, clothing business), (Company B, operates, WeChat), and (Xiao Ming, manages, clothing business), it can be inferred that (Xiao Ma, manages, WeChat) holds with a relatively high probability, and the knowledge graph can be supplemented accordingly.
  • The specific implementation steps include: splitting the tensor X into M N×N matrices X_1, ..., X_M, and constructing an N×d matrix A and M d×d matrices O_1, ..., O_M, so as to decompose X_1, ..., X_M into A O_1 A^T, ..., A O_M A^T.
  • The optimal A and the optimal O_1, ..., O_M are obtained, and a new tensor O is obtained by concatenating O_1, ..., O_M.
  • Each point of the updated tensor X indicates the possibility that the relationship in relationship table G corresponding to that point exists between the two entities in entity table E corresponding to that point, that is, the credibility of the information that the relationship exists between the two entities.
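A sketch of this optimization in PyTorch, under stated assumptions: the M slices are stored as an (M, N, N) array, Adam stands in for plain gradient descent, and the clamp to [0, 1] is an added safeguard so the rebuilt entries read as probabilities.

```python
import torch

def optimize_tensor(X, d=32, steps=500, lr=0.01):
    """Factor each slice X_i ~ A @ O_i @ A.T by gradient descent, rebuild the
    tensor from the learned factors, then keep the element-wise maximum of
    the original and rebuilt tensors (the position-by-position comparison)."""
    X = torch.as_tensor(X, dtype=torch.float32)        # shape (M, N, N)
    M, N, _ = X.shape
    A = torch.randn(N, d, requires_grad=True)          # shared entity factors
    O = torch.randn(M, d, d, requires_grad=True)       # one d x d core per relation
    opt = torch.optim.Adam([A, O], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        X_hat = torch.einsum("nd,mde,ke->mnk", A, O, A)   # A @ O_m @ A.T per slice
        ((X_hat - X) ** 2).sum().backward()
        opt.step()
    with torch.no_grad():
        X_new = torch.einsum("nd,mde,ke->mnk", A, O, A).clamp(0.0, 1.0)
        return torch.maximum(X, X_new)                 # retain the max per position
```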
  • This step inserts the information in the knowledge graph into sentence T and, by modifying the attention scores, changes the influence of the inserted words on the original words in sentence T.
  • For an entity u in sentence T, look it up in entity table E. If u does not exist in entity table E, no information expansion is performed for it; if entity u exists in entity table E (assuming u is the i-th entity e_i in E), search the knowledge graph for information related to e_i, that is, find the matrix related to e_i in the optimized tensor X".
  • The optimized tensor X" is sliced, and the N×M slice matrix X"_i is taken.
  • This matrix X"_i represents the probability that each relationship in relationship table G exists between the i-th entity e_i in entity table E and every other entity in E.
  • The relationship information provided by the s selected points is added to the input sentence T through the following operations: according to entity table E, relationship table G, and word table W, it is converted into the corresponding word sequence and inserted after the position of entity u in the input sentence T.
  • This embodiment modifies the self-attention layer of the BERT model according to the following steps.
  • ln Y is added to the attention score between each word of an original entity and the added information about that entity; negative infinity is added to the attention scores between the added information and the unrelated words in the original sentence.
  • Take "Xiao Ming is Chinese founded Company A operates a clothing business" (16 characters in the original sentence) as an example.
  • The self-attention layer computes attention scores between every pair of the 16 characters in the sentence. After the computation, ln 1 is added to the attention score of each character pair between "Xiao Ming" and "is Chinese"; ln 0.98 is added to the attention score of each character pair between "Company A" and "operates a clothing business"; and negative infinity is added to the attention score of each character pair between "is Chinese" and "founded Company A", as well as between "operates a clothing business" and "Xiao Ming".
  • The relationship credibility is the relationship probability value.
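One way to realize this adjustment is to precompute an additive bias matrix per sentence and add it to the raw attention scores before the softmax; the span bookkeeping and the block-everything-then-reopen order below are implementation assumptions.

```python
import numpy as np

NEG_INF = -1e9   # stands in for negative infinity: softmax(score + NEG_INF) ~ 0

def attention_bias(seq_len, inserted_spans):
    """inserted_spans: list of (info_start, info_end, ent_start, ent_end, Y),
    where [info_start, info_end) is an inserted-information span, the entity
    it describes occupies [ent_start, ent_end), and Y is the relationship
    credibility taken from the N x N x M tensor."""
    bias = np.zeros((seq_len, seq_len))
    for info_s, info_e, ent_s, ent_e, Y in inserted_spans:
        bias[info_s:info_e, :] = NEG_INF               # block inserted words...
        bias[:, info_s:info_e] = NEG_INF
        bias[info_s:info_e, ent_s:ent_e] = np.log(Y)   # ...reopen own entity with ln Y
        bias[ent_s:ent_e, info_s:info_e] = np.log(Y)
        bias[info_s:info_e, info_s:info_e] = 0.0       # words within one insert
    return bias

# Inside self-attention: softmax(Q @ K.T / sqrt(d) + attention_bias(...)) @ V
```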
  • This embodiment can identify the entities in a sentence with the entity recognition model, identify the relationships between adjacent entities with the relationship recognition model, and accordingly construct and optimize the knowledge graph, obtaining a tensor that represents the relationship between any two entities.
  • By modifying the self-attention layer of the BERT model, the influence of the inserted information on the original input sentence is changed and unrelated pieces of information are shielded from each other.
  • A natural language processing device provided in an embodiment of the present application is introduced below; the device described below and the natural language processing method described above may be referred to in conjunction with each other.
  • The embodiment of the present application discloses a natural language processing device, including:
  • an acquisition module 401 configured to acquire the target sentence to be processed and determine each entity in the target sentence;
  • an expansion module 402 configured to, for each entity in the target sentence, if the entity exists in the preset entity set, determine extended information for the entity and add the determined extended information after the position of the entity in the target sentence to obtain the updated target sentence; and
  • a processing module 403 configured to input the updated target sentence into the BERT model so that the BERT model performs a natural language processing task, wherein, while the BERT model performs the natural language processing task, the attention scores between the extended information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero.
  • The expansion module 402 is specifically configured to: for each entity in the target sentence, in response to the entity existing in the preset entity set, determine extended information for the entity and add the determined extended information after the position of the entity in the target sentence to obtain the updated target sentence.
  • the expansion module includes:
  • the first determining unit is configured to use the entity as a target object, and determine an entity group that has a relationship with the target object in the preset entity set;
  • the selection unit is configured to select entities whose relationship probability values are greater than a first threshold in the entity group, and generate extended information of the target object based on the selected entities.
  • the first determination unit includes:
  • N is the number of entities included in the preset entity set,
  • and M is the number of relationships between different entities in the preset entity set;
  • a query subunit configured to generate a knowledge graph based on the N×N×M tensor and query the knowledge graph for the entity group that has a relationship with the target object.
  • the generating subunit is specifically used for:
  • Two adjacent entities in the sentence to be recognized are used as entity groups to obtain multiple entity groups;
  • a relationship recognition model is used to identify the relationship between the two entities in each entity group to obtain a plurality of M-dimensional relationship vectors;
  • the element corresponding to the maximum value in the initial tensor is updated from 0 to 1 to update the initial tensor;
  • the generating subunit is specifically used for:
  • The generation subunit updates the element at the position corresponding to the maximum value in the initial tensor from 0 to 1, in response to the maximum value in any M-dimensional relationship vector being greater than a preset threshold, so as to update the initial tensor.
  • the generating subunit is specifically used for:
  • a new three-dimensional matrix is obtained based on the optimal A' and the M optimal O_i';
  • the initial three-dimensional matrix and the new three-dimensional matrix are compared bit by bit, and the maximum value of each position is retained to obtain an N ⁇ N ⁇ M dimensional tensor.
  • the expansion module includes:
  • a second determining unit configured to take the entity as the target object and determine the object entity having the greatest association with the target object in the preset entity set, where an object entity is an entity in the preset entity set other than the entity itself; and
  • a generating unit configured to generate extended information of the target object based on the object entity having the greatest correlation with the target object.
  • the second determination unit includes:
  • a first determining subunit configured to determine the maximum relationship probability value between the target object and each object entity to obtain N-1 maximum relationship probability values, where N-1 is the number of object entities and N is the number of entities included in the preset entity set;
  • the second determining subunit is used to determine the correlation between each object entity and the target sentence, and obtain N-1 correlations;
  • a calculation subunit configured to, for each object entity, calculate the product of the correlation corresponding to the object entity and the maximum relationship probability value corresponding to the object entity to obtain the association score corresponding to the object entity, yielding N-1 association scores; and
  • a selection subunit configured to take the object entity corresponding to the maximum of the N-1 association scores as the object entity having the greatest association with the target object.
  • the second determining subunit is specifically used for:
  • the sum of the degree of correlation between each entity in the target sentence and the object entity is determined as the correlation between the object entity and the target sentence.
  • The degree of correlation between any entity in the target sentence and any object entity is: the maximum relationship probability value between the entity and the object entity plus the maximum relationship probability value between the object entity and the entity.
  • The attention score between the extended information of any entity in the target sentence and that entity is adjusted to a target value;
  • the target value is the sum of the attention score between the extended information and the entity and ln Y;
  • Y is the value at the corresponding position in the N×N×M tensor;
  • the N×N×M tensor is used to represent the relationships between entities in the preset entity set and the relationship probability values.
  • This embodiment provides a natural language processing device that can shield unrelated pieces of information from influencing each other, thereby effectively reducing the negative impact of extended information on the original input data, allowing the BERT model to improve the processing accuracy of natural language processing tasks and improving the BERT model's processing efficiency and effect.
  • An electronic device provided by an embodiment of the present application is introduced below; the electronic device described below and the natural language processing method and apparatus described above may be referred to in conjunction with each other.
  • an electronic device including:
  • memory 501 for storing computer readable instructions
  • the processor 502 is configured to execute computer-readable instructions, so as to implement the methods disclosed in any of the foregoing embodiments.
  • The embodiment of the present application provides one or more non-volatile computer-readable storage media storing computer-readable instructions; the storage media described below and the natural language processing method, device, and equipment described above
  • may be referred to in conjunction with each other.
  • One or more non-volatile computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the natural language processing method in any of the foregoing embodiments. For the specific steps of the method, refer to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
  • RAM (random access memory)
  • ROM (read-only memory)
  • EPROM (erasable programmable ROM)
  • EEPROM (electrically erasable programmable ROM)
  • registers, hard disk, removable disk, CD-ROM, or any other known form of readable storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

A natural language processing method and apparatus, a device, and a readable storage medium. The method includes: acquiring a target sentence to be processed and determining each entity in the target sentence (S101); for each entity in the target sentence, in response to the entity existing in a preset entity set, determining extended information for the entity and adding the determined extended information after the position of the entity in the target sentence to obtain an updated target sentence (S102); and inputting the updated target sentence into a BERT model so that the BERT model performs a natural language processing task, wherein, while the BERT model performs the natural language processing task, the attention scores between the extended information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero (S103).

Description

Natural language processing method and apparatus, device, and readable storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. CN202210002872.6, entitled "Natural language processing method and apparatus, device, and readable storage medium" and filed with the China Patent Office on January 5, 2022, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the field of computer technology, and in particular to a natural language processing method and apparatus, a device, and a readable storage medium.
BACKGROUND
At present, relevant information can be expanded for the input data of the BERT (Bidirectional Encoder Representations from Transformer) model, but the expanded information affects the model's ability to discriminate the input data, reducing the accuracy of the processing results. Various strategies are currently used to improve the selection accuracy of the expanded information, but they still cannot effectively reduce the negative impact of the expanded information on the original input data.
SUMMARY
In a first aspect, an embodiment of the present application provides a natural language processing method, including:
acquiring a target sentence to be processed, and determining each entity in the target sentence;
for each entity in the target sentence, in response to the entity existing in a preset entity set, determining extended information for the entity, and adding the determined extended information after the position of the entity in the target sentence to obtain an updated target sentence; and
inputting the updated target sentence into a BERT model so that the BERT model performs a natural language processing task;
wherein, while the BERT model performs the natural language processing task, the attention scores between the extended information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero.
In some embodiments, determining the extended information for the entity includes:
taking the entity as a target object, and determining an entity group that has a relationship with the target object in the preset entity set; and
selecting, from the entity group, entities whose relationship probability values are greater than a first threshold, and generating the extended information of the target object based on the selected entities.
In some embodiments, determining the entity group that has a relationship with the target object in the preset entity set includes:
generating an N×N×M tensor used to represent the relationships between entities in the preset entity set and the relationship probability values, where N is the number of entities included in the preset entity set and M is the number of relationships between different entities in the preset entity set; and
generating a knowledge graph based on the N×N×M tensor, and querying the knowledge graph for the entity group that has a relationship with the target object.
In some embodiments, generating the N×N×M tensor used to represent the relationships between entities in the preset entity set and the relationship probability values includes:
generating an initial all-zero N×N×M tensor;
acquiring the sentence library used to generate the preset entity set, and traversing each sentence in the sentence library, taking the traversed sentence as the sentence to be recognized;
taking two adjacent entities in the sentence to be recognized as an entity group to obtain multiple entity groups;
using a relationship recognition model to identify the relationship between the two entities in each entity group to obtain multiple M-dimensional relationship vectors;
for each M-dimensional relationship vector, in response to the maximum value in the vector being greater than a second threshold, updating the element at the position corresponding to the maximum value in the initial tensor from 0 to 1, so as to update the initial tensor; and
traversing the next sentence in the sentence library and continuing to update the current tensor until every sentence in the sentence library has been traversed, then outputting and optimizing the currently obtained tensor to obtain the N×N×M tensor.
In some embodiments, using the relationship recognition model to identify the relationship between the two entities in each entity group to obtain multiple M-dimensional relationship vectors includes:
for the two entities in any entity group, replacing the two entities in the sentence to be recognized with different identifiers, and inputting the resulting new sentence into the relationship recognition model so that the relationship recognition model outputs the M-dimensional relationship vector corresponding to the two entities.
In some embodiments, optimizing the currently obtained tensor to obtain the N×N×M tensor includes:
taking the currently obtained tensor as an initial three-dimensional matrix, and decomposing the initial three-dimensional matrix into M N×N matrices X_i, i = 1, 2, ..., M;
decomposing an initialized d×d×M tensor O into M d×d matrices O_i, where d is a tunable hyperparameter;
initializing an N×d matrix A, and solving for the optimal A' and the M optimal O_i' based on X_i = A O_i A^T and gradient descent;
obtaining a new three-dimensional matrix based on the optimal A' and the M optimal O_i'; and
comparing the initial three-dimensional matrix and the new three-dimensional matrix position by position based on the max function, and retaining the maximum value at each position to obtain the N×N×M tensor.
In some embodiments, the relationship recognition model includes a sub-model with a converter (Transformer) structure and a relationship classification neural network, and inputting the new sentence obtained by the replacement into the relationship recognition model so that the relationship recognition model outputs the M-dimensional relationship vector corresponding to the two entities includes:
inputting the new sentence into the sub-model with the converter structure to obtain the feature vectors carrying the identifiers of the two entities; and
inputting the feature vectors carrying the identifiers of the two entities into the relationship classification neural network to obtain the M-dimensional relationship vector corresponding to the two entities.
In some embodiments, determining the extended information for the entity includes:
taking the entity as the target object, and determining the object entity having the greatest association with the target object in the preset entity set, where an object entity is an entity in the preset entity set other than the entity itself; and
generating the extended information of the target object based on the object entity having the greatest association with the target object.
In some embodiments, determining the object entity having the greatest association with the target object in the preset entity set includes:
determining the maximum relationship probability value between the target object and each object entity to obtain N-1 maximum relationship probability values, where N-1 is the number of object entities and N is the number of entities included in the preset entity set;
determining the correlation between each object entity and the target sentence to obtain N-1 correlations;
for each object entity, calculating the product of the correlation corresponding to the object entity and the maximum relationship probability value corresponding to the object entity to obtain the association score corresponding to the object entity, yielding N-1 association scores; and
taking the object entity corresponding to the maximum of the N-1 association scores as the object entity having the greatest association with the target object.
In some embodiments, determining the correlation between each object entity and the target sentence includes:
for each object entity, determining the sum of the degrees of correlation between each entity in the target sentence and the object entity as the correlation between the object entity and the target sentence.
In some embodiments, the degree of correlation between any entity in the target sentence and any object entity is: the maximum relationship probability value between the entity and the object entity plus the maximum relationship probability value between the object entity and the entity.
In some embodiments, the natural language processing method further includes:
while the BERT model performs the natural language processing task, adjusting the attention score between the extended information of any entity in the target sentence and that entity to a target value, where the target value is the sum of the attention score between the extended information and the entity and ln Y, Y is the value at the corresponding position in the N×N×M tensor, and the N×N×M tensor is used to represent the relationships between entities in the preset entity set and the relationship probability values.
In some embodiments, the attention scores between the extended information of any entity in the target sentence and the other entities include: the attention scores between each word in the extended information of the entity and each word in the other entities.
In a second aspect, an embodiment of the present application provides a natural language processing apparatus, including:
an acquisition module configured to acquire the target sentence to be processed and determine each entity in the target sentence;
an expansion module configured to, for each entity in the target sentence, in response to the entity existing in the preset entity set, determine extended information for the entity and add the determined extended information after the position of the entity in the target sentence to obtain the updated target sentence; and
a processing module configured to input the updated target sentence into the BERT model so that the BERT model performs a natural language processing task, wherein, while the BERT model performs the natural language processing task, the attention scores between the extended information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory for storing computer-readable instructions; and
a processor for executing the computer-readable instructions to implement the natural language processing method in any of the foregoing embodiments.
In a fourth aspect, an embodiment of the present application provides one or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the natural language processing method in any of the foregoing embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present application or the related art more clearly, the drawings needed for the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flow chart of a natural language processing method according to one or more embodiments;
Fig. 2 is a schematic diagram of a relationship recognition model according to one or more embodiments;
Fig. 3 is a schematic diagram of an entity recognition model according to one or more embodiments;
Fig. 4 is a schematic diagram of a natural language processing apparatus according to one or more embodiments; and
Fig. 5 is a schematic diagram of an electronic device according to one or more embodiments.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
At present, the BERT model can be enhanced by introducing additional knowledge into its input. A knowledge graph, whose stored information is structured, is the best source for extracting such knowledge.
A knowledge graph is a knowledge system formed by structuring human knowledge; it contains basic facts, general rules, and other related structured information, and can be used for intelligent tasks such as information retrieval, reasoning, and decision-making. The rich information contained in a knowledge graph can help the BERT model better perform natural language processing tasks. By building a knowledge graph and adding its information to the BERT model's input as extra information, BERT can be assisted in natural language processing tasks.
However, some knowledge graphs attend only to the surface relationships in text while ignoring possible latent relationships, and adding knowledge-graph information to the BERT model's input easily introduces excessive noise (information of low relevance), hurting efficiency and effect.
Although relevant information can currently be expanded for the BERT model's input data, the expanded information affects the model's ability to discriminate the input data, reducing the accuracy of the processing results. Various strategies are used to improve the selection accuracy of the expanded information, but they still cannot effectively reduce the negative impact of the expanded information on the original input data. For this reason, the present application provides a natural language processing solution that can effectively reduce the negative impact of expanded information on the original input data.
Referring to Fig. 1, an embodiment of the present application discloses a natural language processing method including step S101, step S102, and step S103.
Step S101: acquire the target sentence to be processed and determine each entity in the target sentence.
In this embodiment, nouns in the target sentence are referred to as entities, and the target sentence includes at least one entity. For example, the sentence "Company A managed by Xiao Ming operates a clothing business" includes three entities: Xiao Ming, Company A, and the clothing business.
Step S102: for each entity in the target sentence, if the entity exists in the preset entity set, determine extended information for the entity and add the determined extended information after the position of the entity in the target sentence to obtain the updated target sentence.
Step S102 may include: for each entity in the target sentence, in response to the entity existing in the preset entity set, determining extended information for the entity and adding the determined extended information after the position of the entity in the target sentence to obtain the updated target sentence.
It should be noted that the preset entity set is simply a collection of many entities. If an entity in the target sentence exists in the preset entity set, other entities related to that entity can be found in the preset entity set, so extended information can be determined for the entity accordingly and added after the position of the entity in the target sentence, thereby adding extended information for the entity. In this way, extended information can be added for every entity in the target sentence that exists in the preset entity set, yielding the updated target sentence. Entities in the target sentence that do not exist in the preset entity set receive no extended information.
Suppose the target sentence is "Xiao Ming founded Company A"; the extended information added for the entity "Xiao Ming" is "is Chinese", where "Chinese" is an entity in the preset entity set; the extended information added for the entity "Company A" is "operates a clothing business", where "clothing business" is an entity in the preset entity set. The final updated target sentence is then: "Xiao Ming is Chinese founded Company A operates a clothing business". Here "is" is a relational word connecting "Xiao Ming" and "Chinese", and "operates" is a relational word connecting "Company A" and "clothing business"; such relational words can be determined from common language usage.
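As a small illustration of step S102, the sketch below splices an extension word sequence in directly after its entity; the word-level tokens are a simplification of the character-level Chinese example.

```python
def insert_extension(words, ent_end, extension):
    """Insert the extension word sequence right after the entity that ends
    at index ent_end (exclusive) in the target sentence."""
    return words[:ent_end] + extension + words[ent_end:]

sentence = ["XiaoMing", "founded", "CompanyA"]
sentence = insert_extension(sentence, 1, ["is", "Chinese"])
sentence = insert_extension(sentence, 5, ["operates", "clothing-business"])
print(sentence)
# ['XiaoMing', 'is', 'Chinese', 'founded', 'CompanyA', 'operates', 'clothing-business']
```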
Step S103: input the updated target sentence into the BERT model so that the BERT model performs a natural language processing task; wherein, while the BERT model performs the natural language processing task, the attention scores between the extended information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero.
It should be noted that the BERT model includes a self-attention layer that computes attention scores between the different words of a sentence. This embodiment adjusts the attention scores between an entity's extended information and the other entities in the target sentence to zero, which prevents attention scores between unrelated information from feeding wrong signals into subsequent processing and thus reduces the negative impact of the extended information on the original input data.
In this embodiment, the updated target sentence serves as the BERT model's input data, so that the BERT model can obtain as much information as possible when performing natural language processing tasks, improving processing accuracy and effect. Suppose the BERT model handles a question-answering task: after the BERT model receives a question sentence, extended information can be added for each entity in the question according to this embodiment to update the question. Once the new question sentence is obtained, it is processed to determine the best answer for it.
Moreover, this embodiment can add extended information for each entity in the target sentence, expanding effective information for the target sentence serving as the BERT model's input data. Meanwhile, while the BERT model performs the natural language processing task, the attention scores between the extended information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero, so that the attention score between the added information and unrelated information in the original sentence is zero. This shields unrelated pieces of information from influencing each other, effectively reducing the negative impact of the expanded information on the original input data, so that the BERT model improves the processing accuracy of natural language processing tasks as well as its processing efficiency and effect.
For example, the original target sentence is "Xiao Ming founded Company A" and the updated target sentence is "Xiao Ming is Chinese founded Company A operates a clothing business", in which "is Chinese" is the extended information of "Xiao Ming" and has nothing to do with "founded Company A" in the original sentence; therefore the attention score between each word in "is Chinese" and each word in "founded Company A" is adjusted to zero, which reduces the negative impact of "is Chinese" on "founded Company A" in the original sentence. Likewise, "operates a clothing business" is the extended information of "Company A" and has nothing to do with "Xiao Ming" in the original sentence; therefore the attention score between each word in "operates a clothing business" and each word in "Xiao Ming" is adjusted to zero, which reduces the negative impact of "operates a clothing business" on "Xiao Ming" in the original sentence.
In a specific embodiment, determining the extended information for the entity includes: taking the entity as the target object and determining an entity group that has a relationship with the target object in the preset entity set; selecting, from the entity group, entities whose relationship probability values are greater than a first threshold; and generating the extended information of the target object based on the selected entities.
In a specific embodiment, determining the entity group that has a relationship with the target object in the preset entity set includes: generating an N×N×M tensor used to represent the relationships between entities in the preset entity set and the relationship probability values, where N is the number of entities included in the preset entity set and M is the number of relationships between different entities in the preset entity set; generating a knowledge graph based on the N×N×M tensor; and querying the knowledge graph for the entity group that has a relationship with the target object. The N×N×M tensor is equivalent to the knowledge graph, so generating the knowledge graph based on the tensor merely expresses the information in the tensor in knowledge-graph form.
The entity group that has a relationship with the target object can thus be queried in the knowledge graph.
In a specific embodiment, generating the N×N×M tensor used to represent the relationships between entities in the preset entity set and the relationship probability values includes: generating an initial all-zero N×N×M tensor; acquiring the sentence library used to generate the preset entity set and traversing each sentence in the library, taking the traversed sentence as the sentence to be recognized; taking two adjacent entities in the sentence to be recognized as an entity group to obtain multiple entity groups; using the relationship recognition model to identify the relationship between the two entities in each entity group to obtain multiple M-dimensional relationship vectors; for each M-dimensional relationship vector, if the maximum value in the vector is greater than the second threshold, updating the element at the corresponding position in the initial tensor from 0 to 1; and traversing the next sentence in the library and continuing to update the current tensor until every sentence in the library has been traversed, at which point some elements of the all-zero initial tensor have changed to 1 while the rest remain 0, yielding a tensor composed of 0s and 1s; the currently obtained tensor of 0s and 1s is then optimized to obtain the N×N×M tensor.
Here, for each M-dimensional relationship vector, if the maximum value in the vector is greater than the second threshold, updating the element at the corresponding position in the initial tensor from 0 to 1 to update the initial tensor may include: for each M-dimensional relationship vector, in response to the maximum value in the vector being greater than the second threshold, updating the element at the corresponding position in the initial tensor from 0 to 1 to update the initial tensor.
In a specific embodiment, using the relationship recognition model to identify the relationship between the two entities in each entity group to obtain multiple M-dimensional relationship vectors includes: for the two entities in any entity group, replacing the two entities in the sentence to be recognized with different identifiers and inputting the resulting new sentence into the relationship recognition model so that the model outputs the M-dimensional relationship vector corresponding to the two entities.
The relationship recognition model, shown in Fig. 2, includes a sub-model with a Transformer (converter) structure and a relationship classification neural network; the Transformer sub-model includes 12 multi-head attention modules. Of course, the relationship recognition model can also have other structures.
Any new sentence is first input into the Transformer sub-model to obtain two feature vectors, m_α and m_β; m_α and m_β are then input into the relationship classification neural network to obtain the M relationships and M relationship probability values between α and β. These are represented by an M-dimensional relationship vector in which each element is a relationship probability value. Here α and β are the different identifiers replacing two adjacent entities, so the M relationships and M relationship probability values between α and β are those between the two replaced adjacent entities.
Specifically, replacing adjacent entities with α and β can follow this example. Suppose the sentence to be recognized is: Company A managed by Xiao Ming operates a clothing business. Two entity groups can be determined: "Xiao Ming + Company A" and "Company A + clothing business", giving two new sentences: ① β managed by α operates the clothing business; ② α managed by Xiao Ming operates β.
It should be noted that the M-dimensional relationship vector obtained for new sentence ① represents the relationship between "Xiao Ming" and "Company A". If the clause "α has β" is determined from the sentence "Company A has Xiao Ming", then the M-dimensional relationship vector obtained for the clause "α has β" represents the relationship between "Company A" and "Xiao Ming". It follows that after the positions of two adjacent entities are exchanged, relationship recognition must be performed again with the relationship recognition model.
需要说明的是,当两个实体间的关系在张量中呈现为0时,可能存在如下两种情况:(1)这两个实体没有关系;(1)这两个实体有关系但关系缺失。为确定此类情况,本申请在得到由0和1构成的张量后,还对当前得到的、由0和1构成的张量进行了优化。
In a specific embodiment, optimizing the currently obtained tensor to obtain the N×N×M-dimensional tensor includes: taking the currently obtained tensor as an initial three-dimensional matrix and decomposing it into M matrices X_i of dimension N×N, i = 1, 2, …, M; decomposing an initialized d×d×M-dimensional tensor O into M matrices O_i of dimension d×d, where d is a tunable hyperparameter; initializing an N×d-dimensional matrix A, and solving for the optimal A′ and the M optimal O_i′ based on X_i = A·O_i·A^T and gradient descent; obtaining a new three-dimensional matrix based on the optimal A′ and the M optimal O_i′; and comparing the initial three-dimensional matrix and the new three-dimensional matrix element-wise with the max function, keeping the maximum at each position, to obtain the N×N×M-dimensional tensor.

In general, matrix factorization can be used to transform a matrix. Since the tensor in the present application is an N×N×M three-dimensional matrix, the N×N×M-dimensional tensor is first decomposed into M two-dimensional matrices of dimension N×N: X_1, …, X_i, …, X_M, and the initialized d×d×M-dimensional tensor O is decomposed into M matrices of dimension d×d: O_1, …, O_i, …, O_M. Each X_i can then be factorized as A·O_i·A^T, that is, X_1 = A·O_1·A^T, X_2 = A·O_2·A^T, …, X_M = A·O_M·A^T. On this basis, gradient descent yields the optimal A′ and the M optimal O_i′, i.e., the optimal A′ and optimal O_i′ of the M equations above are computed. Then X′_1 = A′·O_1′·A′^T, X′_2 = A′·O_2′·A′^T, …, X′_M = A′·O_M′·A′^T are computed, giving M results X′_1, …, X′_i, …, X′_M, and the new three-dimensional matrix is obtained by concatenating the M results; A′^T is the transpose of A′. Of course, if the M optimal O_i′ (O_1′, …, O_M′) are obtained, they can be concatenated into a new d×d×M-dimensional tensor O, and the new three-dimensional matrix can then be obtained from X = A·O·A^T.

Afterwards, the elements of the initial three-dimensional matrix and the new three-dimensional matrix are compared position by position, and the maximum at each position is kept, giving the N×N×M-dimensional tensor. For example, if the element at some position is 0 in the initial three-dimensional matrix and 0.02 in the new three-dimensional matrix, the element of X″ at that position is recorded as 0.02. As another example, if the element at some position is 1 in the initial three-dimensional matrix and 0.99 in the new three-dimensional matrix, the element of X″ at that position is recorded as 1.
In a specific embodiment, determining expansion information for the entity includes: taking the entity as a target object, and determining, in the preset entity set, the object entity most associated with the target object, the object entities being the entities in the preset entity set other than this entity; and generating the expansion information of the target object based on the object entity most associated with the target object.

In a specific embodiment, determining, in the preset entity set, the object entity most associated with the target object includes: determining the maximum relationship probability value between the target object and each object entity, to obtain N−1 maximum relationship probability values, where N−1 is the number of object entities and N is the number of entities in the preset entity set; determining the relevance of each object entity to the target sentence, to obtain N−1 relevance values; for each object entity, computing the product of its relevance and its maximum relationship probability value, to obtain its association score, giving N−1 association scores; and taking the object entity corresponding to the largest of the N−1 association scores as the object entity most associated with the target object.

Suppose the preset entity set includes N entities, N being a positive integer. Then, provided that an entity W of the target sentence exists in the preset entity set, a maximum relationship probability value can be determined between entity W and each of the remaining N−1 entities in the preset entity set, giving N−1 maximum relationship probability values.

Since the relationship between two different entities is represented by an M-dimensional vector containing M relationship probability values, this embodiment takes the largest of them as the maximum relationship probability value between an entity W of the target sentence and each remaining entity of the preset entity set.

Meanwhile, each of the N−1 entities of the preset entity set other than entity W can be assigned a relevance with respect to the target sentence, giving N−1 relevance values.

Then, from the viewpoint of any one of the N−1 entities, it has one relevance value and one maximum relationship probability value, and the product of the two gives one association score. Since there are N−1 entities, N−1 association scores are obtained. The entity corresponding to the largest of these N−1 scores is the entity that both has the strongest relationship with entity W and is most relevant to the target sentence, and it is therefore taken as the entity most associated with entity W. The expansion information generated based on entity W's most associated entity can be regarded as valid and accurate expansion information for entity W, which improves the precision and accuracy of information expansion and avoids adding invalid or inaccurate information.
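A minimal sketch of this selection, assuming the knowledge-graph tensor X and the per-entity relevance values are already available (the function and variable names are illustrative):

```python
import numpy as np

def most_associated_entity(X, q, relevance):
    """Pick the object entity most associated with target entity index q.

    X:         (N, N, M) tensor of relationship probabilities
    q:         index of the target entity in the entity table
    relevance: (N,) relevance of each table entity to the target sentence
    """
    max_rel = X[q].max(axis=1)     # (N,) max relationship probability q -> i
    scores = relevance * max_rel   # association score for each candidate
    scores[q] = -np.inf            # exclude the target entity itself
    return int(np.argmax(scores))  # index of the most associated entity
```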
In a specific embodiment, determining the relevance of each object entity to the target sentence includes: for each object entity, determining the sum of the degrees of correlation between each entity in the target sentence and that object entity as the relevance of that object entity to the target sentence.

Here, the degree of correlation between any entity in the target sentence and any object entity is: the maximum relationship probability value from that entity of the target sentence to the object entity, plus the maximum relationship probability value from the object entity to that entity.

Suppose the target sentence includes three entities A, B and C. The relevance of the target sentence to any object entity F is then the sum of the degree of correlation between A and F, the degree of correlation between B and F, and the degree of correlation between C and F, and the resulting sum is normalized. The degree of correlation between A and F is the maximum relationship probability value from A to F plus that from F to A; the degree of correlation between B and F is the maximum relationship probability value from B to F plus that from F to B; and the degree of correlation between C and F is the maximum relationship probability value from C to F plus that from F to C. It can be seen that computing the degree of correlation between any two entities requires finding, in the tensor, the two maximum relationship probability values related to the two entities, then summing them and normalizing.

Each dimension of the M-dimensional relationship vector takes a value between 0 and 1, each value representing the probability of the corresponding relationship, and the values of all dimensions sum to 1.

For a sentence T about to be input into the BERT model for a natural language processing task, certain information from the knowledge graph is inserted into sentence T. How, then, is the information to be inserted selected from the knowledge graph? By computing the degree of correlation between entities and the relevance between the entities in the candidate information and the sentence, the value of the candidate information to the sentence is evaluated, and the information with the greatest value is selected for insertion.

(1) Compute the degree of correlation between any two entities in the entity table E.

The knowledge graph (tensor X″) represents the possibility that any two entities of the entity table E have a given relationship of the relationship table G. We use the sum of the maximum possibilities of a relationship of G existing between two entities to measure their degree of correlation, and normalize via softmax to obtain the correlation score. For example, for the i-th entity e_i and the j-th entity e_j of the entity table E, the correlation score of the two entities is:
$$S(e_i, e_j)=\mathrm{softmax}\Big(\max_{1\le k\le M} X''_{ijk}+\max_{1\le k\le M} X''_{jik}\Big),$$

where the softmax normalization is taken over the entities e_j of the entity table E.
We consider that the more relationships exist between two entities, the higher their correlation. Here X″_jik is the probability value that the j-th entity e_j and the i-th entity e_i of the entity table E have the k-th relationship r_k of the relationship table G.

(2) Compute the relevance score between the entities in the candidate information and the sentence.

We consider that the higher the degree of correlation between all entities of a sentence and a given entity of the entity table, the higher the relevance between the sentence and that entity; the correlation scores between entities obtained above are therefore used to compute the relevance score between the sentence and an entity.

Suppose sentence T and entity table E share p entities: e_t1, …, e_tp. The relevance of sentence T to an entity e_i of the entity table E is then measured by the normalized sum of the degrees of correlation between e_t1, …, e_tp and entity e_i; that is, the relevance score between T and e_i is:
$$Y(i, T)=\mathrm{softmax}\Big(\sum_{q=1}^{p} S(e_{t_q}, e_i)\Big),$$

where the softmax normalization is taken over the N entities of the entity table E.
Here N is the number of entities in the entity table E, and p is the number of entities shared by the entity table E and sentence T.
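The two scores above could be computed, for instance, as in the following sketch, which assumes the optimized tensor X″ is a NumPy array and takes the softmax normalizations row-wise over the entity table (an assumption consistent with, but not dictated by, the text):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def correlation_scores(X):
    """S[i, j]: correlation score between table entities e_i and e_j,
    from the (N, N, M) knowledge-graph tensor X (softmax over j)."""
    m = X.max(axis=2)                       # m[i, j] = max_k X[i, j, k]
    raw = m + m.T                           # add max_k X[j, i, k]
    return np.apply_along_axis(softmax, 1, raw)

def sentence_relevance(S, sentence_entity_ids):
    """Y[i]: relevance of table entity e_i to sentence T, built from the
    correlations of e_i with the entities e_t1..e_tp found in T."""
    raw = S[sentence_entity_ids, :].sum(axis=0)  # sum over the p entities
    return softmax(raw)
```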
(3) The relevance score function Y and the knowledge graph (tensor X″) are now used to select the information (inter-entity relationships) to insert into sentence T. A piece of information represents the knowledge that a certain relationship exists between two entities; for example, "entity e_i and entity e_j have relationship r_k" is one piece of information. The value of a piece of information to a given sentence is positively correlated with the possibility that the information holds and with the relevance of the information to the sentence.

Traverse sentence T character by character to find all p entities e_t1, …, e_tp that exist in both sentence T and entity table E, corresponding to the t_1-th, …, t_p-th entities of E. Then, using the relevance score function Y, compute the relevance score between sentence T and every entity of the entity table: Y(1, T), …, Y(N, T).

Then, for the entities e_t1, …, e_tp of sentence T, supplementary information is selected for each entity as follows. For an entity e_q (q ∈ {t_1, …, t_p}) among e_t1, …, e_tp, obtain from the tensor X″ the probabilities X″_q i 1, …, X″_q i M of each of the M relationships of table G existing between e_q and any other entity e_i (i ≠ q) of the entity table E; for the M relationship probabilities between e_q and e_i (i ≠ q), take the maximum, giving the maximum relationship probability of e_q and e_i.

Performing the same operation for all N−1 entities of the entity table E other than e_q yields the N−1 maximum relationship probabilities of e_q.

Suppose these N−1 relationship probabilities are: e_q and e_j1 have the k_1-th relationship r_k1 of table G, e_q and e_j2 have the k_2-th relationship r_k2 of table G, …, and e_q and e_j(N−1) have the k_(N−1)-th relationship r_k(N−1) of table G, where e_j1, …, e_j(N−1) are the entities of the entity table E other than e_q. The N−1 maximum relationship probabilities are then:
$$X''_{q\,j_1\,k_1},\quad X''_{q\,j_2\,k_2},\quad \ldots,\quad X''_{q\,j_{N-1}\,k_{N-1}}.$$
Next, each of the N−1 maximum relationship probabilities is multiplied by the relevance score between T and the corresponding entity, giving N−1 information evaluation values; the entity and the information (inter-entity relationship) corresponding to the largest of these N−1 values are selected, and the information is inserted after e_q. Specifically, the determined entity and information are converted into characters via the entity table E, the relationship table G and the character table W, and inserted after the position of entity e_q in sentence T.

The N−1 information evaluation values may specifically be:
$$V_1=X''_{q\,j_1\,k_1}\cdot Y(j_1,T),\quad V_2=X''_{q\,j_2\,k_2}\cdot Y(j_2,T),\quad \ldots,\quad V_{N-1}=X''_{q\,j_{N-1}\,k_{N-1}}\cdot Y(j_{N-1},T).$$

V_1 to V_(N−1) are the values of the corresponding entities and information to sentence T.
The entities e_t1, …, e_tp of T are traversed one by one, and information is inserted for each entity according to the above steps, finally yielding the information-expanded sentence. The BERT model achieves better results when processing the information-expanded sentence.
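A compact sketch of this per-entity selection and insertion loop; the lookup tables `entity_words` and `relation_words` and the insertion-position handling are assumptions of the illustration:

```python
def expand_entity(sentence, pos_after, X, Y, q, entity_words, relation_words):
    """Insert the most valuable piece of knowledge after entity e_q.

    X: (N, N, M) knowledge-graph tensor; Y: (N,) relevance of table
    entities to the sentence; entity_words / relation_words map indices
    to surface strings (assumed lookup tables built from E, G and W);
    pos_after: character index just after e_q in the sentence.
    """
    best_v, best = -1.0, None
    for i in range(X.shape[0]):
        if i == q:
            continue
        k = int(X[q, i].argmax())   # strongest relationship to e_i
        v = X[q, i, k] * Y[i]       # value = probability * relevance
        if v > best_v:
            best_v, best = v, (i, k)
    i, k = best
    info = relation_words[k] + entity_words[i]  # e.g. "经营" + "服装业务"
    return sentence[:pos_after] + info + sentence[pos_after:]
```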
When selecting the information (inter-entity relationships) to insert from the knowledge graph, this embodiment fully accounts for the value of the inserted information to the sentence (embodied in the relevance of the information to the sentence and the credibility of the information), which ensures the quality of the inserted information and reduces the introduction of noise.

In a specific embodiment, determining the maximum relationship probability value between the target object and each object entity includes: generating the N×N×M-dimensional tensor representing the relationships between entities in the preset entity set and the relationship probability values, M being the dimensionality of the relationship vectors between different entities in the preset entity set; and generating a knowledge graph based on the N×N×M-dimensional tensor, and querying the knowledge graph for the maximum relationship probability value between the target object and each object entity. It can be seen that the maximum relationship probability value between each entity and each object entity can be obtained by querying the knowledge graph; the knowledge graph is generated from the N×N×M-dimensional tensor, whose generation process is described above.

In a specific embodiment, during the BERT model's execution of the natural language processing task, the attention score between the expansion information of any entity in the target sentence and that entity itself is adjusted to a target value; the target value is the sum of the attention score between the expansion information and the entity and ln Y, so that the influence of the added information on the original sentence is adjusted according to the relationship probability values in the knowledge graph. Y is the value at the corresponding position of the N×N×M-dimensional tensor, which represents the relationships between entities in the preset entity set and the relationship probability values.
In a specific embodiment, determining the entities in the target sentence includes: converting each character of the target sentence into a 1024-dimensional vector to obtain a vector set; and inputting the vector set into an entity recognition model, so that the entity recognition model recognizes the entities in the target sentence.

The entity recognition model is shown in FIG. 3. The entity recognition model of FIG. 3 is implemented based on a Transformer and includes six multi-head self-attention modules; of course, the entity recognition model may also be implemented with other structures. The "positional encoding" in FIG. 2 and FIG. 3 records the position of each character within the sentence.

For the sentence "Beijing is the capital of China" (北京是中国的首都), the recognition result output by the entity recognition model may be: 北京是中国的首都 (AABAABBB), where A denotes an entity character and B a non-entity character.
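Decoding such per-character A/B tags back into entity strings is straightforward; a possible sketch:

```python
def tags_to_entities(chars, tags):
    """Recover entity strings from per-character A/B tags,
    e.g. chars='北京是中国的首都', tags='AABAABBB' -> ['北京', '中国']."""
    entities, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "A":
            current += ch          # extend the current entity
        elif current:
            entities.append(current)
            current = ""
    if current:                    # flush a trailing entity
        entities.append(current)
    return entities

print(tags_to_entities("北京是中国的首都", "AABAABBB"))  # ['北京', '中国']
```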
The following embodiment adds specialist knowledge from the knowledge graph to the input data of the BERT model and adjusts the attention scores according to the steps of data preparation, entity recognition, relationship recognition, knowledge graph construction, information embedding and attention score adjustment, so that the BERT model improves the accuracy of natural language processing tasks.

1. Data preparation.

Construct a sentence library containing numerous sentences, and build, over all the text of the sentence library, a character table W containing all characters. The character table W additionally contains two special marker symbols α and β. The character table W can be used to determine the relation words between two entities.

2. Entity recognition.

Construct a model that can perform entity recognition on text, obtaining the entity recognition model; a specific entity recognition model may be as shown in FIG. 3. Sentences are randomly sampled from the sentence library and manually annotated to obtain training text, and the model is trained on the annotated text to obtain the entity recognition model; the specific training process follows the existing related art and is not repeated here.
Convert each character of the character table W into a Z-dimensional one-hot vector, and then map the Z-dimensional one-hot vector to a 1024-dimensional vector via a matrix J ∈ R^(1024×Z), R denoting the set of real numbers. In this way, each character can be represented by a 1024-dimensional vector.
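This mapping is an ordinary embedding lookup; a sketch, with the vocabulary size Z chosen arbitrarily here:

```python
import numpy as np

Z = 5000                           # vocabulary size (illustrative assumption)
J = np.random.randn(1024, Z)       # the mapping matrix J ∈ R^(1024×Z)

def embed(char_id):
    """Map a character's Z-dimensional one-hot vector to 1024 dimensions."""
    one_hot = np.zeros(Z)
    one_hot[char_id] = 1.0
    return J @ one_hot             # equivalent to the column J[:, char_id]
```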
For a sentence of the sentence library, representing each of its characters by a 1024-dimensional vector gives a set of 1024-dimensional vectors. Inputting this set into the entity recognition model recognizes the entities in the sentence.

3. Relationship recognition.

The entity recognition model can recognize the entities in every sentence of the sentence library. For each sentence, the relationships between its entities can then be further recognized.

First, construct a relationship table G containing all relationships of interest. The relationship table G contains M different relationships, one of which is defined as "no relationship", indicating that no relationship exists between two entities. The relationship table G is customized as required, and each relationship corresponds to an M-dimensional one-hot vector. It is stipulated that the k-th relationship is denoted r_k and corresponds to the k-th dimension of the M-dimensional one-hot vector.

Next, two adjacent entities of a sentence are replaced with α and β. One method of replacing the entities of a sentence includes: for a sentence, after determining all its entities, successively replacing each pair of adjacent entities of the sentence in order.

Specifically, first traverse the sentence character by character, replace all characters of the first entity with α and all characters of the second entity with β, obtaining one sub-sentence (i.e., one new sentence); then, starting from the second entity, replace all characters of the second entity with α and all characters of the third entity with β, obtaining the second sub-sentence; and so on, until all entities of the sentence have been replaced. Suppose the sentence is "Company A, managed by Xiao Ming, operates a clothing business" (小明管理的A公司经营着服装业务). Two entity groups can be determined: "Xiao Ming + Company A" and "Company A + clothing business", giving two sub-sentences: ① α管理的β经营着服装业务; ② 小明管理的α经营着β.
Construct the relationship recognition model. Randomly sample a small portion of sentences from the sentence library as a training set, and perform entity replacement on each sentence; a sentence containing q entities yields q−1 sub-sentences after replacement. For each sub-sentence, the α–β relationship is judged from the meaning of the sentence and matched against the relationship table G for manual annotation; the label of each sub-sentence is the M-dimensional one-hot vector representing the α–β relationship. The model is trained by minimizing a logits regression loss function with the adam algorithm, obtaining the relationship recognition model; a specific relationship recognition model may be as shown in FIG. 2. The specific training process of the relationship recognition model follows the existing related art and is not repeated here.

Since at least one sub-sentence is obtained from a sentence, inputting each sub-sentence into the relationship recognition model outputs the relationship between each group of adjacent entities of the sentence, the relationship between every two adjacent entities being represented by an M-dimensional vector.

Specifically, a sub-sentence obtained by entity replacement is input into the relationship recognition model shown in FIG. 2. The Transformer outputs the two vectors m_α and m_β corresponding to α and β, and concatenating these two vectors gives the vector h = <m_α | m_β>, which contains the context information of the two entities. The vector h is then input into the relationship classification neural network for relationship classification; the output layer of this network is a softmax normalization layer and outputs an M-dimensional vector v = (p_1, p_2, …, p_M), in which the value of each dimension represents the probability that the α–β relationship of the input sub-sentence is the corresponding relationship of G.

For example, if the output vector is (p_1, p_2, …, p_M), with the 1st, 2nd, …, M-th dimensions corresponding to relationships r_1, r_2, …, r_M respectively, then the probability that α and β of the input sub-sentence have relationship r_1 is p_1, have relationship r_2 is p_2, …, and have relationship r_M is p_M. Moreover, since the relationship between entities is determined by the overall semantics of the sentence and is independent of the entities themselves, the α–β relationship is equivalent to the relationship between the two replaced entities of the original sentence.
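The classification head described above might look as follows in PyTorch; the layer sizes are illustrative assumptions, and only the concatenation of m_α and m_β and the M-way softmax output come from the text:

```python
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    """Concatenate the Transformer features m_alpha, m_beta and classify
    the pair into one of M relationships (softmax output)."""
    def __init__(self, dim=1024, M=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, 512), nn.ReLU(),  # hidden size is an assumption
            nn.Linear(512, M),
        )

    def forward(self, m_alpha, m_beta):
        h = torch.cat([m_alpha, m_beta], dim=-1)    # h = <m_alpha | m_beta>
        return torch.softmax(self.net(h), dim=-1)   # (p_1, ..., p_M)
```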
4. Knowledge graph construction.

Use the above entity recognition model to recognize all entities of the sentence library and remove duplicates, obtaining the entity table E (i.e., the preset entity set). The entity table E contains N entities in total. It is stipulated that e_i is the i-th entity of E, 1 ≤ i ≤ N.

Use the relationship recognition model to recognize the relationship between every two adjacent entities of each sentence and construct the knowledge graph. The knowledge graph is quantified as an N×N×M-dimensional tensor, from which it can be seen whether any relationship of the relationship table G exists between any two entities of the entity table E. N is the number of entities of E and M the number of relationships of G. X_ijk denotes the element at one position of the tensor, corresponding to two entities of the entity table E and one relationship of the relationship table G, according to the rule: X_ijk corresponds to the i-th entity e_i and the j-th entity e_j of E, and the k-th relationship r_k of G.

Specifically, the knowledge graph construction process includes:

(1) Construct the initial N×N×M-dimensional tensor X, all of whose values are initialized to 0.

(2) Use the relationship recognition model to recognize the relationship between every two adjacent entities of each sentence of the library, obtaining numerous M-dimensional vectors. For each M-dimensional vector, find the dimension with the largest value, and update the initial N×N×M-dimensional tensor X according to the relationship corresponding to that dimension and the entities replaced by α and β in the sub-sentence.

For example, suppose the M-dimensional relationship vector output for some sub-sentence is v = (p_1, p_2, …, p_M), in which the value p_k of the k-th dimension is the largest, this dimension corresponding to the k-th relationship r_k of table G (1 ≤ k ≤ M). Suppose further that the two entities replaced by α and β in this sub-sentence correspond to the i-th entity e_i and the j-th entity e_j of table E (1 ≤ i, j ≤ N). Then, if and only if p_k is greater than a prescribed value θ (0 < θ < 1), entities e_i and e_j are considered to have relationship r_k, in which case X_ijk = 1 in the initial X; otherwise X_ijk = 0 in the initial X. The prescribed value θ is the second threshold.
X is continuously updated in this way for all M-dimensional vectors until all of them have been processed, and the finally obtained X is output.
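A sketch of this thresholded update, assuming the relationship vectors have already been collected as (i, j, v) triples:

```python
import numpy as np

def update_tensor(X, relation_vectors, theta):
    """Set X[i, j, k] = 1 whenever a relationship vector's largest component
    exceeds theta. relation_vectors holds (i, j, v) triples, where v is the
    M-dimensional model output for the entity pair (e_i, e_j)."""
    for i, j, v in relation_vectors:
        k = int(np.argmax(v))      # the most likely relationship
        if v[k] > theta:           # theta is the second threshold
            X[i, j, k] = 1.0
    return X
```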
(3) Optimize the X finally obtained in step (2) based on tensor factorization, so as to supplement the knowledge graph with latent information.

The X obtained through step (2) is usually sparse. This step uses tensor factorization to reason about and mine the latent relationships of X, thereby supplementing and completing the knowledge (inter-entity relationships) of the knowledge graph.

For the X obtained in step (2), if some point X_ijk = 1, it indicates that the i-th entity e_i of the entity table E and the j-th entity e_j of the entity table have the k-th relationship r_k of the relationship table G; if X_ijk = 0, it cannot be determined whether the relationship does not exist or the information recording the relationship is missing.

This step therefore uses the elements of X equal to 1 to infer the values at the positions of X that were originally 0, i.e., to infer the possibility of uncertain relationships from the established ones. For example, given the known relationships (Xiao Ming, manages, Company A), (Xiao Ma, manages, Company B), (Company A, operates, clothing business), (Company B, operates, WeChat) and (Xiao Ming, manages, clothing business), it can be inferred that (Xiao Ma, manages, WeChat) holds with relatively high probability, and the knowledge graph can be supplemented accordingly.

The method of supplementing X with tensor factorization is detailed below. It includes: decomposing the tensor X as X ≈ A·O·A^T (A being an N×d-dimensional matrix and O a d×d×M-dimensional tensor, where d is a tunable hyperparameter and A^T the transpose of A), updating A and O with gradient descent, and, once the loss function has converged sufficiently, updating X according to the tensor X′ = A·O·A^T.

The specific implementation steps include: split the tensor X into M matrices X_1, …, X_M of dimension N×N, and construct an N×d-dimensional matrix A and M matrices O_1, …, O_M of dimension d×d, so as to factorize X_1, …, X_M as A·O_1·A^T, …, A·O_M·A^T.
After randomly initializing the matrices A, O_1, …, O_M, gradient descent is used to minimize the loss function

$$L=\sum_{i=1}^{M}\left\|X_i-A O_i A^{\top}\right\|_F^2.$$

Then:

$$\frac{\partial L}{\partial A}=-2\sum_{i=1}^{M}\Big[\big(X_i-A O_i A^{\top}\big) A O_i^{\top}+\big(X_i-A O_i A^{\top}\big)^{\top} A O_i\Big],\qquad \frac{\partial L}{\partial O_i}=-2\,A^{\top}\big(X_i-A O_i A^{\top}\big) A.$$

Accordingly, A and O_1, …, O_M are updated in each iteration:

$$A \leftarrow A-\eta\,\frac{\partial L}{\partial A},\qquad O_i \leftarrow O_i-\eta\,\frac{\partial L}{\partial O_i},$$

where η is the learning rate and i = 1, …, M.
After multiple iterations, once the loss function has converged sufficiently, the optimal A and the optimal O_1, …, O_M are obtained, and O_1, …, O_M are concatenated to obtain the new tensor O.

Afterwards, the tensor X′ = A·O·A^T is computed, and the element-wise maximum of X′ and X is taken position by position, i.e., the final X″_ijk = max(X′_ijk, X_ijk). The value at each point of the updated tensor represents the possibility that the two entities of the entity table E corresponding to that point have the relationship of the relationship table G corresponding to that point, or in other words the credibility of the information that the two entities have that relationship. For example, X″_ijk corresponds to the i-th entity e_i and the j-th entity e_j of E and the k-th relationship r_k of G, so X″_ijk = 0.8 means that entities e_i and e_j have relationship r_k with probability 0.8.
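The factorization, gradient-descent updates and element-wise max merge could be sketched as follows; the learning rate, the dimension d and the iteration count are arbitrary choices of this sketch, and the gradient expressions follow from the Frobenius-norm loss above:

```python
import numpy as np

def refine_tensor(X, d=32, eta=0.01, steps=2000):
    """Factorize each relationship slice X[:, :, i] ~ A @ O_i @ A.T by
    gradient descent, then merge the reconstruction with X by element-wise
    maximum to surface latent relationships."""
    N, _, M = X.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((N, d)) * 0.1
    O = rng.standard_normal((M, d, d)) * 0.1
    for _ in range(steps):
        gA = np.zeros_like(A)
        for i in range(M):
            E = X[:, :, i] - A @ O[i] @ A.T            # residual of slice i
            gA += -2 * (E @ A @ O[i].T + E.T @ A @ O[i])
            O[i] -= eta * (-2 * A.T @ E @ A)           # gradient step on O_i
        A -= eta * gA                                  # gradient step on A
    X_new = np.stack([A @ O[i] @ A.T for i in range(M)], axis=2)
    return np.maximum(X, X_new)                        # keep per-position maxima
```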
Finally, the knowledge graph is generated based on the finally obtained X″.

5. Information embedding and attention score adjustment.

For a sentence T about to be input into the BERT model for a natural language processing task, this step inserts information from the knowledge graph into sentence T, and modifies the attention scores to change the influence of the inserted characters on the original characters of sentence T.

(1) Use the entity recognition model to perform entity recognition on the input sentence T, obtaining all its entities u_1, …, u_t, and then expand information for all entities in turn.

The operation of expanding information for one entity of the sentence is as follows:

For an entity u of sentence T, look it up in the entity table E. If u does not exist in the entity table E, no information expansion is performed for this entity; if entity u exists in the entity table E (suppose here that u is the i-th entity e_i of E), search the knowledge graph for information related to e_i, i.e., search the optimized tensor X″ for the matrix related to e_i.

Specifically, slice the optimized tensor X″ and take the N×M-dimensional slice matrix X″_i, which represents the probability of any relationship of the relationship table G existing between the i-th entity e_i of E and any other entity of E.

Find in the matrix X″_i all points whose value is greater than ψ (0 < ψ < 1, a prescribed value, i.e., the first threshold).
Suppose there are s points (i, j_1, k_1), …, (i, j_s, k_s) in X″ that satisfy the requirement

$$X''_{i\,j_1\,k_1}>\psi,\quad X''_{i\,j_2\,k_2}>\psi,\quad \ldots,\quad X''_{i\,j_s\,k_s}>\psi.$$

Then we know that entity e_i and entity e_j1 have relationship r_k1 with probability X″_(i j_1 k_1), …, and that e_i and e_js have relationship r_ks with probability X″_(i j_s k_s).

Next, the relationship information provided by the s points is added to the input sentence T by the following operation: according to the entity table E, the relationship table G and the character table W, the relationships r_k1, …, r_ks and the entities e_j1, …, e_js are converted into the corresponding characters, which are inserted in order after the position of entity u in the input sentence T.
Performing the above operation for all entities u_1, …, u_t of the sentence one by one gives the information-expanded sentence T′.

However, directly inputting T′ into the BERT model has the following two adverse effects: ① there is no relationship between the added characters and some characters of the original sentence T, and if two unrelated characters influence each other through the self-attention mechanism, the output may be affected; ② the added information is not necessarily reliable, and if unreliable added information influences the original input sentence T too strongly, the accuracy of the result may also be affected.

To address these two adverse effects, this embodiment modifies the self-attention layer of the BERT model according to the following steps.

In the self-attention layer of the BERT model, ln Y is added to the attention scores between the characters of an original entity and the characters of the added information about that entity, and negative infinity is added to the attention scores between added information and the characters of the original sentence unrelated to that added information.

For example, suppose the original input sentence is "Xiao Ming founded Company A" (小明创立A公司). After information expansion as above, the sentence obtained is "Xiao Ming is Chinese founded Company A operates a clothing business" (小明是中国人创立A公司经营服装业务).
From the tensor X″ it is known that the relationship credibility of the information "Xiao Ming is Chinese" (小明是中国人) is 1 (i.e., the element value at the corresponding position of X″), and that of "Company A operates a clothing business" (A公司经营服装业务) is 0.98 (i.e., the element value at the corresponding position of X″). The attention scores output by the BERT model's self-attention layer are then modified by the following operations. The self-attention layer computes attention scores between every two characters of the sentence; after this computation, ln 1 is added to the attention score between each pair of characters of "小明" and "是中国人" (i.e., between each of the characters 小 and 明 and each of the characters 是, 中, 国, 人); ln 0.98 is added to the attention score between each pair of characters of "A公司" and "经营服装业务"; −∞ is added to the attention score between each pair of characters of "是中国人" and "创立A公司"; and −∞ is added to the attention score between each pair of characters of "小明是中国人创立" and "经营服装业务".
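A sketch of building such an additive bias matrix, to be added to the raw attention scores before the softmax (the span bookkeeping is assumed to be done by the caller; −∞ before the softmax yields an attention weight of zero):

```python
import numpy as np

def attention_bias(length, related, unrelated):
    """Additive bias for the self-attention scores.

    related:   list of (span_a, span_b, Y) — an entity span, its expansion
               span, and the relationship credibility Y from the tensor X''
    unrelated: list of (span_a, span_b) pairs to be mutually masked
    """
    bias = np.zeros((length, length))
    for (a0, a1), (b0, b1), y in related:
        bias[a0:a1, b0:b1] += np.log(y)  # add ln Y between entity and its info
        bias[b0:b1, a0:a1] += np.log(y)
    for (a0, a1), (b0, b1) in unrelated:
        bias[a0:a1, b0:b1] = -np.inf     # mask unrelated pairs entirely
        bias[b0:b1, a0:a1] = -np.inf
    return bias                           # added to scores before softmax
```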
Through the above method of modifying the self-attention layer, the mutual influence between unrelated pieces of information can be shielded, and the influence of the added information on the original sentence can be adjusted according to the relationship credibility of the knowledge graph, reducing the influence of low-credibility information on the original input. Relationship credibility is the relationship probability value.

It can be seen that this embodiment can recognize the entities of a sentence with the entity recognition model, recognize the relationship between two adjacent entities of a sentence with the relationship recognition model, construct and optimize the knowledge graph accordingly, and obtain the tensor representing the relationship between any two entities. At the same time, by modifying the self-attention layer of the BERT model, the influence of the inserted information on the original input sentence is changed, and the mutual influence between unrelated pieces of information is shielded.
A natural language processing apparatus provided by an embodiment of the present application is introduced below; the natural language processing apparatus described below and the natural language processing method described above may be cross-referenced.

Referring to FIG. 4, an embodiment of the present application discloses a natural language processing apparatus, including:

an acquisition module 401, configured to obtain a target sentence to be processed and determine the entities in the target sentence;

an expansion module 402, configured to, for each entity in the target sentence, if the entity exists in the preset entity set, determine expansion information for the entity and add the determined expansion information after the position of the entity in the target sentence, to obtain an updated target sentence; and

a processing module 403, configured to input the updated target sentence into the BERT model, so that the BERT model performs a natural language processing task; wherein, during the BERT model's execution of the natural language processing task, the attention scores between the expansion information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero.
In some specific embodiments, the expansion module 402 is specifically configured to: for each entity in the target sentence, in response to the entity existing in the preset entity set, determine expansion information for the entity and add the determined expansion information after the position of the entity in the target sentence, to obtain the updated target sentence.

In a specific embodiment, the expansion module includes:

a first determination unit, configured to take the entity as a target object and determine, in the preset entity set, an entity group that has a relationship with the target object; and

a selection unit, configured to select, from the entity group, entities whose relationship probability value is greater than the first threshold, and generate the expansion information of the target object based on the selected entities.

In a specific embodiment, the first determination unit includes:

a generation sub-unit, configured to generate the N×N×M-dimensional tensor representing the relationships between entities in the preset entity set and the relationship probability values, N being the number of entities in the preset entity set and M the number of relationships between different entities in the preset entity set; and

a query sub-unit, configured to generate a knowledge graph based on the N×N×M-dimensional tensor and query the knowledge graph for the entity group that has a relationship with the target object.
In a specific embodiment, the generation sub-unit is specifically configured to:

generate an initial all-zero tensor of dimension N×N×M;

obtain the sentence library used to generate the preset entity set, traverse each sentence in the library, and take the traversed sentence as the sentence to be recognized;

take two adjacent entities in the sentence to be recognized as an entity group, to obtain multiple entity groups;

use the relationship recognition model to recognize the relationship between the two entities in each entity group, to obtain multiple M-dimensional relationship vectors;

for each M-dimensional relationship vector, if the maximum value in the vector is greater than the second threshold, update the element at the corresponding position of the initial tensor from 0 to 1, so as to update the initial tensor; and

traverse the next sentence in the library and continue updating the current tensor until every sentence in the library has been traversed, then output and optimize the currently obtained tensor to obtain the N×N×M-dimensional tensor.
In a specific embodiment, the generation sub-unit is specifically configured to:

for the two entities in any entity group, replace these two entities in the sentence to be recognized with different identifiers, and input the resulting new sentence into the relationship recognition model, so that the relationship recognition model outputs the M-dimensional relationship vector corresponding to the two entities.

In some specific embodiments, for each M-dimensional relationship vector, in response to the maximum value in any M-dimensional relationship vector being greater than the second threshold, the generation sub-unit updates the element at the corresponding position of the initial tensor from 0 to 1, so as to update the initial tensor.
In a specific embodiment, the generation sub-unit is specifically configured to:

take the currently obtained tensor as an initial three-dimensional matrix and decompose it into M matrices X_i of dimension N×N, i = 1, 2, …, M;

decompose an initialized d×d×M-dimensional tensor O into M matrices O_i of dimension d×d, d being a tunable hyperparameter;

initialize an N×d-dimensional matrix A, and solve for the optimal A′ and the M optimal O_i′ based on X_i = A·O_i·A^T and gradient descent;

obtain a new three-dimensional matrix based on the optimal A′ and the M optimal O_i′; and

compare the initial three-dimensional matrix and the new three-dimensional matrix element-wise with the max function, keeping the maximum at each position, to obtain the N×N×M-dimensional tensor.
In a specific embodiment, the expansion module includes:

a second determination unit, configured to take the entity as a target object and determine, in the preset entity set, the object entity most associated with the target object, the object entities being the entities in the preset entity set other than this entity; and

a generation unit, configured to generate the expansion information of the target object based on the object entity most associated with the target object.

In a specific embodiment, the second determination unit includes:

a first determination sub-unit, configured to determine the maximum relationship probability value between the target object and each object entity, to obtain N−1 maximum relationship probability values, N−1 being the number of object entities and N the number of entities in the preset entity set;

a second determination sub-unit, configured to determine the relevance of each object entity to the target sentence, to obtain N−1 relevance values;

a computation sub-unit, configured to compute, for each object entity, the product of its relevance and its maximum relationship probability value, to obtain its association score, giving N−1 association scores; and

a selection sub-unit, configured to take the object entity corresponding to the largest of the N−1 association scores as the object entity most associated with the target object.
In a specific embodiment, the second determination sub-unit is specifically configured to:

for each object entity, determine the sum of the degrees of correlation between each entity in the target sentence and that object entity as the relevance of that object entity to the target sentence.

In a specific embodiment, the degree of correlation between any entity in the target sentence and any object entity is: the maximum relationship probability value from that entity of the target sentence to the object entity, plus the maximum relationship probability value from the object entity to that entity.

In a specific embodiment, during the BERT model's execution of the natural language processing task, the attention score between the expansion information of any entity in the target sentence and that entity is adjusted to a target value; the target value is the sum of the attention score between the expansion information and the entity and ln Y; Y is the value at the corresponding position of the N×N×M-dimensional tensor, which represents the relationships between entities in the preset entity set and the relationship probability values.

For more specific working processes of the modules and units of this embodiment, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.

It can be seen that this embodiment provides a natural language processing apparatus that can shield the mutual influence between unrelated pieces of information, thereby effectively reducing the negative impact of expansion information on the original input data and enabling the BERT model to improve the accuracy, efficiency and effectiveness of natural language processing tasks.
An electronic device provided by an embodiment of the present application is introduced below; the electronic device described below and the natural language processing method and apparatus described above may be cross-referenced.

Referring to FIG. 5, an embodiment of the present application discloses an electronic device, including:

a memory 501, configured to store computer-readable instructions; and

a processor 502, configured to execute the computer-readable instructions to implement the method disclosed in any of the above embodiments.

An embodiment of the present application provides one or more non-volatile computer-readable storage media storing computer-readable instructions; the one or more non-volatile computer-readable storage media described below and the natural language processing method, apparatus and device described above may be cross-referenced.

One or more non-volatile computer-readable storage media storing computer-readable instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the natural language processing method of any of the foregoing embodiments. For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
The terms "first", "second", "third", "fourth", etc. (if present) in the present application are used to distinguish similar objects and need not describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. Furthermore, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method or device comprising a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units not clearly listed or inherent to the process, method or device.

It should be noted that descriptions involving "first", "second", etc. in the present application are for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, but only on the basis that they can be implemented by a person of ordinary skill in the art; when a combination of technical solutions is contradictory or infeasible, such a combination shall be deemed not to exist and to fall outside the scope of protection claimed by the present application.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of readable storage medium known in the technical field.

Specific examples have been used herein to explain the principles and implementations of the present application; the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, for a person of ordinary skill in the art, changes may be made to the specific implementation and scope of application in accordance with the idea of the present application. In summary, the content of this specification should not be understood as limiting the present application.

Claims (16)

  1. A natural language processing method, characterized by comprising:
    obtaining a target sentence to be processed, and determining the entities in the target sentence;
    for each entity in the target sentence, in response to the entity existing in a preset entity set, determining expansion information for the entity, and adding the determined expansion information after the position of the entity in the target sentence, to obtain an updated target sentence; and
    inputting the updated target sentence into a BERT model, so that the BERT model performs a natural language processing task;
    wherein, during the BERT model's execution of the natural language processing task, the attention scores between the expansion information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero.
  2. The method according to claim 1, characterized in that determining expansion information for the entity comprises:
    taking the entity as a target object, and determining, in the preset entity set, an entity group that has a relationship with the target object; and
    selecting, from the entity group, entities whose relationship probability value is greater than a first threshold, and generating the expansion information of the target object based on the selected entities.
  3. The method according to claim 2, characterized in that determining, in the preset entity set, the entity group that has a relationship with the target object comprises:
    generating an N×N×M-dimensional tensor representing the relationships between entities in the preset entity set and the relationship probability values, N being the number of entities in the preset entity set and M the number of relationships between different entities in the preset entity set; and
    generating a knowledge graph based on the N×N×M-dimensional tensor, and querying the knowledge graph for the entity group that has a relationship with the target object.
  4. The method according to claim 3, characterized in that generating the N×N×M-dimensional tensor representing the relationships between entities in the preset entity set and the relationship probability values comprises:
    generating an initial all-zero tensor of dimension N×N×M;
    obtaining a sentence library used to generate the preset entity set, traversing each sentence in the sentence library, and taking the traversed sentence as the sentence to be recognized;
    taking two adjacent entities in the sentence to be recognized as an entity group, to obtain multiple entity groups;
    using a relationship recognition model to recognize the relationship between the two entities in each entity group, to obtain multiple M-dimensional relationship vectors;
    for each M-dimensional relationship vector, in response to the maximum value in any M-dimensional relationship vector being greater than a second threshold, updating the element at the corresponding position of the initial tensor from 0 to 1, so as to update the initial tensor; and
    traversing the next sentence in the sentence library and continuing to update the current tensor until every sentence in the sentence library has been traversed, then outputting and optimizing the currently obtained tensor to obtain the N×N×M-dimensional tensor.
  5. The method according to claim 4, characterized in that using the relationship recognition model to recognize the relationship between the two entities in each entity group to obtain multiple M-dimensional relationship vectors comprises:
    for the two entities in any entity group, replacing these two entities in the sentence to be recognized with different identifiers, and inputting the resulting new sentence into the relationship recognition model, so that the relationship recognition model outputs the M-dimensional relationship vector corresponding to the two entities.
  6. The method according to claim 4 or 5, characterized in that optimizing the currently obtained tensor to obtain the N×N×M-dimensional tensor comprises:
    taking the currently obtained tensor as an initial three-dimensional matrix and decomposing it into M matrices X_i of dimension N×N, i = 1, 2, …, M;
    decomposing an initialized d×d×M-dimensional tensor O into M matrices O_i of dimension d×d, d being a tunable hyperparameter;
    initializing an N×d-dimensional matrix A, and solving for the optimal A′ and the M optimal O_i′ based on X_i = A·O_i·A^T and gradient descent;
    obtaining a new three-dimensional matrix based on the optimal A′ and the M optimal O_i′; and
    comparing the initial three-dimensional matrix and the new three-dimensional matrix element-wise with the max function, keeping the maximum at each position, to obtain the N×N×M-dimensional tensor.
  7. The method according to claim 5, characterized in that the relationship recognition model comprises a sub-model of transformer structure and a relationship classification neural network; and
    inputting the new sentence obtained by replacement into the relationship recognition model, so that the relationship recognition model outputs the M-dimensional relationship vector corresponding to the two entities, comprises:
    inputting the new sentence into the sub-model of transformer structure to obtain the feature vectors bearing the identifiers of the two entities; and
    inputting the feature vectors bearing the identifiers of the two entities into the relationship classification neural network to obtain the M-dimensional relationship vector corresponding to the two entities.
  8. The method according to claim 1, characterized in that determining expansion information for the entity comprises:
    taking the entity as a target object, and determining, in the preset entity set, the object entity most associated with the target object, the object entities being the entities in the preset entity set other than this entity; and
    generating the expansion information of the target object based on the object entity most associated with the target object.
  9. The method according to claim 8, characterized in that determining, in the preset entity set, the object entity most associated with the target object comprises:
    determining the maximum relationship probability value between the target object and each object entity, to obtain N−1 maximum relationship probability values, N−1 being the number of object entities and N the number of entities in the preset entity set;
    determining the relevance of each object entity to the target sentence, to obtain N−1 relevance values;
    for each object entity, computing the product of its relevance and its maximum relationship probability value, to obtain its association score, giving N−1 association scores; and
    taking the object entity corresponding to the largest of the N−1 association scores as the object entity most associated with the target object.
  10. The method according to claim 9, characterized in that determining the relevance of each object entity to the target sentence comprises:
    for each object entity, determining the sum of the degrees of correlation between each entity in the target sentence and that object entity as the relevance of that object entity to the target sentence.
  11. The method according to claim 10, characterized in that the degree of correlation between any entity in the target sentence and any object entity is: the maximum relationship probability value from that entity of the target sentence to the object entity, plus the maximum relationship probability value from the object entity to that entity.
  12. The method according to any one of claims 1 to 11, characterized by further comprising:
    during the BERT model's execution of the natural language processing task, adjusting the attention score between the expansion information of any entity in the target sentence and that entity to a target value, the target value being the sum of the attention score between the expansion information and the entity and ln Y, Y being the value at the corresponding position of an N×N×M-dimensional tensor, the N×N×M-dimensional tensor representing the relationships between entities in the preset entity set and the relationship probability values.
  13. The method according to any one of claims 1 to 12, characterized in that the attention scores between the expansion information of any entity in the target sentence and the other entities comprise: the attention scores between each character of the expansion information of the entity and each character of the other entities.
  14. A natural language processing apparatus, characterized by comprising:
    an acquisition module, configured to obtain a target sentence to be processed and determine the entities in the target sentence;
    an expansion module, configured to, for each entity in the target sentence, if the entity exists in a preset entity set, determine expansion information for the entity and add the determined expansion information after the position of the entity in the target sentence, to obtain an updated target sentence; and
    a processing module, configured to input the updated target sentence into a BERT model, so that the BERT model performs a natural language processing task; wherein, during the BERT model's execution of the natural language processing task, the attention scores between the expansion information of any entity in the target sentence and the other entities in the target sentence are adjusted to zero.
  15. An electronic device, characterized by comprising:
    a memory, configured to store computer-readable instructions; and
    a processor, configured to execute the computer-readable instructions to implement the method according to any one of claims 1 to 13.
  16. One or more non-volatile computer-readable storage media storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 13.
PCT/CN2022/102862 2022-01-05 2022-06-30 Natural language processing method and apparatus, device, and readable storage medium WO2023130688A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210002872.6 2022-01-05
CN202210002872.6A CN114021572B (zh) 2022-01-05 2022-01-05 Natural language processing method and apparatus, device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2023130688A1 true WO2023130688A1 (zh) 2023-07-13

Family ID=80069308

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/102862 WO2023130688A1 (zh) Natural language processing method and apparatus, device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN114021572B (zh)
WO (1) WO2023130688A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114021572B (zh) * 2022-01-05 2022-03-22 苏州浪潮智能科技有限公司 一种自然语言处理方法、装置、设备及可读存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6374209B1 (en) * 1998-03-19 2002-04-16 Sharp Kabushiki Kaisha Text structure analyzing apparatus, abstracting apparatus, and program recording medium
US20180336466A1 (en) * 2017-05-17 2018-11-22 Samsung Electronics Co., Ltd. Sensor transformation attention network (stan) model
CN109522553A (zh) 2018-11-09 2019-03-26 龙马智芯(珠海横琴)科技有限公司 Named entity recognition method and apparatus
KR20210040319A (ko) 2020-04-23 2021-04-13 Beijing Baidu Netcom Science and Technology Co., Ltd. Entity linking method, apparatus, device, storage medium and computer program
CN113158653A (zh) 2021-04-25 2021-07-23 北京智源人工智能研究院 Training method, application method, apparatus and device for a pre-trained language model
CN113627192A (zh) 2021-07-29 2021-11-09 浪潮云信息技术股份公司 Relation extraction method and apparatus based on a two-layer convolutional neural network
CN114021572A (zh) 2022-01-05 2022-02-08 苏州浪潮智能科技有限公司 Natural language processing method and apparatus, device, and readable storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274794B (zh) 2020-01-19 2022-03-18 浙江大学 Transfer-based synonym expansion method
CN111563144B (zh) 2020-02-25 2023-10-20 升智信息科技(南京)有限公司 User intention recognition method and apparatus based on prediction of sentence context relations
CN113779185B (zh) 2020-06-10 2023-12-29 武汉Tcl集团工业研究院有限公司 Natural language model generation method and computer device
CN111813954B (zh) 2020-06-28 2022-11-04 北京邮电大学 Method, apparatus and electronic device for determining the relationship between two entities in a text sentence
CN112507715B (zh) 2020-11-30 2024-01-16 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for determining association relationships between entities
CN112270196B (zh) 2020-12-14 2022-04-29 完美世界(北京)软件科技发展有限公司 Entity relationship recognition method, apparatus and electronic device
CN112989024B (zh) 2021-03-29 2023-04-07 腾讯科技(深圳)有限公司 Relation extraction method, apparatus, device and storage medium for text content
CN113434699B (zh) 2021-06-30 2023-07-18 平安科技(深圳)有限公司 Pre-training method for a BERT model for text matching, computer apparatus and storage medium
CN113705237A (zh) 2021-08-02 2021-11-26 清华大学 Relation extraction method, apparatus and electronic device incorporating relational phrase knowledge

Also Published As

Publication number Publication date
CN114021572A (zh) 2022-02-08
CN114021572B (zh) 2022-03-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22918144

Country of ref document: EP

Kind code of ref document: A1