WO2023130687A1 - A natural language processing method, apparatus, device, and readable storage medium - Google Patents

A natural language processing method, apparatus, device, and readable storage medium

Info

Publication number
WO2023130687A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
relationship
sentence
dimensional
entities
Prior art date
Application number
PCT/CN2022/102856
Other languages
English (en)
French (fr)
Inventor
郭振华
王立
赵雅倩
李仁刚
范宝余
邓祥一
Original Assignee
苏州浪潮智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023130687A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Definitions

  • The present application relates to the technical field of computer machine learning, and in particular to a natural language processing method, apparatus, device, and readable storage medium.
  • At present, natural language processing tasks are generally performed with the BERT (Bidirectional Encoder Representations from Transformers) model.
  • To improve task accuracy, additional knowledge can be obtained from a knowledge graph and added to the model input data to assist the BERT model in natural language processing tasks.
  • Because the information it stores is structured, a knowledge graph is an excellent source for knowledge extraction.
  • However, existing knowledge graphs only attend to the surface relationships of the text and ignore latent relationships that may exist.
  • Using a knowledge graph to add information to the input of the BERT model easily introduces excessive noise (low-relevance information), affecting efficiency and effectiveness.
  • Since the input of the BERT model determines the accuracy of its output, overly noisy input not only increases the amount of input data but may also reduce the model's accuracy, lowering its processing efficiency and effectiveness, so that the model may fail to output the correct result; for example, in a question-answering task, the BERT model may be unable to answer the question accurately.
  • the embodiment of the present application provides a natural language processing method, including:
  • for each first entity, in response to the first entity being present in a preset entity set, determining, in the preset entity set, a second entity having the greatest association with the first entity, generating extended information based on the determined second entity, and adding the extended information after the position of the first entity in the target sentence to obtain an updated target sentence; the second entity is any entity in the preset entity set other than the first entity; and
  • determining, in the preset entity set, the second entity having the greatest association with the first entity includes:
  • taking the first entity as a target object and determining the maximum relationship probability value between the target object and each second entity to obtain N-1 maximum relationship probability values; N-1 is the number of second entities, and N is the total number of entities included in the preset entity set;
  • the second entity corresponding to the largest of the N-1 association scores is taken as the second entity having the greatest association with the target object.
  • determining the maximum relationship probability value between the target object and each second entity includes:
  • generating an N×N×M-dimensional tensor representing the relationships and relationship probability values between the entities in the preset entity set;
  • M is the dimension of the relationship vector between different entities in the preset entity set;
  • a knowledge graph is generated based on the N×N×M-dimensional tensor, and the maximum relationship probability value between the target object and each second entity is queried in the knowledge graph.
  • generating the N×N×M-dimensional tensor representing the relationships and relationship probability values between the entities in the preset entity set includes:
  • two adjacent entities in the sentence to be recognized are taken as an entity group to obtain multiple entity groups;
  • a relationship recognition model is used to identify the relationship between the two entities in each entity group to obtain a plurality of M-dimensional relationship vectors;
  • using a relationship recognition model to identify the relationship between the two entities in each entity group to obtain a plurality of M-dimensional relationship vectors includes:
  • the relationship recognition model includes a sub-model with a Transformer structure and a relationship classification neural network; inputting the new sentence obtained by replacement into the relationship recognition model, so that the relationship recognition model outputs the M-dimensional relationship vector corresponding to the two entities, includes:
  • optimizing the currently obtained tensor to obtain the N×N×M-dimensional tensor includes:
  • the initial three-dimensional matrix and the new three-dimensional matrix are compared position by position, and the maximum value at each position is retained to obtain the N×N×M-dimensional tensor.
  • determining the relevance of each second entity to the target sentence comprises:
  • the normalized result of the sum of the correlation degrees between each first entity in the target sentence and the second entity is determined as the relevance between the second entity and the target sentence.
  • the correlation degree between any first entity and any second entity is: the maximum relationship probability value between the first entity and the second entity plus the maximum relationship probability value between the second entity and the first entity.
  • determining each first entity in the target sentence includes:
  • the vector set is input into an entity recognition model, so that the entity recognition model recognizes each first entity in the target sentence.
  • the embodiment of the present application provides a natural language processing device, including:
  • An acquisition module configured to acquire the target sentence to be processed, and determine each first entity in the target sentence
  • the expansion module is configured to, for each first entity, in response to the first entity being present in the preset entity set, determine, in the preset entity set, the second entity having the greatest association with the first entity, generate extended information based on the determined second entity, and add the extended information after the position of the first entity in the target sentence to obtain an updated target sentence; the second entity is any entity in the preset entity set other than the first entity; and
  • the processing module is used for inputting the updated target sentence into the BERT model, so that the BERT model performs a natural language processing task.
  • an electronic device including:
  • a processor configured to execute computer-readable instructions to implement the natural language processing method in any of the foregoing embodiments.
  • the present application provides one or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the natural language processing method in any of the foregoing embodiments.
  • Fig. 1 is a flow chart of a natural language processing method according to one or more embodiments
  • Fig. 2 is a schematic diagram of a relationship recognition model according to one or more embodiments
  • Fig. 3 is a schematic diagram of an entity recognition model according to one or more embodiments.
  • Fig. 4 is a schematic diagram of a natural language processing device according to one or more embodiments.
  • Fig. 5 is a schematic diagram of an electronic device according to one or more embodiments.
  • the embodiment of the present application provides a natural language processing solution, which can expand effective information for the input data of the BERT model, so that the BERT model can improve the processing accuracy of the natural language processing task.
  • Referring to Fig. 1, an embodiment of the present application discloses a natural language processing method including steps S101, S102, and S103.
  • Step S101: obtain a target sentence to be processed, and determine each first entity in the target sentence.
  • In this embodiment, nouns in the target sentence are referred to as first entities, and the target sentence includes at least one first entity.
  • For example, the sentence "Company A managed by Xiao Ming operates a clothing business" includes three first entities: Xiao Ming, Company A, and clothing business.
  • Step S102: for each first entity, if the first entity exists in the preset entity set, determine, in the preset entity set, the second entity having the greatest association with the first entity, generate extended information based on the determined second entity, and add the extended information after the position of the first entity in the target sentence to obtain an updated target sentence.
  • The second entity is any entity in the preset entity set other than the first entity.
  • Specifically, step S102 may include: for each first entity, in response to the first entity being present in the preset entity set, determining, in the preset entity set, the second entity having the greatest association with the first entity, generating extended information based on the determined second entity, and adding the extended information after the position of the first entity in the target sentence to obtain the updated target sentence.
  • It should be noted that nouns in the preset entity set are referred to as second entities; the preset entity set is simply a collection of many second entities. If a first entity in the target sentence exists in the preset entity set, a second entity associated with that first entity can be found in the preset entity set: the second entity having the greatest association with the first entity is determined in the preset entity set, extended information is generated based on that second entity, and the extended information is added after the position of the first entity in the target sentence, thereby adding valid extended information for the first entity. In this way, valid extended information can be added for every first entity in the target sentence that exists in the preset entity set, yielding the updated target sentence. No extended information is added for first entities in the target sentence that do not exist in the preset entity set.
  • Suppose the target sentence is "Xiao Ming founded Company A".
  • The extended information added for the first entity "Xiao Ming" is "is Chinese", where "Chinese" is a second entity in the preset entity set.
  • The extended information added for the first entity "Company A" is "operates a clothing business", where "clothing business" is a second entity in the preset entity set.
  • The final updated target sentence is then "Xiao Ming is Chinese founded Company A operates a clothing business".
  • "is" is a relational word connecting "Xiao Ming" and "Chinese".
  • "operates" is a relational word connecting "Company A" and "clothing business".
  • Such relational words can be determined based on idiomatic usage.
  • Step S103 inputting the updated target sentence into the BERT model, so that the BERT model can perform natural language processing tasks.
  • the updated target sentence is used as the input data of the BERT model, so that the BERT model can obtain as much information as possible when performing natural language processing tasks, thereby improving the processing accuracy and effect.
  • Suppose the BERT model handles a question-answering task;
  • then, after the BERT model receives a question sentence, extended information can be added to each entity in the question sentence according to this embodiment to update the question sentence.
  • After the updated question sentence is obtained, it is processed to determine the best answer for it.
  • It can be seen that this embodiment can add extended information to each first entity in the target sentence, and the added extended information is generated based on the second entity most associated with the corresponding first entity, so the added extended information is highly relevant to each first entity in the target sentence.
  • This expands the target sentence, as the input data of the BERT model, with effective information, enabling the BERT model to improve the processing accuracy of natural language processing tasks as well as its processing efficiency and effectiveness.
  • determining, in the preset entity set, the second entity having the greatest association with the first entity includes: taking the first entity as a target object and determining the maximum relationship probability value between the target object and each second entity to obtain N-1 maximum relationship probability values, where N-1 is the number of second entities and N is the total number of entities included in the preset entity set; determining the relevance of each second entity to the target sentence to obtain N-1 relevance values; for each second entity, computing the product of the relevance corresponding to the second entity and the maximum relationship probability value corresponding to the second entity to obtain the association score corresponding to the second entity, yielding N-1 association scores; and taking the second entity corresponding to the largest of the N-1 association scores as the second entity having the greatest association with the target object.
  • Suppose the preset entity set includes N entities, where N is a positive integer; then, if any first entity W exists in the preset entity set, a maximum relationship probability value can be determined between W and each of the remaining N-1 second entities, yielding N-1 maximum relationship probability values.
  • Since the relationship between a first entity and a second entity is represented by an M-dimensional vector containing M relationship probability values, this embodiment takes the maximum of these values as the maximum relationship probability value between the first entity and the second entity.
  • Meanwhile, each of the N-1 second entities in the preset entity set other than the first entity W can be assigned a relevance to the target sentence, yielding N-1 relevance values.
  • For any second entity, there is one corresponding relevance value and one maximum relationship probability value, so an association score can be obtained by computing the product of the two.
  • Since there are N-1 second entities, N-1 association scores are obtained.
  • The second entity corresponding to the largest of the N-1 association scores is the entity that is both most related to the first entity W and most relevant to the target sentence, so it is taken as the second entity having the greatest association with W.
  • The extended information generated based on the second entity most associated with the first entity W can be regarded as valid and accurate extended information for W, so the precision and accuracy of information expansion can be improved and the expansion of invalid, inaccurate information avoided.
  • determining the relevance of each second entity to the target sentence includes: for each second entity, determining the normalized result of the sum of the correlation degrees between each first entity in the target sentence and that second entity as the relevance between that second entity and the target sentence.
  • The correlation degree between any first entity and any second entity is: the maximum relationship probability value between the first entity and the second entity plus the maximum relationship probability value between the second entity and the first entity.
  • Suppose the target sentence includes three first entities A, B, and C; the relevance between the target sentence and a second entity F is then the sum of the correlation degree between A and F, the correlation degree between B and F, and the correlation degree between C and F, and the resulting sum must be normalized.
  • The correlation degree between A and F is: the maximum relationship probability value between A and F plus the maximum relationship probability value between F and A.
  • The correlation degree between B and F is: the maximum relationship probability value between B and F plus the maximum relationship probability value between F and B.
  • The correlation degree between C and F is: the maximum relationship probability value between C and F plus the maximum relationship probability value between F and C. It can be seen that computing the correlation degree between any two entities requires finding, in the tensor, the two maximum relationship probability values involving the two entities, then summing and normalizing them.
  • Each value in the M-dimensional relationship vector lies between 0 and 1 and represents the probability of the corresponding relationship; the values across all dimensions sum to 1.
  • determining the maximum relationship probability value between the target object and each second entity includes: generating an N×N×M-dimensional tensor representing the relationships and relationship probability values between the entities in the preset entity set, where M is the dimension of the relationship vector between different entities in the preset entity set; generating a knowledge graph based on the N×N×M-dimensional tensor; and querying, in the knowledge graph, the maximum relationship probability value between the target object and each second entity.
  • The N×N×M-dimensional tensor is equivalent to the knowledge graph, so generating the knowledge graph from the tensor merely expresses the information in the tensor in knowledge-graph form.
  • Since the knowledge graph is generated from the N×N×M-dimensional tensor, the generation process of the tensor includes: generating an initial tensor of dimensions N×N×M consisting entirely of 0s; obtaining the sentence library used to generate the preset entity set, traversing each sentence in the sentence library, and taking each traversed sentence as the sentence to be recognized; taking each pair of adjacent entities in the sentence to be recognized as an entity group to obtain multiple entity groups; using the relationship recognition model to identify the relationship between the two entities in each entity group to obtain multiple M-dimensional relationship vectors; for each M-dimensional relationship vector, if the maximum value in the vector is greater than a preset threshold, updating the element at the corresponding position of the initial tensor from 0 to 1; and traversing the next sentence in the sentence library and continuing to update the current tensor until every sentence in the sentence library has been traversed, then outputting and optimizing the currently obtained tensor to obtain the N×N×M-dimensional tensor.
  • Updating the element at the position of the initial tensor corresponding to the maximum value from 0 to 1 to update the initial tensor includes: for each M-dimensional relationship vector, in response to the maximum value in the vector being greater than the preset threshold, updating the element at the corresponding position of the initial tensor from 0 to 1.
  • Using the relationship recognition model to identify the relationship between the two entities in each entity group to obtain multiple M-dimensional relationship vectors includes: for the two entities in any entity group, replacing those two entities in the sentence to be recognized with different identifiers, and inputting the resulting new sentence into the relationship recognition model so that the model outputs the M-dimensional relationship vector corresponding to the two entities.
  • The relationship recognition model, shown in Fig. 2, includes a sub-model with a Transformer structure and a relationship classification neural network;
  • the Transformer sub-model includes 12 multi-head attention modules.
  • Of course, the relationship recognition model may also have other structures.
  • Any new sentence is first input into the Transformer sub-model to obtain two feature vectors, mα and mβ, which are then input into the relationship classification neural network to obtain the M relationships and M relationship probability values between α and β; these are represented by an M-dimensional relationship vector in which each element is a relationship probability value.
  • Here α and β are the distinct identifiers that replace two adjacent entities, so the M relationships and M relationship probability values between α and β are those between the two adjacent entities that were replaced.
  • The M-dimensional relationship vector obtained for new sentence ① represents the relationship between "Xiao Ming" and "Company A". If the clause "α has β" is determined from the sentence "Company A has Xiao Ming", then the M-dimensional relationship vector obtained for the clause "α has β" represents the relationship between "Company A" and "Xiao Ming". It can be seen that after the positions of two adjacent entities are swapped, relationship recognition must be performed again with the relationship recognition model.
  • When the relationship between two entities appears as 0 in the tensor, two cases are possible: the two entities have no relationship, or they have a relationship that is missing from the records.
  • To handle such cases, the present application also optimizes the currently obtained tensor composed of 0s and 1s.
  • In this way each X_i can be decomposed as A O_i A^T, i.e., X_1 = A O_1 A^T, X_2 = A O_2 A^T, ..., X_M = A O_M A^T.
  • The optimal A' and the M optimal O_i' can then be obtained by gradient descent, that is, the optimal A' and the optimal O_i' in the above M equations are computed respectively.
  • Afterwards, X'_1 = A' O_1 A'^T, X'_2 = A' O_2 A'^T, ..., X'_M = A' O_M A'^T are computed,
  • giving M results: X'_1, ..., X'_i, ..., X'_M.
  • A new three-dimensional matrix can be obtained by concatenating the M results.
  • A'^T is the transpose matrix of A'.
  • Alternatively, if the M optimal O_i' (O'_1, ..., O'_i, ..., O'_M) are obtained, they can be concatenated into a new d×d×M-dimensional tensor O, and the new three-dimensional matrix obtained from X = A O A^T.
  • Each element of the initial three-dimensional matrix and of the new three-dimensional matrix is compared position by position, and the maximum at each position is retained to obtain the N×N×M-dimensional tensor.
  • For example, if the element at some position is 0 in the initial three-dimensional matrix and 0.02 in the new three-dimensional matrix, the element of X'' at that position is recorded as 0.02.
  • Likewise, if the element at some position is 1 in the initial three-dimensional matrix and 0.99 in the new three-dimensional matrix, the element of X'' at that position is recorded as 1.
  • determining each first entity in the target sentence includes: converting each word in the target sentence into a 1024-dimensional vector to obtain a vector set; inputting the vector set into an entity recognition model to enable entity recognition The model identifies each first entity in the target sentence.
  • the entity recognition model can be referred to in FIG. 3 .
  • The entity recognition model shown in Fig. 3 is implemented based on a Transformer and includes 6 multi-head self-attention modules. Of course, the entity recognition model may also be implemented with other structures. The "positional encoding" in Fig. 2 and Fig. 3 records the position of each character of the sentence within the sentence.
  • the recognition result output by the entity recognition model may be: Beijing is the capital of China (AABAABBB).
  • A means entity and B means non-entity.
  • The following embodiment follows the steps of data preparation, entity recognition, relationship recognition, knowledge graph construction, information value evaluation, and information embedding, adding expert knowledge from the knowledge graph to the input data of the BERT model so that the BERT model improves the accuracy of natural language processing tasks.
  • The character table W also contains two special symbols, α and β.
  • The character table W can be used to determine the relational words between two entities.
  • A specific entity recognition model is shown in Fig. 3. Sentences are randomly selected from the sentence library for manual annotation to obtain training text, and the model is trained with the annotated text to obtain the entity recognition model.
  • For the specific training process, reference may be made to the existing related art, which is not repeated here.
  • Each character contained in the character table W is converted into a Z-dimensional one-hot vector, which is then mapped to a 1024-dimensional vector through a matrix J ∈ R^(1024×Z).
  • R is a preset set of real numbers. In this way, each character can be represented by a 1024-dimensional vector.
  • If each character of a sentence is represented by a 1024-dimensional vector, a set of 1024-dimensional vectors is obtained; inputting this set into the entity recognition model recognizes every entity in the sentence.
  • the entities in each sentence in the sentence base can be identified by using the entity recognition model. Then, for each sentence, the relationship between the entities in it can be further identified.
  • A relationship table G containing all relationships of interest is constructed. The table G contains M different relationships, one of which is defined as "no relationship", meaning that no relationship exists between two entities.
  • The relationship table G is customized according to requirements, and each relationship corresponds to an M-dimensional one-hot vector. By convention, the k-th relationship is denoted r_k and corresponds to the k-th dimension of the vector.
  • One method of replacing the entities in a sentence includes: for a sentence, after determining all entities in it, replacing each pair of adjacent entities in turn.
  • Specifically, the sentence is first traversed character by character: all characters of the first entity are replaced with α and all characters of the second entity with β, yielding the first clause (i.e., a new sentence); then, starting from the second entity, all characters of the second entity are replaced with α and all characters of the third entity with β, yielding the second clause; and so on until all entities in the sentence have been replaced.
  • Suppose the sentence is: Company A managed by Xiao Ming operates a clothing business.
  • The vector h is then input into the relationship classification neural network for relationship classification.
  • If the output vector is (p_1, p_2, ..., p_M),
  • and the 1st, 2nd, ..., M-th dimensions correspond to r_1, r_2, ..., r_M respectively,
  • then the probability that relationship r_1 holds between α and β in the input clause is p_1,
  • the probability of relationship r_2 is p_2, ...,
  • and the probability of relationship r_M is p_M.
  • The relationship between α and β is equivalent to the relationship between the two replaced entities in the original sentence.
  • The entity table E contains N entities in total. By convention: e_i is the i-th entity in E, 1 ≤ i ≤ N.
  • The relationship recognition model is used to identify the relationship between two adjacent entities in each sentence, and the knowledge graph is built accordingly.
  • The knowledge graph is expressed quantitatively as an N×N×M-dimensional tensor. From the tensor it can be read whether any relationship contained in the relationship table G holds between any two entities contained in the entity table E. N is the number of entities in E, and M is the number of relationships in G.
  • X_ijk denotes the element at one position of the tensor, corresponding to two entities in the entity table E and one relationship in the relationship table G.
  • The correspondence rule is: X_ijk corresponds to the i-th entity e_i and the j-th entity e_j in the entity table E, and the k-th relationship r_k in the relationship table G.
  • The knowledge graph construction process includes:
  • X is continuously updated accordingly until all M-dimensional vectors have been processed, and the final X is then output.
  • Step (3): the final X obtained in step (2) is optimized based on tensor decomposition to supplement the knowledge graph with latent information.
  • The X obtained through step (2) is usually sparse.
  • Tensor decomposition is used to infer and mine the latent relationships in X, thereby supplementing and refining the knowledge (relationships between entities) in the knowledge graph.
  • The values at positions of X that are 0 are inferred from the elements of X that are 1; that is, the likelihood of uncertain relationships is inferred from the existing definite relationships.
  • For example, from the known relationships (Xiao Ming, manages, Company A), (Xiao Ma, manages, Company B), (Company A, operates, clothing business), (Company B, operates, WeChat), and (Xiao Ming, manages, clothing business), it can be inferred that (Xiao Ma, manages, WeChat) holds with relatively high probability, and the knowledge graph can be supplemented accordingly.
  • The specific implementation steps include: splitting the tensor X into M N×N-dimensional matrices X_1, ..., X_M, and constructing an N×d-dimensional matrix A and M d×d-dimensional matrices O_1, ..., O_M, so as to decompose X_1, ..., X_M into A O_1 A^T, ..., A O_M A^T.
  • The optimal A and the optimal O_1, ..., O_M are obtained, and the new tensor O is obtained by concatenating O_1, ..., O_M.
  • The value at each point of the updated tensor X represents the likelihood that the relationship in the relationship table G corresponding to that point holds between the two entities in the entity table E corresponding to that point, i.e., the credibility of the information that the relationship exists between the two entities.
  • X_ijk corresponds to the i-th entity e_i and the j-th entity e_j in the entity table E and the k-th relationship r_k in the relationship table G, so X_ijk = 0.8 means that the probability that relationship r_k holds between entity e_i and entity e_j is 0.8.
  • The knowledge graph (tensor X) constructed in the above steps represents the likelihood that any relationship in the relationship table G holds between any two entities in the entity table E.
  • The correlation degree score of two entities is the normalized sum of the two maximum relationship probability values relating them in either direction.
  • X_jik corresponds to the j-th entity e_j and the i-th entity e_i in the entity table E, and gives the probability value that the k-th relationship r_k in the relationship table G holds.
  • Suppose the sentence T intersects the entity table E in p entities: e_t1, ..., e_tp.
  • The relevance between the sentence T and an entity e_i in the entity table E is measured by the normalized result of the sum of the correlation degrees between the entities e_t1, ..., e_tp and the entity e_i; that is the relevance score between T and e_i.
  • N is the number of entities contained in the entity table E,
  • and p is the number of entities contained in both the entity table E and the sentence T.
  • The relevance score function Y and the knowledge graph (tensor X) are used to select the information (relationships between entities) to insert into the sentence T.
  • A piece of information represents the knowledge that a certain relationship holds between two entities; for example, "relationship r_k holds between entity e_i and entity e_j" is one piece of information.
  • The value of a piece of information for a given sentence is positively correlated with the likelihood that the information holds and with the relevance between the information and the sentence.
  • Supplementary information is selected for each entity.
  • The selection method for supplementary information is as follows: for an entity e_q (q ∈ {t1, ..., tp}), obtain the probabilities X_qi1, ..., X_qiM of the M relationships between entity e_q and entity e_i (i ≠ q), and take the maximum to obtain the maximum relationship probability between e_q and e_i.
  • The N-1 information evaluation values V_1 to V_N-1 are: the values of the corresponding entities and information for the sentence T.
  • The BERT model can achieve better processing results when processing sentences augmented with such information.
  • The value of the inserted information for the sentence is fully considered to ensure the quality of the inserted information and reduce the introduction of noise.
  • This embodiment can thus recognize the entities in a sentence with the entity recognition model, identify the relationship between two adjacent entities in the sentence with the relationship recognition model, construct and optimize the knowledge graph accordingly, and obtain a tensor X representing the relationship between any two entities. The value of the information to be inserted is evaluated through the correlation degree between entities and the relevance between the entities in the information and the sentence.
  • a natural language processing device provided in an embodiment of the present application is introduced below, and a natural language processing device described below and a natural language processing method described above may refer to each other.
  • the embodiment of the present application discloses a natural language processing device, including:
  • An acquisition module 401 configured to acquire a target sentence to be processed, and determine each first entity in the target sentence;
  • the expansion module 402 is configured to, for each first entity, if the first entity exists in the preset entity set, determine the second entity that has the greatest association with the first entity in the preset entity set, and based on the determined The second entity generates extended information, and after adding the extended information to the position of the first entity in the target sentence, an updated target sentence is obtained; the second entity is any entity in the preset entity set except the first entity ;
  • the processing module 403 is configured to input the updated target sentence into the BERT model, so that the BERT model can perform natural language processing tasks.
  • the expansion module 402 is specifically configured to: for each first entity, in response to the presence of the first entity in the preset entity set, determine the first entity that has the greatest association with the first entity in the preset entity set two entities, generate extended information based on the determined second entity, add the extended information to the position of the first entity in the target sentence, and obtain an updated target sentence.
  • the expansion module includes:
  • the first determination unit is configured to take the first entity as the target object and determine the maximum relationship probability value between the target object and each second entity to obtain N-1 maximum relationship probability values; N-1 is the number of second entities, and N is the total number of entities included in the preset entity set;
  • the second determination unit is configured to determine the relevance of each second entity to the target sentence to obtain N-1 relevance values;
  • the calculation unit is configured to, for each second entity, compute the product of the relevance corresponding to the second entity and the maximum relationship probability value corresponding to the second entity to obtain the association score corresponding to the second entity, yielding N-1 association scores;
  • the selection unit is configured to use the second entity corresponding to the largest association score among the N-1 association scores as the second entity having the greatest association with the target object.
  • the first determination unit includes:
  • the generation subunit is configured to generate an N×N×M-dimensional tensor representing the relationships and relationship probability values between the entities in the preset entity set; M is the dimension of the relationship vector between different entities in the preset entity set;
  • the query subunit is configured to generate a knowledge graph based on the N×N×M-dimensional tensor and query, in the knowledge graph, the maximum relationship probability value between the target object and each second entity.
  • the generating subunit is specifically used for:
  • two adjacent entities in the sentence to be recognized are taken as an entity group to obtain multiple entity groups;
  • a relationship recognition model is used to identify the relationship between the two entities in each entity group to obtain a plurality of M-dimensional relationship vectors;
  • the element at the position of the initial tensor corresponding to the maximum value is updated from 0 to 1 to update the initial tensor;
  • in some embodiments, the generation subunit, for each M-dimensional relationship vector, in response to the maximum value in the vector being greater than a preset threshold, updates the element at the corresponding position of the initial tensor from 0 to 1 to update the initial tensor.
  • the generating subunit is specifically used for:
  • the generating subunit is specifically used for:
  • the initial three-dimensional matrix and the new three-dimensional matrix are compared position by position, and the maximum at each position is retained to obtain the N×N×M-dimensional tensor.
  • the second determination unit is specifically used for:
  • the normalized result of the sum of the correlation degrees between each first entity in the target sentence and the second entity is determined as the correlation between the second entity and the target sentence.
  • the correlation degree between any first entity and any second entity is: the normalized result of the sum of the maximum relationship probability value between the first entity and the second entity and the maximum relationship probability value between the second entity and the first entity.
  • the acquisition module is specifically used for:
  • the vector set is input to the entity recognition model, so that the entity recognition model recognizes each first entity in the target sentence.
  • This embodiment provides a natural language processing device that can expand the target sentence, as input data of the BERT model, with effective information, enabling the BERT model to improve the processing accuracy of natural language processing tasks as well as its processing efficiency and effectiveness.
  • An electronic device provided by an embodiment of the present application is introduced below, and the electronic device described below and the natural language processing method and apparatus described above may refer to each other.
  • an electronic device including:
  • memory 501 for storing computer readable instructions
  • the processor 502 is configured to execute computer-readable instructions to implement the natural language processing method disclosed in any of the foregoing embodiments.
  • The embodiment of the present application provides one or more non-volatile computer-readable storage media storing computer-readable instructions; the storage media described below and the natural language processing method, device, and equipment described above may be cross-referenced.
  • When the computer-readable instructions stored on the one or more non-volatile computer-readable storage media are executed by one or more processors, the one or more processors are caused to execute the steps of the natural language processing method in any of the foregoing embodiments. For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
  • The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other known form of readable storage medium.


Abstract

An embodiment of the present application discloses a natural language processing method, apparatus, device, and readable storage medium. The method includes: obtaining a target sentence to be processed and determining each first entity in the target sentence; for each first entity, in response to the first entity being present in a preset entity set, determining, in the preset entity set, a second entity having the greatest association with the first entity, generating extended information based on the determined second entity, and adding the extended information after the position of the first entity in the target sentence to obtain an updated target sentence, where the second entity is any entity in the preset entity set other than the first entity; and inputting the updated target sentence into a BERT model so that the BERT model performs a natural language processing task.

Description

A natural language processing method, apparatus, device, and readable storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. CN202210003234.6, filed with the China National Intellectual Property Administration on January 5, 2022 and entitled "A natural language processing method, apparatus, device, and readable storage medium", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the technical field of computer machine learning, and in particular to a natural language processing method, apparatus, device, and readable storage medium.
BACKGROUND
At present, natural language processing tasks are generally performed with the BERT (Bidirectional Encoder Representations from Transformers) model. To improve task accuracy, additional knowledge can be obtained from a knowledge graph and added to the model input data to assist the BERT model in natural language processing tasks. Because the information it stores is structured, a knowledge graph is an excellent source for knowledge extraction.
However, existing knowledge graphs only attend to the surface relationships of the text and ignore latent relationships that may exist, and using a knowledge graph to add information to the input of the BERT model easily introduces excessive noise (low-relevance information), affecting efficiency and effectiveness. Since the input of the BERT model determines the accuracy of its output, overly noisy input not only increases the amount of input data but may also reduce the model's accuracy, lowering its processing efficiency and effectiveness, so that the model may fail to output the correct result. For example, in a question-answering task, the BERT model may be unable to answer the question accurately.
SUMMARY
In a first aspect, an embodiment of the present application provides a natural language processing method, including:
obtaining a target sentence to be processed, and determining each first entity in the target sentence;
for each first entity, in response to the first entity being present in a preset entity set, determining, in the preset entity set, a second entity having the greatest association with the first entity, generating extended information based on the determined second entity, and adding the extended information after the position of the first entity in the target sentence to obtain an updated target sentence, where the second entity is any entity in the preset entity set other than the first entity; and
inputting the updated target sentence into a BERT model so that the BERT model performs a natural language processing task.
In some embodiments, determining, in the preset entity set, the second entity having the greatest association with the first entity includes:
taking the first entity as a target object, and determining the maximum relationship probability value between the target object and each second entity to obtain N-1 maximum relationship probability values, where N-1 is the number of second entities and N is the total number of entities included in the preset entity set;
determining the relevance of each second entity to the target sentence to obtain N-1 relevance values;
for each second entity, computing the product of the relevance corresponding to the second entity and the maximum relationship probability value corresponding to the second entity to obtain the association score corresponding to the second entity, yielding N-1 association scores; and
taking the second entity corresponding to the largest of the N-1 association scores as the second entity having the greatest association with the target object.
In some embodiments, determining the maximum relationship probability value between the target object and each second entity includes:
generating an N×N×M-dimensional tensor representing the relationships and relationship probability values between the entities in the preset entity set, where M is the dimension of the relationship vector between different entities in the preset entity set; and
generating a knowledge graph based on the N×N×M-dimensional tensor, and querying, in the knowledge graph, the maximum relationship probability value between the target object and each second entity.
In some embodiments, generating the N×N×M-dimensional tensor representing the relationships and relationship probability values between the entities in the preset entity set includes:
generating an initial tensor of dimensions N×N×M consisting entirely of 0s;
obtaining a sentence library used to generate the preset entity set, traversing each sentence in the sentence library, and taking each traversed sentence as a sentence to be recognized;
taking each pair of adjacent entities in the sentence to be recognized as an entity group to obtain multiple entity groups;
using a relationship recognition model to identify the relationship between the two entities in each entity group to obtain multiple M-dimensional relationship vectors;
for each M-dimensional relationship vector, in response to the maximum value in the vector being greater than a preset threshold, updating the element at the corresponding position of the initial tensor from 0 to 1 to update the initial tensor; and
traversing the next sentence in the sentence library and continuing to update the current tensor until every sentence in the sentence library has been traversed, then outputting and optimizing the currently obtained tensor to obtain the N×N×M-dimensional tensor.
In some embodiments, using the relationship recognition model to identify the relationship between the two entities in each entity group to obtain multiple M-dimensional relationship vectors includes:
for the two entities in any entity group, replacing those two entities in the sentence to be recognized with different identifiers, and inputting the resulting new sentence into the relationship recognition model so that the relationship recognition model outputs the M-dimensional relationship vector corresponding to the two entities.
In some embodiments, the relationship recognition model includes a sub-model with a Transformer structure and a relationship classification neural network, and inputting the new sentence obtained by replacement into the relationship recognition model so that the model outputs the M-dimensional relationship vector corresponding to the two entities includes:
inputting the new sentence into the Transformer sub-model to obtain the feature vectors carrying the identifiers of the two entities; and
inputting the feature vectors carrying the identifiers of the two entities into the relationship classification neural network to obtain the M-dimensional relationship vector corresponding to the two entities.
In some embodiments, optimizing the currently obtained tensor to obtain the N×N×M-dimensional tensor includes:
taking the currently obtained tensor as an initial three-dimensional matrix and splitting it into M N×N-dimensional matrices X_i, i = 1, 2, ..., M;
splitting an initialized d×d×M-dimensional tensor O into M d×d-dimensional matrices O_i, where d is a tunable hyperparameter;
initializing an N×d-dimensional matrix A, and solving for the optimal A' and the M optimal O_i' based on X_i = A O_i A^T and gradient descent;
obtaining a new three-dimensional matrix based on the optimal A' and the M optimal O_i'; and
comparing the initial three-dimensional matrix and the new three-dimensional matrix position by position with the max function, retaining the maximum at each position, to obtain the N×N×M-dimensional tensor.
In some embodiments, determining the relevance of each second entity to the target sentence includes:
for each second entity, determining the normalized result of the sum of the correlation degrees between each first entity in the target sentence and the second entity as the relevance between the second entity and the target sentence.
In some embodiments, the correlation degree between any first entity and any second entity is: the maximum relationship probability value between the first entity and the second entity plus the maximum relationship probability value between the second entity and the first entity.
In some embodiments, determining each first entity in the target sentence includes:
converting each character in the target sentence into a 1024-dimensional vector to obtain a vector set; and
inputting the vector set into an entity recognition model so that the entity recognition model recognizes each first entity in the target sentence.
In a second aspect, an embodiment of the present application provides a natural language processing apparatus, including:
an acquisition module configured to obtain a target sentence to be processed and determine each first entity in the target sentence;
an expansion module configured to, for each first entity, in response to the first entity being present in a preset entity set, determine, in the preset entity set, a second entity having the greatest association with the first entity, generate extended information based on the determined second entity, and add the extended information after the position of the first entity in the target sentence to obtain an updated target sentence, where the second entity is any entity in the preset entity set other than the first entity; and
a processing module configured to input the updated target sentence into a BERT model so that the BERT model performs a natural language processing task.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory configured to store computer-readable instructions; and
a processor configured to execute the computer-readable instructions to implement the natural language processing method in any of the foregoing embodiments.
In a fourth aspect, the present application provides one or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the natural language processing method in any of the foregoing embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
To explain the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings required in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of a natural language processing method according to one or more embodiments;
Fig. 2 is a schematic diagram of a relationship recognition model according to one or more embodiments;
Fig. 3 is a schematic diagram of an entity recognition model according to one or more embodiments;
Fig. 4 is a schematic diagram of a natural language processing apparatus according to one or more embodiments; and
Fig. 5 is a schematic diagram of an electronic device according to one or more embodiments.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
Although the input data of the BERT model can currently be expanded with related information, the existing expansion approaches easily introduce excessive noise, which increases the amount of input data and may reduce the accuracy of the BERT model, affecting its processing efficiency and effectiveness. An embodiment of the present application provides a natural language processing solution that expands the input data of the BERT model with effective information, enabling the BERT model to improve the processing accuracy of natural language processing tasks.
Referring to Fig. 1, an embodiment of the present application discloses a natural language processing method including steps S101, S102, and S103.
Step S101: obtain a target sentence to be processed, and determine each first entity in the target sentence.
In this embodiment, nouns in the target sentence are referred to as first entities, and the target sentence includes at least one first entity. For example, the sentence "Company A managed by Xiao Ming operates a clothing business" includes three first entities: Xiao Ming, Company A, and clothing business.
Step S102: for each first entity, if the first entity exists in the preset entity set, determine, in the preset entity set, a second entity having the greatest association with the first entity, generate extended information based on the determined second entity, and add the extended information after the position of the first entity in the target sentence to obtain an updated target sentence.
The second entity is any entity in the preset entity set other than the first entity.
Specifically, step S102 may include: for each first entity, in response to the first entity being present in the preset entity set, determining, in the preset entity set, the second entity having the greatest association with the first entity, generating extended information based on the determined second entity, and adding the extended information after the position of the first entity in the target sentence to obtain the updated target sentence.
It should be noted that this embodiment refers to nouns in the preset entity set as second entities; the preset entity set is simply a collection of many second entities. If a first entity in the target sentence exists in the preset entity set, a second entity associated with that first entity can be found in the preset entity set: the second entity having the greatest association with the first entity is determined, extended information is generated based on that second entity, and the extended information is added after the position of the first entity in the target sentence, thereby adding valid extended information for the first entity. In this way, valid extended information can be added for every first entity in the target sentence that exists in the preset entity set, yielding the updated target sentence. No extended information is added for first entities in the target sentence that do not exist in the preset entity set.
Suppose the target sentence is "Xiao Ming founded Company A"; the extended information added for the first entity "Xiao Ming" is "is Chinese", where "Chinese" is a second entity in the preset entity set; and the extended information added for the first entity "Company A" is "operates a clothing business", where "clothing business" is a second entity in the preset entity set. The final updated target sentence is then "Xiao Ming is Chinese founded Company A operates a clothing business". Here "is" is a relational word connecting "Xiao Ming" and "Chinese", and "operates" is a relational word connecting "Company A" and "clothing business"; such relational words can be determined based on idiomatic usage.
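The insertion itself is a simple string splice: the relational word and the second entity are appended immediately after the entity's span in the sentence. A minimal sketch of this update (the function and variable names are illustrative, not from the application):

```python
def insert_extension(sentence: str, entity: str, connective: str, second_entity: str) -> str:
    """Append "<connective><second_entity>" right after the first
    occurrence of `entity` in `sentence` (illustrative sketch)."""
    pos = sentence.find(entity)
    if pos < 0:                      # entity not in the sentence: leave it unchanged
        return sentence
    end = pos + len(entity)
    return sentence[:end] + connective + second_entity + sentence[end:]

# "Xiao Ming founded Company A" -> "Xiao Ming is Chinese founded Company A"
print(insert_extension("Xiao Ming founded Company A", "Xiao Ming", " is ", "Chinese"))
```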
Step S103: input the updated target sentence into the BERT model so that the BERT model performs a natural language processing task.
In this embodiment, using the updated target sentence as the input data of the BERT model allows the BERT model to obtain as much information as possible when performing a natural language processing task, improving processing accuracy and effectiveness.
Suppose the BERT model handles a question-answering task. After the BERT model receives a question sentence, extended information can be added to each entity in the question sentence according to this embodiment to update the question; after the updated question is obtained, it is processed to determine the best answer for it.
It can be seen that this embodiment can add extended information for each first entity in the target sentence, and the added extended information is generated based on the second entity most associated with the corresponding first entity, so it is highly relevant to each first entity in the target sentence. This expands the target sentence, as input data of the BERT model, with effective information, enabling the BERT model to improve the processing accuracy of natural language processing tasks as well as its processing efficiency and effectiveness.
In a specific embodiment, determining, in the preset entity set, the second entity having the greatest association with the first entity includes: taking the first entity as a target object and determining the maximum relationship probability value between the target object and each second entity to obtain N-1 maximum relationship probability values, where N-1 is the number of second entities and N is the total number of entities in the preset entity set; determining the relevance of each second entity to the target sentence to obtain N-1 relevance values; for each second entity, computing the product of the relevance corresponding to the second entity and the maximum relationship probability value corresponding to the second entity to obtain the association score corresponding to the second entity, yielding N-1 association scores; and taking the second entity corresponding to the largest of the N-1 association scores as the second entity having the greatest association with the target object.
Suppose the preset entity set includes N entities, where N is a positive integer. If any first entity W exists in the preset entity set, a maximum relationship probability value can be determined between W and each of the remaining N-1 second entities, yielding N-1 maximum relationship probability values.
Since the relationship between a first entity and a second entity is represented by an M-dimensional vector containing M relationship probability values, this embodiment takes the maximum of these values as the maximum relationship probability value between the first entity and the second entity.
Meanwhile, each of the N-1 second entities in the preset entity set other than the first entity W can be assigned a relevance to the target sentence, yielding N-1 relevance values.
From the perspective of any second entity, it corresponds to one relevance value and one maximum relationship probability value, so an association score is obtained as the product of the two. Since there are N-1 second entities, N-1 association scores are obtained. The second entity corresponding to the largest of these scores is the entity that is both most related to the first entity W and most relevant to the target sentence, so it is taken as the second entity having the greatest association with W. The extended information generated based on this second entity can be regarded as valid and accurate extended information for W, so the precision and accuracy of information expansion are improved, and the expansion of invalid or inaccurate information avoided.
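As a sketch, this selection can be written directly against the N×N×M tensor X described later and a vector of sentence-relevance values; the function and variable names are illustrative assumptions:

```python
import numpy as np

def best_associated_entity(X: np.ndarray, relevance: np.ndarray, w: int) -> int:
    """Pick the second entity with the greatest association to entity `w`.

    X          -- (N, N, M) tensor of relationship probability values
    relevance  -- (N,) relevance of every entity to the target sentence
    Returns the index of the second entity with the largest association score.
    """
    max_prob = X[w].max(axis=1)      # (N,) max relationship probability from w to each entity
    scores = relevance * max_prob    # association score = relevance * max probability
    scores[w] = -np.inf              # exclude the first entity itself
    return int(np.argmax(scores))
```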
In a specific embodiment, determining the relevance of each second entity to the target sentence includes: for each second entity, determining the normalized result of the sum of the correlation degrees between each first entity in the target sentence and that second entity as the relevance between that second entity and the target sentence. The correlation degree between any first entity and any second entity is: the maximum relationship probability value between the first entity and the second entity plus the maximum relationship probability value between the second entity and the first entity.
Suppose the target sentence includes three first entities A, B, and C. The relevance between the target sentence and a second entity F is then the sum of the correlation degrees between A and F, between B and F, and between C and F, and the resulting sum must be normalized. The correlation degree between A and F is the maximum relationship probability value between A and F plus the maximum relationship probability value between F and A; likewise for B and F, and for C and F. It can be seen that computing the correlation degree between any two entities requires finding, in the tensor, the two maximum relationship probability values involving the two entities, then summing and normalizing them.
Each value in the M-dimensional relationship vector lies between 0 and 1 and represents the probability of the corresponding relationship; the values across all dimensions sum to 1.
In a specific embodiment, determining the maximum relationship probability value between the target object and each second entity includes: generating an N×N×M-dimensional tensor representing the relationships and relationship probability values between the entities in the preset entity set, where M is the dimension of the relationship vector between different entities in the preset entity set; generating a knowledge graph based on the N×N×M-dimensional tensor; and querying, in the knowledge graph, the maximum relationship probability value between the target object and each second entity. The N×N×M-dimensional tensor is equivalent to the knowledge graph, so generating the knowledge graph from the tensor merely expresses the information in the tensor in knowledge-graph form.
Thus the maximum relationship probability value between each first entity and each second entity can be queried in the knowledge graph. The knowledge graph is generated from the N×N×M-dimensional tensor, whose generation process includes: generating an initial tensor of dimensions N×N×M consisting entirely of 0s; obtaining the sentence library used to generate the preset entity set, traversing each sentence in the sentence library, and taking each traversed sentence as the sentence to be recognized; taking each pair of adjacent entities in the sentence to be recognized as an entity group to obtain multiple entity groups; using the relationship recognition model to identify the relationship between the two entities in each entity group to obtain multiple M-dimensional relationship vectors; for each M-dimensional relationship vector, if the maximum value in the vector is greater than a preset threshold, updating the element at the corresponding position of the initial tensor from 0 to 1; and traversing the next sentence in the sentence library and continuing to update the current tensor until every sentence in the sentence library has been traversed, at which point some elements of the all-0 initial tensor will have been changed to 1 while others remain 0, yielding a tensor of 0s and 1s. The currently obtained 0/1 tensor is then optimized to obtain the N×N×M-dimensional tensor. Here, updating the element corresponding to the maximum value from 0 to 1 includes: for each M-dimensional relationship vector, in response to the maximum value in the vector being greater than the preset threshold, updating the element at the corresponding position of the initial tensor from 0 to 1 to update the initial tensor. A sketch of this construction loop is given below.
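A minimal sketch of the 0/1 tensor construction, assuming a `relation_vectors(sentence)` helper that yields, for each adjacent entity pair, the pair's entity-table indices and its M-dimensional relationship vector (all names here are illustrative):

```python
import numpy as np

def build_initial_tensor(sentences, relation_vectors, N: int, M: int, theta: float) -> np.ndarray:
    """Build the N x N x M tensor of 0s and 1s described above.

    relation_vectors(sentence) is assumed to yield (i, j, v) triples, where
    v is the M-dimensional relationship vector for entities e_i and e_j.
    """
    X = np.zeros((N, N, M))
    for sentence in sentences:                   # traverse the sentence library
        for i, j, v in relation_vectors(sentence):
            k = int(np.argmax(v))                # dimension with the largest probability
            if v[k] > theta:                     # only record relationships above the threshold
                X[i, j, k] = 1.0
    return X
```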
In a specific embodiment, using the relationship recognition model to identify the relationship between the two entities in each entity group to obtain multiple M-dimensional relationship vectors includes: for the two entities in any entity group, replacing those two entities in the sentence to be recognized with different identifiers, and inputting the resulting new sentence into the relationship recognition model so that the model outputs the M-dimensional relationship vector corresponding to the two entities.
The relationship recognition model, shown in Fig. 2, includes a sub-model with a Transformer structure and a relationship classification neural network; the Transformer sub-model includes 12 multi-head attention modules. Of course, the relationship recognition model may also have other structures.
Any new sentence is first input into the Transformer sub-model to obtain two feature vectors, mα and mβ, which are then input into the relationship classification neural network to obtain the M relationships and M relationship probability values between α and β. These are represented by an M-dimensional relationship vector in which each element is a relationship probability value. Here α and β are the distinct identifiers that replace two adjacent entities, so the M relationships and probability values between α and β are those between the two replaced adjacent entities.
Specifically, replacing adjacent entities with α and β may proceed as in the following example. Suppose the sentence to be recognized is "Company A managed by Xiao Ming operates a clothing business". Two entity groups can be determined, "Xiao Ming + Company A" and "Company A + clothing business", yielding two new sentences: ① "β managed by α operates a clothing business"; ② "α managed by Xiao Ming operates β".
It should be noted that the M-dimensional relationship vector obtained for new sentence ① represents the relationship between "Xiao Ming" and "Company A". If the clause "α has β" were instead determined from the sentence "Company A has Xiao Ming", the M-dimensional relationship vector obtained for "α has β" would represent the relationship between "Company A" and "Xiao Ming". It can be seen that after the positions of two adjacent entities are swapped, relationship recognition must be performed again with the relationship recognition model.
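A sketch of the clause generation, assuming the entities are given as strings in reading order (names are illustrative):

```python
def make_clauses(sentence: str, entities: list[str]) -> list[str]:
    """Replace each pair of adjacent entities with α and β, yielding one
    clause per pair, as described above (illustrative sketch)."""
    clauses = []
    for first, second in zip(entities, entities[1:]):
        clause = sentence.replace(first, "α").replace(second, "β")
        clauses.append(clause)
    return clauses

# ['β managed by α operates a clothing business',
#  'α managed by Xiao Ming operates a β']
print(make_clauses("Company A managed by Xiao Ming operates a clothing business",
                   ["Xiao Ming", "Company A", "clothing business"]))
```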
It should be noted that when the relationship between two entities appears as 0 in the tensor, two cases are possible: (1) the two entities have no relationship; (2) the two entities have a relationship, but it is missing from the records. To handle such cases, after obtaining the tensor of 0s and 1s, the present application further optimizes it.
In a specific embodiment, optimizing the currently obtained tensor to obtain the N×N×M-dimensional tensor includes: taking the currently obtained tensor (i.e., the matrix of 0s and 1s) as the initial three-dimensional matrix and splitting it into M N×N-dimensional matrices X_i, i = 1, 2, ..., M; splitting an initialized d×d×M-dimensional tensor O into M d×d-dimensional matrices O_i, where d is a tunable hyperparameter; initializing an N×d-dimensional matrix A and solving for the optimal A' and the M optimal O_i' based on X_i = A O_i A^T and gradient descent; obtaining a new three-dimensional matrix based on the optimal A' and the M optimal O_i'; and comparing the initial and new three-dimensional matrices position by position with the max function, retaining the maximum at each position, to obtain the N×N×M-dimensional tensor.
In general, a matrix can be transformed by matrix decomposition. Since the tensor in the present application is an N×N×M three-dimensional matrix, it must first be split into M N×N two-dimensional matrices X_1, ..., X_i, ..., X_M, while the initialized d×d×M-dimensional tensor O is split into M d×d-dimensional matrices O_1, ..., O_i, ..., O_M. Each X_i can then be decomposed as A O_i A^T, i.e., X_1 = A O_1 A^T, X_2 = A O_2 A^T, ..., X_M = A O_M A^T. The optimal A' and the M optimal O_i' are obtained by gradient descent, that is, the optimal A' and O_i' in the above M equations are computed. Afterwards, X'_1 = A' O_1 A'^T, X'_2 = A' O_2 A'^T, ..., X'_M = A' O_M A'^T are computed, giving M results X'_1, ..., X'_i, ..., X'_M, and the new three-dimensional matrix is obtained by concatenating them. A'^T is the transpose matrix of A'. Of course, if the M optimal O_i' (O'_1, ..., O'_i, ..., O'_M) are obtained, they can be concatenated into a new d×d×M-dimensional tensor O, and the new three-dimensional matrix obtained from X = A O A^T.
The initial and new three-dimensional matrices are then compared element by element, retaining the maximum at each position, to obtain the N×N×M-dimensional tensor. For example, if the element at some position is 0 in the initial matrix and 0.02 in the new matrix, the element of X'' at that position is recorded as 0.02; if the element at some position is 1 in the initial matrix and 0.99 in the new matrix, the element of X'' at that position is recorded as 1.
In a specific embodiment, determining each first entity in the target sentence includes: converting each character in the target sentence into a 1024-dimensional vector to obtain a vector set, and inputting the vector set into the entity recognition model so that the model recognizes each first entity in the target sentence.
The entity recognition model is shown in Fig. 3. It is implemented based on a Transformer and includes 6 multi-head self-attention modules; of course, the entity recognition model may also be implemented with other structures. The "positional encoding" in Fig. 2 and Fig. 3 records the position of each character of the sentence within the sentence.
For the sentence "北京是中国的首都" ("Beijing is the capital of China"), the recognition result output by the entity recognition model may be: 北京是中国的首都 (AABAABBB), where A denotes an entity character and B a non-entity character.
The following embodiment follows the steps of data preparation, entity recognition, relationship recognition, knowledge graph construction, information value evaluation, and information embedding, adding expert knowledge from the knowledge graph into the input data of the BERT model so that the BERT model improves the accuracy of natural language processing tasks.
1. Data preparation.
A sentence library containing many sentences is built, and a character table W containing every character of all the text in the library is constructed. The character table W additionally contains two special marker symbols, α and β. The table W can be used to determine the relational words between two entities.
2. Entity recognition.
A model for entity recognition of text is built, giving the entity recognition model; a concrete entity recognition model is shown in Fig. 3. Sentences are randomly sampled from the sentence library and manually annotated to obtain training text, and the model is trained on the annotated text to obtain the entity recognition model; for the specific training process, reference may be made to the existing related art, which is not repeated here.
Each character in the character table W is converted into a Z-dimensional one-hot vector, which is then mapped to a 1024-dimensional vector through a matrix J ∈ R^(1024×Z), where R is a preset set of real numbers. In this way, every character can be represented by a 1024-dimensional vector.
For a sentence in the sentence library, representing each of its characters with a 1024-dimensional vector yields a set of 1024-dimensional vectors. Inputting this set into the entity recognition model recognizes every entity in the sentence.
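A sketch of this embedding step, assuming W is the character table and J is the learned 1024×Z projection (the table contents and size here are toy assumptions):

```python
import numpy as np

Z = 6  # assumed character-table size for this toy example
W = {char: idx for idx, char in enumerate(["北", "京", "是", "中", "国", "的"])}
J = np.random.randn(1024, Z)              # projection matrix J ∈ R^(1024×Z)

def embed_sentence(sentence: str) -> np.ndarray:
    """Map each character to a Z-dim one-hot vector, then project to 1024 dims."""
    vectors = []
    for char in sentence:
        one_hot = np.zeros(Z)
        one_hot[W[char]] = 1.0            # Z-dimensional one-hot encoding
        vectors.append(J @ one_hot)       # 1024-dimensional representation
    return np.stack(vectors)              # (len(sentence), 1024)

print(embed_sentence("北京是中国的").shape)   # (6, 1024)
```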
3. Relationship recognition.
The entity recognition model can recognize the entities in every sentence of the sentence library. For each sentence, the relationships between its entities can then be further identified.
First, a relationship table G containing all relationships of interest is constructed. The table G contains M different relationships, one of which is defined as "no relationship", meaning that no relationship exists between two entities. The relationship table G is customized according to requirements, and each relationship corresponds to an M-dimensional one-hot vector; by convention, the k-th relationship is denoted r_k and corresponds to the k-th dimension of the vector.
Next, two adjacent entities in a sentence are replaced with α and β. One method of replacing the entities in a sentence includes: for a sentence, after determining all its entities, replacing each pair of adjacent entities in turn.
Specifically, the sentence is first traversed character by character: all characters of the first entity are replaced with α and all characters of the second entity with β, yielding the first clause (i.e., a new sentence); then, starting from the second entity, all characters of the second entity are replaced with α and all characters of the third entity with β, yielding the second clause; and so on until all entities in the sentence have been replaced. Suppose the sentence is "Company A managed by Xiao Ming operates a clothing business". Two entity groups can be determined, "Xiao Ming + Company A" and "Company A + clothing business", yielding two clauses: ① "β managed by α operates a clothing business"; ② "α managed by Xiao Ming operates β".
The relationship recognition model is built as follows. A small subset of sentences is randomly sampled from the sentence library to form the training set, and entity replacement is performed on each sentence; a sentence containing q entities yields q-1 clauses after replacement. For each clause, the relationship between α and β is judged from the sentence meaning and manually annotated with the corresponding entry in the relationship table G; the label of each clause is the M-dimensional one-hot vector representing the α-β relationship. The model is trained by minimizing the logits regression loss with the Adam algorithm, giving the relationship recognition model; a concrete relationship recognition model is shown in Fig. 2. For the specific training process, reference may be made to the existing related art, which is not repeated here.
Since at least one clause is obtained per sentence, inputting each clause into the relationship recognition model outputs the relationship between each pair of adjacent entities in the sentence, each represented by an M-dimensional vector.
Specifically, a clause obtained by entity replacement is input into the relationship recognition model shown in Fig. 2. The Transformer outputs the two vectors mα and mβ corresponding to α and β, which are concatenated into the vector h = <hα|hβ>, which now contains the context information of the two entities. The vector h is then input into the relationship classification neural network for relationship classification; the output layer of this network is a softmax normalization layer, outputting an M-dimensional vector v = (p_1, p_2, ..., p_M) in which the value of each dimension is the probability that the α-β relationship in the input clause is the corresponding relationship in G.
For example, if the output vector is (p_1, p_2, ..., p_M) and the 1st, 2nd, ..., M-th dimensions correspond to r_1, r_2, ..., r_M respectively, then the probability that relationship r_1 holds between α and β in the input clause is p_1, the probability of r_2 is p_2, ..., and the probability of r_M is p_M. Moreover, because the relationship between entities is determined by the overall semantics of the sentence rather than by the entities themselves, the α-β relationship is equivalent to the relationship between the two replaced entities in the original sentence.
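A minimal sketch of such a classification head in PyTorch (an assumed framework choice; the layer sizes and relationship count are illustrative), taking the two feature vectors and producing the M-dimensional probability vector:

```python
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    """Relationship classification head: h = <hα|hβ> -> softmax over M relations."""
    def __init__(self, hidden: int = 1024, num_relations: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * hidden, hidden),   # h concatenates the two 1024-dim vectors
            nn.ReLU(),
            nn.Linear(hidden, num_relations),
            nn.Softmax(dim=-1),              # outputs v = (p_1, ..., p_M), summing to 1
        )

    def forward(self, m_alpha: torch.Tensor, m_beta: torch.Tensor) -> torch.Tensor:
        h = torch.cat([m_alpha, m_beta], dim=-1)
        return self.net(h)

v = RelationClassifier()(torch.randn(1, 1024), torch.randn(1, 1024))
print(v.shape, float(v.sum()))               # torch.Size([1, 32]) ~1.0
```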
4. Knowledge graph construction.
The entity recognition model described above is used to recognize all entities in the sentence library, duplicates are removed, and the entity table E (i.e., the preset entity set) is obtained. The entity table E contains N entities in total; by convention, e_i is the i-th entity in E, 1 ≤ i ≤ N.
The relationship recognition model is used to identify the relationship between every pair of adjacent entities in each sentence, and the knowledge graph is built accordingly. The knowledge graph is expressed quantitatively as an N×N×M-dimensional tensor, from which it can be read whether any relationship in the relationship table G holds between any two entities in the entity table E. N is the number of entities in E, and M is the number of relationships in G. X_ijk denotes the element at one position of the tensor, corresponding to two entities in E and one relationship in G; the correspondence rule is that X_ijk corresponds to the i-th entity e_i and the j-th entity e_j in the entity table E, and the k-th relationship r_k in the relationship table G.
Specifically, the knowledge graph construction process includes:
(1) Build the initial N×N×M-dimensional tensor X, with all values initialized to 0.
(2) Use the relationship recognition model to identify the relationship between every pair of adjacent entities in each sentence of the sentence library, obtaining many M-dimensional vectors. For each M-dimensional vector, find the dimension with the largest value and update the initial N×N×M-dimensional tensor X according to the relationship corresponding to that dimension and the entities replaced by α and β in the clause.
For example, suppose that for some clause the output M-dimensional relationship vector is v = (p_1, p_2, ..., p_M), in which the k-th dimension has the largest value p_k, corresponding to the k-th relationship r_k in the relationship table G (1 ≤ k ≤ M). Suppose further that the two entities replaced by α and β in the clause correspond to the i-th entity e_i and the j-th entity e_j in the entity table E (1 ≤ i, j ≤ N). Then, if and only if p_k is greater than a prescribed value θ (0 < θ < 1), relationship r_k is considered to hold between e_i and e_j; in this case X_ijk = 1 in the initial X, and otherwise X_ijk = 0. The prescribed value θ is the preset threshold.
X is continuously updated in this way until all M-dimensional vectors have been processed, and the final X is output.
(3) Optimize the final X obtained in step (2) based on tensor decomposition to supplement the knowledge graph with latent information.
The X obtained through step (2) is usually sparse. This step uses tensor decomposition to reason about and mine the latent relationships in X, thereby supplementing and refining the knowledge (relationships between entities) in the knowledge graph.
For the X obtained in step (2), if some point X_ijk = 1, the i-th entity e_i in the entity table E has the k-th relationship r_k in the relationship table G with the j-th entity e_j; if X_ijk = 0, it cannot be determined whether the relationship does not exist or the information recording it is missing.
Therefore, this step infers the values at positions of X that are 0 from the elements of X that are 1; that is, the likelihood of uncertain relationships is inferred from the existing definite relationships. For example, given the known relationships (Xiao Ming, manages, Company A), (Xiao Ma, manages, Company B), (Company A, operates, clothing business), (Company B, operates, WeChat), and (Xiao Ming, manages, clothing business), it can be inferred that (Xiao Ma, manages, WeChat) holds with relatively high probability, and the knowledge graph can be supplemented accordingly.
The method of supplementing X by tensor decomposition is described in detail below. It includes: decomposing the tensor X as X ≈ A O A^T (A is an N×d-dimensional matrix, O is a d×d×M-dimensional tensor, d is a tunable hyperparameter, and A^T is the transpose matrix of A), updating A and O by gradient descent, and, once the loss function has converged sufficiently, updating X according to the tensor X' = A O A^T.
The specific implementation steps include: splitting the tensor X into M N×N-dimensional matrices X_1, ..., X_M, while constructing an N×d-dimensional matrix A and M d×d-dimensional matrices O_1, ..., O_M, so as to decompose X_1, ..., X_M into A O_1 A^T, ..., A O_M A^T.
After randomly initializing the matrices A, O_1, ..., O_M, gradient descent is used to minimize the loss function
L = Σ_{i=1}^{M} ‖X_i - A O_i A^T‖²_F.
It then holds that:
∂L/∂A = Σ_{i=1}^{M} [ -2 (X_i - A O_i A^T) A O_i^T - 2 (X_i - A O_i A^T)^T A O_i ],
∂L/∂O_i = -2 A^T (X_i - A O_i A^T) A.
Accordingly, A and O_1, ..., O_M are updated in each iteration:
A ← A - η ∂L/∂A, O_i ← O_i - η ∂L/∂O_i,
where η is the learning rate and i = 1, ..., M.
After the loss function has converged sufficiently over many iterations, the optimal A and the optimal O_1, ..., O_M are obtained, and the new tensor O is obtained by concatenating O_1, ..., O_M.
The tensor X' = A O A^T is then computed, and the element-wise maximum of X' and X is taken at each position to update the tensor X, i.e., the final X''_ijk = max(X'_ijk, X_ijk). The value at each point of the updated tensor represents the likelihood that the relationship in the relationship table G corresponding to that point holds between the two entities in the entity table E corresponding to that point, i.e., the credibility of the information that the relationship exists between the two entities. For example, since X_ijk corresponds to the i-th entity e_i and the j-th entity e_j in the entity table E and the k-th relationship r_k in the relationship table G, X''_ijk = 0.8 means that the probability that relationship r_k holds between e_i and e_j is 0.8.
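A minimal numpy sketch of this optimization under the loss above (the initialization scale, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

def optimize_tensor(X: np.ndarray, d: int = 16, eta: float = 0.01, steps: int = 500) -> np.ndarray:
    """RESCAL-style factorization X_i ≈ A O_i A^T, followed by the
    element-wise max with the original 0/1 tensor X (shape N x N x M)."""
    N, _, M = X.shape
    rng = np.random.default_rng(0)
    A = rng.normal(scale=0.1, size=(N, d))
    O = rng.normal(scale=0.1, size=(M, d, d))
    for _ in range(steps):
        grad_A = np.zeros_like(A)
        for i in range(M):
            E = X[:, :, i] - A @ O[i] @ A.T          # residual for slice i
            grad_A += -2 * (E @ A @ O[i].T + E.T @ A @ O[i])
            O[i] -= eta * (-2 * A.T @ E @ A)         # gradient step on O_i
        A -= eta * grad_A                            # gradient step on A
    X_new = np.stack([A @ O[i] @ A.T for i in range(M)], axis=-1)
    return np.maximum(X, X_new)                      # keep the max at each position
```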
Finally, the knowledge graph is generated based on the final X''.
5. Information value evaluation and information embedding.
For a sentence T that is about to be input into the BERT model for a natural language processing task, certain information from the knowledge graph is inserted into T. How is the information to insert selected from the knowledge graph? This step evaluates the value of candidate information for the sentence by computing the correlation degree between entities and the relevance between the entities in the candidate information and the sentence, and selects the information of greatest value for insertion.
(1) Compute the correlation degree between any two entities in the entity table E.
Since the knowledge graph (tensor X'') constructed above represents the likelihood that any relationship in table G holds between any two entities in table E, the correlation degree between two entities is measured by the sum of the likelihoods of the strongest relationship between them in each direction, normalized by softmax to give a correlation degree score. For the i-th entity e_i and the j-th entity e_j in the entity table E, the correlation degree score of the two entities can be written as:
R(i, j) = softmax_j( max_k X''_ijk + max_k X''_jik ),
where softmax_j denotes normalization over j.
We consider that the more relationships exist between two entities, the higher their correlation. Here X''_jik corresponds to the j-th entity e_j and the i-th entity e_i in the entity table E and gives the probability value that the k-th relationship r_k in the relationship table G holds.
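A sketch of this score over the whole entity table (numpy; the softmax normalization over j follows the description above):

```python
import numpy as np

def correlation_scores(X: np.ndarray) -> np.ndarray:
    """R[i, j]: correlation degree score between entities e_i and e_j,
    softmax-normalized over j for each fixed i."""
    strongest = X.max(axis=2)                    # max_k X''_ijk for every (i, j)
    raw = strongest + strongest.T                # add the reverse-direction maximum
    exp = np.exp(raw - raw.max(axis=1, keepdims=True))   # numerically stable softmax
    return exp / exp.sum(axis=1, keepdims=True)
```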
(2) Compute the relevance score between the entities in the candidate information and the sentence.
We consider that the higher the correlation degree between all the entities in a sentence and some entity in the entity table, the higher the relevance between the sentence and that entity; the inter-entity correlation degree scores obtained above are therefore used to compute the sentence-entity relevance score.
Suppose the sentence T intersects the entity table E in p entities: e_t1, ..., e_tp. The relevance between the sentence T and an entity e_i in the entity table E is then measured by the normalized result of the sum of the correlation degrees between the entities e_t1, ..., e_tp and the entity e_i; that is, the relevance score between T and e_i (normalized here with softmax, as with the correlation score) is:
Y(i, T) = softmax_i( Σ_{u=1}^{p} R(t_u, i) ),
where N is the number of entities contained in the entity table E and p is the number of entities contained in both the entity table E and the sentence T.
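Continuing the sketch, the sentence-entity relevance for all N entities, built on the correlation matrix from the previous block:

```python
import numpy as np

def relevance_scores(R: np.ndarray, sentence_entities: list[int]) -> np.ndarray:
    """Y[i]: relevance of entity e_i to sentence T, from the correlation
    matrix R and the indices of the p entities that T shares with table E."""
    raw = R[sentence_entities, :].sum(axis=0)    # sum of correlation degrees over e_t1..e_tp
    exp = np.exp(raw - raw.max())                # numerically stable softmax over all N entities
    return exp / exp.sum()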
(3) The relevance score function Y and the knowledge graph (tensor X'') are now used to select the information (relationships between entities) to insert into the sentence T. A piece of information represents the knowledge that a certain relationship holds between two entities; for example, "relationship r_k holds between entity e_i and entity e_j" is one piece of information. The value of a piece of information for a given sentence is positively correlated with the likelihood that the information holds and with the relevance between the information and the sentence.
The sentence T is traversed character by character to find all p entities e_t1, ..., e_tp that occur both in T and in the entity table E, corresponding to the t_1-th, ..., t_p-th entities in E. Then, according to the relevance score function Y, the relevance scores Y(1, T), ..., Y(N, T) between the sentence T and every entity in the entity table are computed.
Then, for each of the entities e_t1, ..., e_tp in the sentence T, supplementary information is selected as follows: for an entity e_q (q ∈ {t1, ..., tp}), obtain from the tensor X'' the probabilities X''_qi1, ..., X''_qiM of each of the M relationships in table G between e_q and any other entity e_i (i ≠ q) in the entity table E, and take the maximum of these M relationship probabilities to obtain the maximum relationship probability between e_q and e_i.
Performing the same operation for all N-1 entities in the entity table E other than e_q yields the N-1 maximum relationship probabilities of e_q.
Suppose these N-1 relationship probabilities are: relationship r_k1 in table G holds between e_q and e_j1, relationship r_k2 holds between e_q and e_j2, ..., and relationship r_kN-1 holds between e_q and e_jN-1, where e_j1, ..., e_jN-1 are the entities in E other than e_q. The N-1 maximum relationship probabilities are then:
max_k X''_{q j1 k}, max_k X''_{q j2 k}, ..., max_k X''_{q jN-1 k}.
Next, each of the N-1 maximum relationship probabilities is multiplied by the relevance score between T and the corresponding entity, giving N-1 information evaluation values; the maximum of these N-1 values determines the entity and information (inter-entity relationship) to select, and the information is inserted after e_q. Specifically, the selected entity and information are converted into characters through the entity table E, the relationship table G, and the character table W, and inserted after the position of the entity e_q in the sentence T.
The N-1 information evaluation values are specifically:
V_m = Y(j_m, T) · max_k X''_{q j_m k}, m = 1, ..., N-1,
where V_1 to V_N-1 are the values of the corresponding entities and information for the sentence T.
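A sketch of this selection for one sentence entity e_q, combining the pieces above (the text-conversion step via tables E, G, W is abstracted into an assumed helper):

```python
import numpy as np

def select_information(X: np.ndarray, Y: np.ndarray, q: int) -> tuple[int, int, float]:
    """For sentence entity e_q, pick the entity/relationship pair of greatest
    value: V = Y(j, T) * max_k X''_qjk. Returns (entity j, relationship k, value)."""
    max_prob = X[q].max(axis=1)                  # max_k X''_qjk for every j
    best_rel = X[q].argmax(axis=1)               # which relationship attains it
    V = Y * max_prob                             # information evaluation values
    V[q] = -np.inf                               # skip e_q itself
    j = int(np.argmax(V))
    return j, int(best_rel[j]), float(V[j])

# The chosen (j, k) would then be rendered as text via tables E, G, W
# (an assumed render_info(q, k, j) helper) and spliced in after e_q.
```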
逐个遍历T中实体e t1、……e tp,按照上述步骤为每个实体插入信息,最终得到信息扩充后的句子。BERT模型处理信息扩充后的句子可以获得更好的处理效果。
When selecting the information (relationships between entities) to insert from the knowledge graph, this embodiment fully considers the value of the inserted information for the sentence (embodied as the relevance of the information to the sentence and the credibility of the information), which ensures the quality of the inserted information and reduces the introduction of noise.
It can be seen that this embodiment can identify the entities in a sentence based on the entity recognition model, identify the relationship between two adjacent entities in the sentence using the relation recognition model, and accordingly construct and optimize a knowledge graph to obtain the tensor X'' representing the relationship between any two entities. Meanwhile, by computing the degree of correlation between entities and the relevance between the entities in the candidate information and the sentence, the value of the candidate information for the sentence is evaluated, which makes the inserted information more accurate and effective and reduces the introduction of noise.
A natural language processing apparatus provided by an embodiment of the present application is introduced below. The natural language processing apparatus described below and the natural language processing method described above may be cross-referenced.
Referring to FIG. 4, an embodiment of the present application discloses a natural language processing apparatus, including:
an acquisition module 401, configured to acquire a target sentence to be processed and determine each first entity in the target sentence;
an expansion module 402, configured to, for each first entity, if the first entity exists in a preset entity set, determine a second entity having the greatest association with the first entity in the preset entity set, generate expansion information based on the determined second entity, and add the expansion information after the position of the first entity in the target sentence to obtain an updated target sentence, where the second entity is any entity in the preset entity set other than the first entity; and
a processing module 403, configured to input the updated target sentence into a BERT model, so that the BERT model performs a natural language processing task.
In some specific implementations, the expansion module 402 is specifically configured to: for each first entity, in response to the first entity existing in the preset entity set, determine a second entity having the greatest association with the first entity in the preset entity set, generate expansion information based on the determined second entity, and add the expansion information after the position of the first entity in the target sentence to obtain the updated target sentence.
In a specific implementation, the expansion module includes:
a first determination unit, configured to take the first entity as a target object and determine the maximum relationship probability value between the target object and each second entity to obtain N-1 maximum relationship probability values, where N-1 is the number of second entities and N is the total number of entities in the preset entity set;
a second determination unit, configured to determine the relevance between each second entity and the target sentence to obtain N-1 relevance values;
a computing unit, configured to, for each second entity, compute the product of the relevance corresponding to the second entity and the maximum relationship probability value corresponding to the second entity to obtain the association score corresponding to the second entity, thereby obtaining N-1 association scores; and
a selection unit, configured to take the second entity corresponding to the maximum association score among the N-1 association scores as the second entity having the greatest association with the target object.
In a specific implementation, the first determination unit includes:
a generation subunit, configured to generate an N×N×M-dimensional tensor representing the relationships and relationship probability values between the entities in the preset entity set, where M is the dimension of the relation vector between different entities in the preset entity set; and
a query subunit, configured to generate a knowledge graph based on the N×N×M-dimensional tensor, and query the knowledge graph for the maximum relationship probability value between the target object and each second entity.
In a specific implementation, the generation subunit is specifically configured to:
generate an initial tensor consisting of N×N×M all-zero values;
acquire the sentence corpus used to generate the preset entity set, traverse each sentence in the corpus, and take the traversed sentence as the sentence to be recognized;
take every two adjacent entities in the sentence to be recognized as an entity group to obtain multiple entity groups;
identify the relationship between the two entities in each entity group using a relation recognition model to obtain multiple M-dimensional relation vectors;
for each M-dimensional relation vector, if the maximum value in any M-dimensional relation vector is greater than a preset threshold, update the element at the position in the initial tensor corresponding to the maximum value from 0 to 1, so as to update the initial tensor; and
traverse the next sentence in the corpus and continue to update the current tensor until every sentence in the corpus has been traversed, then output and optimize the currently obtained tensor to obtain the N×N×M-dimensional tensor.
In some specific implementations, for each M-dimensional relation vector, in response to the maximum value in any M-dimensional relation vector being greater than the preset threshold, the generation subunit updates the element at the position in the initial tensor corresponding to the maximum value from 0 to 1, so as to update the initial tensor.
In a specific implementation, the generation subunit is specifically configured to:
for the two entities in any entity group, replace the two entities in the sentence to be recognized with different identifiers, and input the resulting new sentence into the relation recognition model, so that the relation recognition model outputs the M-dimensional relation vector corresponding to the two entities.
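A minimal sketch of this replacement step, assuming the identifiers are the α and β used earlier and relation_model is the trained relation recognition model:

def relation_vector(sentence, ent_a, ent_b, relation_model):
    # Replace the two entities of the group with distinct identifiers
    clause = sentence.replace(ent_a, "α", 1).replace(ent_b, "β", 1)
    return relation_model(clause)      # M-dimensional relation vector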
In a specific implementation, the generation subunit is specifically configured to:
take the currently obtained tensor as an initial three-dimensional matrix, and decompose the initial three-dimensional matrix into M N×N-dimensional matrices X_i, i = 1, 2, …, M;
decompose the d×d×M-dimensional tensor O obtained by initialization into M d×d-dimensional matrices O_i, where d is a tunable hyperparameter;
initialize an N×d-dimensional matrix A, and solve for the optimal A' and the M optimal O_i' based on X_i = AO_iA^T and gradient descent;
obtain a new three-dimensional matrix based on the optimal A' and the M optimal O_i'; and
compare the initial three-dimensional matrix and the new three-dimensional matrix position by position based on the max function, and keep the maximum value at each position to obtain the N×N×M-dimensional tensor.
In a specific implementation, the second determination unit is specifically configured to:
for each second entity, determine the normalized result of the sum of the degrees of correlation between each first entity in the target sentence and the second entity as the relevance between the second entity and the target sentence.
In a specific implementation, the degree of correlation between any first entity and any second entity is: the normalized result of the sum of the maximum relationship probability value between the first entity and the second entity and the maximum relationship probability value between the second entity and the first entity.
In a specific implementation, the acquisition module is specifically configured to:
convert each character in the target sentence into a 1024-dimensional vector to obtain a vector set; and
input the vector set into an entity recognition model, so that the entity recognition model identifies each first entity in the target sentence.
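For illustration, the acquisition module's two steps might be sketched as follows; char_embedding and entity_model are placeholders for the character embedding table and the trained entity recognition model, not components defined by the application.

import numpy as np

def recognize_entities(sentence, char_embedding, entity_model):
    # Map each character to its 1024-dimensional vector, then feed the
    # vector set to the entity recognition model
    vectors = np.stack([char_embedding[c] for c in sentence])
    return entity_model(vectors)       # the first entities found in the sentence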
For more specific working processes of the modules and units in this embodiment, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
It can be seen that this embodiment provides a natural language processing apparatus that can expand effective information for the target sentence serving as the input data of the BERT model, enabling the BERT model to improve the processing accuracy of natural language processing tasks and improving the processing efficiency and processing effect of the BERT model.
An electronic device provided by an embodiment of the present application is introduced below. The electronic device described below and the natural language processing method and apparatus described above may be cross-referenced.
Referring to FIG. 5, an embodiment of the present application discloses an electronic device, including:
a memory 501, configured to store computer-readable instructions; and
a processor 502, configured to execute the computer-readable instructions to implement the natural language processing method disclosed in any of the foregoing embodiments.
An embodiment of the present application provides one or more non-volatile computer-readable storage media storing computer-readable instructions. The one or more non-volatile computer-readable storage media storing computer-readable instructions described below and the natural language processing method, apparatus and device described above may be cross-referenced.
One or more non-volatile computer-readable storage media store computer-readable instructions, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the natural language processing method in any of the foregoing embodiments. For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
The terms "first", "second", "third", "fourth" and the like (if present) in the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. Furthermore, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method or device including a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such process, method or device.
It should be noted that descriptions involving "first", "second" and the like in the present application are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that such combinations can be implemented by a person of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be implemented, such a combination should be considered non-existent and outside the scope of protection claimed by the present application.
The embodiments in this specification are described in a progressive manner, each embodiment focusing on its differences from the other embodiments; for the same or similar parts between the embodiments, reference may be made to one another.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of readable storage medium known in the art.
Specific examples have been used herein to explain the principles and implementations of the present application. The descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea; meanwhile, a person of ordinary skill in the art may make changes to the specific implementations and application scope in accordance with the idea of the present application. In summary, the content of this specification should not be understood as limiting the present application.

Claims (13)

  1. A natural language processing method, characterized by comprising:
    acquiring a target sentence to be processed, and determining each first entity in the target sentence;
    for each first entity, in response to the first entity existing in a preset entity set, determining a second entity having the greatest association with the first entity in the preset entity set, generating expansion information based on the determined second entity, and adding the expansion information after the position of the first entity in the target sentence to obtain an updated target sentence, wherein the second entity is any entity in the preset entity set other than the first entity; and
    inputting the updated target sentence into a BERT model, so that the BERT model performs a natural language processing task.
  2. The method according to claim 1, characterized in that the determining a second entity having the greatest association with the first entity in the preset entity set comprises:
    taking the first entity as a target object, and determining a maximum relationship probability value between the target object and each second entity to obtain N-1 maximum relationship probability values, wherein N-1 is the number of second entities and N is the total number of entities in the preset entity set;
    determining a relevance between each second entity and the target sentence to obtain N-1 relevance values;
    for each second entity, computing a product of the relevance corresponding to the second entity and the maximum relationship probability value corresponding to the second entity to obtain an association score corresponding to the second entity, thereby obtaining N-1 association scores; and
    taking the second entity corresponding to a maximum association score among the N-1 association scores as the second entity having the greatest association with the target object.
  3. The method according to claim 2, characterized in that the determining a maximum relationship probability value between the target object and each second entity comprises:
    generating an N×N×M-dimensional tensor representing relationships and relationship probability values between the entities in the preset entity set, wherein M is the dimension of the relation vector between different entities in the preset entity set; and
    generating a knowledge graph based on the N×N×M-dimensional tensor, and querying the knowledge graph for the maximum relationship probability value between the target object and each second entity.
  4. The method according to claim 3, characterized in that the generating an N×N×M-dimensional tensor representing relationships and relationship probability values between the entities in the preset entity set comprises:
    generating an initial tensor consisting of N×N×M all-zero values;
    acquiring a sentence corpus used to generate the preset entity set, traversing each sentence in the corpus, and taking the traversed sentence as a sentence to be recognized;
    taking every two adjacent entities in the sentence to be recognized as an entity group to obtain a plurality of entity groups;
    identifying a relationship between the two entities in each entity group using a relation recognition model to obtain a plurality of M-dimensional relation vectors;
    for each M-dimensional relation vector, in response to a maximum value in any M-dimensional relation vector being greater than a preset threshold, updating an element at a position in the initial tensor corresponding to the maximum value from 0 to 1, so as to update the initial tensor; and
    traversing a next sentence in the corpus and continuing to update the current tensor until every sentence in the corpus has been traversed, then outputting and optimizing the currently obtained tensor to obtain the N×N×M-dimensional tensor.
  5. The method according to claim 4, characterized in that the identifying a relationship between the two entities in each entity group using a relation recognition model to obtain a plurality of M-dimensional relation vectors comprises:
    for the two entities in any entity group, replacing the two entities in the sentence to be recognized with different identifiers, and inputting the resulting new sentence into the relation recognition model, so that the relation recognition model outputs the M-dimensional relation vector corresponding to the two entities.
  6. The method according to claim 4 or 5, characterized in that the optimizing the currently obtained tensor to obtain the N×N×M-dimensional tensor comprises:
    taking the currently obtained tensor as an initial three-dimensional matrix, and decomposing the initial three-dimensional matrix into M N×N-dimensional matrices X_i, i = 1, 2, …, M;
    decomposing a d×d×M-dimensional tensor O obtained by initialization into M d×d-dimensional matrices O_i, wherein d is a tunable hyperparameter;
    initializing an N×d-dimensional matrix A, and solving for an optimal A' and M optimal O_i' based on X_i = AO_iA^T and gradient descent;
    obtaining a new three-dimensional matrix based on the optimal A' and the M optimal O_i'; and
    comparing the initial three-dimensional matrix and the new three-dimensional matrix position by position based on the max function, and keeping the maximum value at each position to obtain the N×N×M-dimensional tensor.
  7. The method according to claim 5, characterized in that the relation recognition model comprises a sub-model of a transformer structure and a relation classification neural network;
    the inputting the resulting new sentence into the relation recognition model, so that the relation recognition model outputs the M-dimensional relation vector corresponding to the two entities, comprises:
    inputting the new sentence into the sub-model of the transformer structure to obtain a feature vector carrying the identifiers of the two entities; and
    inputting the feature vector carrying the identifiers of the two entities into the relation classification neural network to obtain the M-dimensional relation vector corresponding to the two entities.
  8. The method according to any one of claims 2 to 7, characterized in that the determining a relevance between each second entity and the target sentence comprises:
    for each second entity, determining a normalized result of a sum of degrees of correlation between each first entity in the target sentence and the second entity as the relevance between the second entity and the target sentence.
  9. The method according to claim 8, characterized in that the degree of correlation between any first entity and any second entity is: the maximum relationship probability value between the first entity and the second entity plus the maximum relationship probability value between the second entity and the first entity.
  10. The method according to any one of claims 1 to 9, characterized in that the determining each first entity in the target sentence comprises:
    converting each character in the target sentence into a 1024-dimensional vector to obtain a vector set; and
    inputting the vector set into an entity recognition model, so that the entity recognition model identifies each first entity in the target sentence.
  11. A natural language processing apparatus, characterized by comprising:
    an acquisition module, configured to acquire a target sentence to be processed and determine each first entity in the target sentence;
    an expansion module, configured to, for each first entity, in response to the first entity existing in a preset entity set, determine a second entity having the greatest association with the first entity in the preset entity set, generate expansion information based on the determined second entity, and add the expansion information after the position of the first entity in the target sentence to obtain an updated target sentence, wherein the second entity is any entity in the preset entity set other than the first entity; and
    a processing module, configured to input the updated target sentence into a BERT model, so that the BERT model performs a natural language processing task.
  12. An electronic device, characterized by comprising:
    a memory, configured to store computer-readable instructions; and
    a processor, configured to execute the computer-readable instructions to implement the method according to any one of claims 1 to 10.
  13. One or more non-volatile computer-readable storage media storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 10.
PCT/CN2022/102856 2022-01-05 2022-06-30 Natural language processing method and apparatus, device and readable storage medium WO2023130687A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210003234.6A CN114021573B (zh) 2022-01-05 2022-01-05 Natural language processing method and apparatus, device and readable storage medium
CN202210003234.6 2022-01-05

Publications (1)

Publication Number Publication Date
WO2023130687A1 true WO2023130687A1 (zh) 2023-07-13

Family

ID=80069638

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/102856 WO2023130687A1 (zh) 2022-01-05 2022-06-30 Natural language processing method and apparatus, device and readable storage medium

Country Status (2)

Country Link
CN (1) CN114021573B (zh)
WO (1) WO2023130687A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117217318A (zh) * 2023-11-07 2023-12-12 Hanbo Semiconductor (Shanghai) Co., Ltd. Text generation method and apparatus based on Transformer network model

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114021573B (zh) * 2022-01-05 2022-04-22 Suzhou Inspur Intelligent Technology Co., Ltd. Natural language processing method and apparatus, device and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190384814A1 (en) * 2018-06-13 2019-12-19 Royal Bank Of Canada System and method for processing natural language statements
CN111581229A (zh) * 2020-03-25 2020-08-25 Ping An Technology (Shenzhen) Co., Ltd. SQL statement generation method and apparatus, computer device and storage medium
CN111813802A (zh) * 2020-09-11 2020-10-23 Hangzhou Liangzhi Intelligent Technology Co., Ltd. Method for generating structured query statements based on natural language
WO2021169745A1 (zh) * 2020-02-25 2021-09-02 Shengzhi Information Technology (Nanjing) Co., Ltd. User intention recognition method and apparatus based on sentence context relationship prediction
CN113779211A (zh) * 2021-08-06 2021-12-10 Huazhong University of Science and Technology Intelligent question answering reasoning method and system based on natural language entity relationships
CN114021573A (zh) * 2022-01-05 2022-02-08 Suzhou Inspur Intelligent Technology Co., Ltd. Natural language processing method and apparatus, device and readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779185B (zh) * 2020-06-10 2023-12-29 Wuhan TCL Group Industrial Research Institute Co., Ltd. Natural language model generation method and computer device
CN113158665B (zh) * 2021-04-02 2022-12-09 Xi'an Jiaotong University Method for improving dialogue text generation based on text summary generation and bidirectional corpora

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190384814A1 (en) * 2018-06-13 2019-12-19 Royal Bank Of Canada System and method for processing natural language statements
WO2021169745A1 (zh) * 2020-02-25 2021-09-02 Shengzhi Information Technology (Nanjing) Co., Ltd. User intention recognition method and apparatus based on sentence context relationship prediction
CN111581229A (zh) * 2020-03-25 2020-08-25 Ping An Technology (Shenzhen) Co., Ltd. SQL statement generation method and apparatus, computer device and storage medium
CN111813802A (zh) * 2020-09-11 2020-10-23 Hangzhou Liangzhi Intelligent Technology Co., Ltd. Method for generating structured query statements based on natural language
CN113779211A (zh) * 2021-08-06 2021-12-10 Huazhong University of Science and Technology Intelligent question answering reasoning method and system based on natural language entity relationships
CN114021573A (zh) * 2022-01-05 2022-02-08 Suzhou Inspur Intelligent Technology Co., Ltd. Natural language processing method and apparatus, device and readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117217318A (zh) * 2023-11-07 2023-12-12 Hanbo Semiconductor (Shanghai) Co., Ltd. Text generation method and apparatus based on Transformer network model
CN117217318B (zh) * 2023-11-07 2024-01-26 Hanbo Semiconductor (Shanghai) Co., Ltd. Text generation method and apparatus based on Transformer network model

Also Published As

Publication number Publication date
CN114021573B (zh) 2022-04-22
CN114021573A (zh) 2022-02-08

Similar Documents

Publication Publication Date Title
CN111310438B (zh) Chinese sentence semantic intelligent matching method and apparatus based on multi-granularity fusion model
CN109918666B (zh) Neural network-based Chinese punctuation mark addition method
WO2021135910A1 (zh) Information extraction method based on machine reading comprehension, and related device
WO2023130687A1 (zh) Natural language processing method and apparatus, device and readable storage medium
CN112507065B (zh) Code search method based on annotation semantic information
TWI662425B (zh) Method for automatically generating semantically similar sentence samples
CN104199965B (zh) Semantic information retrieval method
CN111241294A (zh) Relation extraction method based on a graph convolutional network with dependency parsing and keywords
JP6187877B2 (ja) Synonym extraction system, method and recording medium
US8239349B2 (en) Extracting data
CN113255320A (zh) Entity relation extraction method and apparatus based on syntax tree and graph attention mechanism
CN111274267A (zh) Database query method and apparatus, and computer-readable storage medium
WO2014002774A1 (ja) Synonym extraction system, method and recording medium
CN112632250A (zh) Question answering method and system for multi-document scenarios
CN115759092A (zh) ALBERT-based named entity recognition method for cyber threat intelligence
WO2023130688A1 (zh) Natural language processing method and apparatus, device and readable storage medium
CN111753066A (zh) Technical disclosure text expansion method, apparatus and device
Dai et al. An N-ary tree-based model for similarity evaluation on mathematical formulae
CN108595413B (zh) Answer extraction method based on semantic dependency tree
CN111581960B (zh) Method for obtaining semantic similarity of medical texts
CN112417170A (zh) Relation linking method for incomplete knowledge graphs
CN116644148A (zh) Keyword recognition method and apparatus, electronic device and storage medium
JP4499003B2 (ja) Information processing method, apparatus and program
CN112528003B (zh) Multiple-choice question answering method based on semantic ranking and knowledge correction
CN114661912A (zh) Knowledge graph construction method, apparatus and device based on unsupervised syntactic parsing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22918143

Country of ref document: EP

Kind code of ref document: A1