WO2021135477A1 - Text attribute extraction method and apparatus based on a probabilistic graph model, computer device, and storage medium - Google Patents

Text attribute extraction method and apparatus based on a probabilistic graph model, computer device, and storage medium

Info

Publication number
WO2021135477A1
WO2021135477A1 (PCT/CN2020/119137, CN2020119137W)
Authority
WO
WIPO (PCT)
Prior art keywords
entity
text
output
vector
attributes
Prior art date
Application number
PCT/CN2020/119137
Other languages
English (en)
French (fr)
Inventor
程华东
李剑锋
汪伟
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021135477A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 Learning methods

Definitions

  • This application relates to the technical field of artificial intelligence and intelligent decision-making, and in particular to a method, apparatus, computer device, and storage medium for text attribute extraction based on a probabilistic graph model.
  • Attribute extraction for text is different from relation extraction: its difficulty lies in identifying not only the attribute name of an entity but also the attribute value of that entity.
  • The main attribute extraction methods include rule-based, statistical-model-based, and pattern-based attribute extraction.
  • Rule-based attribute extraction usually targets semi-structured data such as web pages and tables, and is not effective on unstructured data.
  • The inventor realizes that attribute extraction methods based on statistical models are often implemented as relation extraction: the attribute value is regarded as another entity, and the attribute is regarded as the relation between the two entities. This supervised attribute extraction method requires a large amount of corpus, cannot solve the problem of attribute sharing, and cannot distinguish one entity name into multiple entities according to different attributes.
  • Pattern-based attribute extraction mostly uses pattern discovery based on dependency analysis. During pattern discovery, the rich information around the entities in a pattern is lost; in addition, the extracted patterns are judged for compliance by a scoring mechanism, which very easily causes attributes to be omitted or extracted incorrectly.
  • The embodiments of the present application provide a method, apparatus, computer device, and storage medium for text attribute extraction based on a probabilistic graph model, aiming to solve the problems in the prior art that rule-based, statistical-model-based, and pattern-based attribute extraction restrict the structure of the data to be extracted and that the accuracy of attribute extraction is not high.
  • An embodiment of the present application provides a method for extracting text attributes based on a probabilistic graph model, which includes:
  • receiving the text to be processed uploaded by the user terminal;
  • calling a pre-trained BERT neural network model and inputting the text to be processed into the BERT neural network model for computation to obtain a text representation output corresponding to the text to be processed, where the text representation output includes vector representations corresponding to multiple characters;
  • calling a pre-trained multi-task learning classification model and inputting the text representation output into the multi-task learning classification model for recognition to obtain the entity type corresponding to the text representation output;
  • sequentially performing, on the entity type corresponding to the text representation output, recursion, vector splicing, feature fusion, and necessary-attribute extraction through a pre-stored entity embedding matrix and a pre-trained dynamic graph convolutional neural network, to obtain the necessary attributes in the entity and the start and end positions of the necessary attributes; and
  • sequentially performing, on the necessary attributes in the entity and their start and end positions, entity representation vector extraction, vector splicing, feature fusion, and non-essential-attribute extraction through a pre-trained Bi-LSTM model, to obtain the non-essential attributes in the entity and the start and end positions of the non-essential attributes.
  • An embodiment of the present application further provides a text attribute extraction device based on a probabilistic graph model, which includes:
  • a text receiving unit, used to receive the text to be processed uploaded by the user terminal;
  • a text representation output acquisition unit, used to call a pre-trained BERT neural network model and input the text to be processed into the BERT neural network model for computation to obtain a text representation output corresponding to the text to be processed, where the text representation output includes vector representations corresponding to multiple characters;
  • an entity type recognition unit, configured to call a pre-trained multi-task learning classification model and input the text representation output into the multi-task learning classification model for recognition to obtain the entity type corresponding to the text representation output;
  • a necessary attribute extraction unit, used to sequentially perform, on the entity type corresponding to the text representation output, recursion, vector splicing, feature fusion, and necessary-attribute extraction through the called pre-stored entity embedding matrix and the pre-trained dynamic graph convolutional neural network, to obtain the necessary attributes in the entity and the start and end positions of the necessary attributes; and
  • a non-essential attribute extraction unit, used to sequentially perform, on the necessary attributes in the entity and their start and end positions, entity representation vector extraction, vector splicing, feature fusion, and non-essential-attribute extraction through a called pre-trained Bi-LSTM model, to obtain the non-essential attributes in the entity and the start and end positions of the non-essential attributes.
  • An embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the computer program, the processor implements the following steps:
  • receiving the text to be processed uploaded by the user terminal;
  • calling a pre-trained BERT neural network model and inputting the text to be processed into the BERT neural network model for computation to obtain a text representation output corresponding to the text to be processed, where the text representation output includes vector representations corresponding to multiple characters;
  • calling a pre-trained multi-task learning classification model and inputting the text representation output into the multi-task learning classification model for recognition to obtain the entity type corresponding to the text representation output;
  • sequentially performing, on the entity type corresponding to the text representation output, recursion, vector splicing, feature fusion, and necessary-attribute extraction through a pre-stored entity embedding matrix and a pre-trained dynamic graph convolutional neural network, to obtain the necessary attributes in the entity and the start and end positions of the necessary attributes; and
  • sequentially performing, on the necessary attributes in the entity and their start and end positions, entity representation vector extraction, vector splicing, feature fusion, and non-essential-attribute extraction through a pre-trained Bi-LSTM model, to obtain the non-essential attributes in the entity and the start and end positions of the non-essential attributes.
  • The embodiments of the present application also provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the following operations:
  • receiving the text to be processed uploaded by the user terminal;
  • calling a pre-trained BERT neural network model and inputting the text to be processed into the BERT neural network model for computation to obtain a text representation output corresponding to the text to be processed, where the text representation output includes vector representations corresponding to multiple characters;
  • calling a pre-trained multi-task learning classification model and inputting the text representation output into the multi-task learning classification model for recognition to obtain the entity type corresponding to the text representation output;
  • sequentially performing, on the entity type corresponding to the text representation output, recursion, vector splicing, feature fusion, and necessary-attribute extraction through a pre-stored entity embedding matrix and a pre-trained dynamic graph convolutional neural network, to obtain the necessary attributes in the entity and the start and end positions of the necessary attributes; and
  • sequentially performing, on the necessary attributes in the entity and their start and end positions, entity representation vector extraction, vector splicing, feature fusion, and non-essential-attribute extraction through a pre-trained Bi-LSTM model, to obtain the non-essential attributes in the entity and the start and end positions of the non-essential attributes.
  • The embodiments of the application provide a method, device, computer equipment, and storage medium for extracting text attributes based on a probabilistic graph model: the received text to be processed is input into the BERT neural network model to obtain the corresponding text representation output; the text representation output is input into the multi-task learning classification model to obtain the corresponding entity type; recursion, vector splicing, feature fusion, and necessary-attribute extraction are performed in sequence on the entity type to obtain the necessary attributes in the entity and their start and end positions; and entity representation vector extraction, vector splicing, feature fusion, and non-essential-attribute extraction are performed in sequence on the necessary attributes and their start and end positions to obtain the non-essential attributes in the entity and their start and end positions.
  • In this way, the accuracy of data attribute extraction is improved; the data format of the text to be processed is not restricted, and any structured or unstructured data can be input.
  • FIG. 1 is a schematic diagram of an application scenario of a text attribute extraction method based on a probabilistic graph model provided by an embodiment of the application;
  • FIG. 2 is a schematic flowchart of a text attribute extraction method based on a probability graph model provided by an embodiment of the application;
  • FIG. 3 is a schematic block diagram of a text attribute extraction device based on a probability graph model provided by an embodiment of the application;
  • Fig. 4 is a schematic block diagram of a computer device provided by an embodiment of the application.
  • Please refer to Figure 1 and Figure 2. Figure 1 is a schematic diagram of an application scenario of the text attribute extraction method based on a probabilistic graph model provided by an embodiment of the application, and Figure 2 is a schematic flowchart of the method.
  • The method for extracting text attributes based on the probabilistic graph model is applied in a server, and the method is executed by application software installed in the server.
  • the method includes steps S110 to S150.
  • S110: Receive the text to be processed uploaded by the user terminal.
  • Specifically, when there is a text in the user terminal that requires attribute extraction, the user can operate the user terminal (a smart terminal such as a smartphone or tablet computer) to upload the text to be processed to the server, and attribute extraction is then performed on the text by the server.
  • For example, the text to be processed is: "Double breast glands are slightly thickened, light spots are slightly dense, glandular echo distribution is uneven, and the structure is slightly disordered. There are several hypoechoic nodules in the right breast; the larger ones measure about 19mm×14mm×30mm (inner upper) and 20mm×9mm (outer lower), with unclear boundaries and irregular shapes. There are several hypoechoic nodules in the left breast; the larger one measures about 8mm×4mm (outer upper), with a clear boundary. CDFI: no obvious abnormal blood flow signal."
  • The full name of BERT is Bidirectional Encoder Representations from Transformers; it is a bidirectional language model based on the Transformer architecture. The BERT neural network model can extract the character-vector representation of the text more accurately.
  • step S120 includes:
  • Each character in the character set is input into the BERT neural network model for computation to obtain the vector representation corresponding to each character; the vector representations of all characters are then combined to obtain the text representation output corresponding to the text to be processed.
  • The text representation output of the BERT neural network model is essentially a combination of the vector representations of each character in the text.
  • Specifically, the text to be processed can be split character by character to obtain a character set composed of multiple characters.
  • Each character in the character set is input into the BERT neural network model for computation to obtain the vector representation corresponding to each character; for example, char-i denotes the vector representation of the i-th character.
  • the text representation output is a two-dimensional matrix [char-1, char-2, char-3,..., char-n].
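The shape of this character-level representation can be sketched as follows; the `embed_chars` function below is a toy stand-in for the real BERT encoder (an assumption for illustration only, not the model described in the application):

```python
import numpy as np

def embed_chars(text, dim=8, seed=0):
    """Toy stand-in for the BERT encoder: one vector per character.

    The real model returns contextual vectors; here deterministic
    random vectors are drawn only to show the output shape.
    """
    rng = np.random.default_rng(seed)
    # Two-dimensional matrix [char-1, char-2, ..., char-n]: one row per character.
    return np.stack([rng.standard_normal(dim) for _ in text])

text = "双乳腺体层增厚"   # 7 characters taken from the example report
reps = embed_chars(text)
print(reps.shape)        # (7, 8): n characters by embedding dimension
```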
  • The multi-task learning classification model is a multi-classification model used in multi-task learning to determine which entity types are contained in the text to be processed. For example, when the two-dimensional matrix [char-1, char-2, char-3, ..., char-n] corresponding to the text representation output is used as the input of the multi-task learning classification model, an output vector [1 1] is obtained; from this output vector, the entity types corresponding to the text representation output can be determined statistically.
  • step S130 includes:
  • The text representation output is input into the multi-task learning classification model for recognition to obtain the entity recognition output vector corresponding to the text representation output; the number of entities is counted from the vector values of the entity recognition output vector, and the included entity types are obtained from that count.
  • For example, an output vector [1 1] is obtained, so the text representation output corresponds to two entity types: "double breasts" and "double breast nodules". The first "1" in the output vector [1 1] indicates that the text contains the "double breasts" entity, and the second "1" indicates that it contains the "double breast nodules" entity.
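A minimal sketch of decoding such an output vector into entity types; the `ENTITY_TYPES` inventory and the 0.5 threshold are hypothetical choices for this example, not details stated in the application:

```python
# Hypothetical entity-type inventory for the running example.
ENTITY_TYPES = ["double breasts", "double breast nodules"]

def decode_entity_types(output_vector, threshold=0.5):
    """Map a multi-task classification output such as [1, 1] to the
    entity types it flags (one slot per known entity type)."""
    return [t for t, v in zip(ENTITY_TYPES, output_vector) if v >= threshold]

types = decode_entity_types([1, 1])
print(types)       # ['double breasts', 'double breast nodules']
print(len(types))  # 2: the number of entities counted from the vector
```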
  • S140: The entity type corresponding to the text representation output is processed, through the pre-stored entity embedding matrix and the pre-trained dynamic graph convolutional neural network, by recursion, vector splicing, feature fusion, and necessary-attribute extraction in sequence, to obtain the necessary attributes in the entity and the start and end positions of the necessary attributes.
  • Specifically, one entity type is selected from the identified entity types, the embedding representation of the entity is obtained through the entity embedding matrix (Entity Embedding matrix), and this embedding representation is spliced onto the text representation output of the BERT neural network model; after passing through a Transformer, the result is used as the input of the dynamic graph convolutional neural network.
  • The dynamic graph convolutional neural network passes the Transformer output through four dilated convolution layers and then connects a double-pointer sequence labeling head to label the entity information. The dynamic graph convolutional neural network thus determines the unique entity mainly by learning the entity's attribute labels from the input entity type information.
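The dilated (expanded) convolution layers enlarge the receptive field quickly, which is what lets the CNN stand in for a recurrent pass over the text. A sketch of the receptive-field arithmetic, assuming kernel size 3 and dilation rates 1, 2, 4, 8 (common DGCNN defaults; the application does not state the exact rates):

```python
def receptive_field(kernel=3, dilations=(1, 2, 4, 8)):
    """Receptive field of stacked 1-D dilated convolutions:
    each layer adds (kernel - 1) * dilation positions."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

print(receptive_field())  # 31: characters visible to the top layer
```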
  • step S140 includes:
  • the pre-trained dynamic graph convolutional neural network is called, and the fusion representation output is input to the dynamic graph convolutional neural network for calculation to obtain the necessary attributes in the entity and the starting and ending positions of the necessary attributes.
  • Specifically, the entity type representation output corresponding to the entity type is first obtained, that is, one row of the entity embedding matrix shown in the figure. Because only two entity types are to be processed, the matrix has two rows: the first row is the representation output of "double breasts" and the second row is the representation output of "double breast nodules". If "double breast nodules" is selected, the second row of the matrix is taken and denoted entity_type_vector.
  • The entity_type_vector is spliced onto each character in the two-dimensional matrix [char-1, char-2, char-3, ..., char-n] corresponding to the text representation output, so that the vector representation of the i-th character changes from char-i to [char-i, entity_type_vector].
  • The splicing of entity_type_vector with [char-1, char-2, char-3, ..., char-n] is concatenation rather than accumulation; the purpose is to fuse each character's representation with the entity type information to be processed, so that the learning task of the following layer becomes clear.
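The concatenation step can be sketched as follows; the tiny dimensions are illustrative only:

```python
import numpy as np

# Toy shapes: n = 4 characters with dim-3 vectors; entity vector is dim-2.
chars = np.arange(12, dtype=float).reshape(4, 3)   # [char-1, ..., char-4]
entity_type_vector = np.array([9.0, 9.0])          # one row of the embedding matrix

# Concatenate (do not add) the entity vector onto every character vector,
# turning char-i into [char-i, entity_type_vector].
tiled = np.tile(entity_type_vector, (chars.shape[0], 1))
spliced = np.concatenate([chars, tiled], axis=1)

print(spliced.shape)  # (4, 5): each row is [char-i, entity_type_vector]
print(spliced[0])     # [0. 1. 2. 9. 9.]
```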
  • The splicing representation output is then feature-fused to obtain the fusion representation output, so that fusion learning is performed between features and the influence of each feature on the others is learned.
  • That is, the splicing representation output is [[char-1, entity_type_vector], [char-2, entity_type_vector], [char-3, entity_type_vector], ..., [char-n, entity_type_vector]], and each character in the resulting fusion representation output is denoted t-vector-i.
  • a pre-trained Transformer network is called, and the splicing characterization output is input to the Transformer network for feature fusion, and a fusion characterization output is obtained.
  • The fusion representation output [t-vector-1, t-vector-2, ..., t-vector-n] is input into the dynamic graph convolutional neural network (that is, the DGCNN model) to extract the necessary attributes and the start and end positions of the necessary attributes.
  • the starting and ending positions of the necessary attributes include the starting position array of the necessary attributes and the ending position array of the necessary attributes.
  • For example, the start position array of the necessary attributes is [0, 1, 0, 1, 0, 0, ..., 1].
  • The length of the necessary-attribute start position array (and likewise the end position array) equals the length of the text to be processed; a position with value 1 in the start position array marks the start position of a necessary attribute (its position within the whole array). Once the positions of the necessary attributes are known, the attributes can be extracted by locating them in the text.
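Decoding the two pointer arrays into attribute spans can be sketched as below; the greedy pairing of each start marker with the next end marker is one simple strategy, assumed here for illustration rather than taken from the application:

```python
def extract_spans(text, starts, ends):
    """Pair each start marker (value 1) with the nearest end marker at
    or after it; starts/ends have the same length as the text."""
    spans, open_start = [], None
    for i, (s, e) in enumerate(zip(starts, ends)):
        if s == 1 and open_start is None:
            open_start = i
        if e == 1 and open_start is not None:
            spans.append(text[open_start:i + 1])
            open_start = None
    return spans

text = "boundary not clear"
starts = [0] * len(text)
ends = [0] * len(text)
starts[0], ends[7] = 1, 1                  # marks the span "boundary"
print(extract_spans(text, starts, ends))   # ['boundary']
```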
  • The RNN structure is used only when computing the entity information representation, and since the entity information is much shorter than the full text, the training and prediction efficiency of the model is higher than that of commonly used information extraction models.
  • The Transformer structure and the CNN structure can be trained in parallel on the GPU, unlike the serial mechanism of an RNN, so they are very fast; an RNN's speed depends on the length of the text, and because the entity information is very short, the RNN used in the model is still very efficient.
  • S150: The necessary attributes in the entity and the start and end positions of the necessary attributes are processed by the pre-trained Bi-LSTM model, through entity representation vector extraction, vector splicing, feature fusion, and non-essential-attribute extraction in sequence, to obtain the non-essential attributes in the entity and the start and end positions of the non-essential attributes.
  • Bi-LSTM is the abbreviation of Bi-directional Long Short-Term Memory, which is a combination of a forward LSTM and a backward LSTM.
  • The necessary attributes in the entity and their start and end positions can be input into the Bi-LSTM model for computation, after which vector splicing, feature fusion, and non-essential-attribute extraction are performed to obtain the non-essential attributes in the entity and their start and end positions.
  • Through the Bi-LSTM model, the representation information of the entity can be accurately identified and the non-essential attributes screened out.
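The forward/backward combination at the heart of the Bi-LSTM can be sketched with a toy tanh recurrence standing in for the full LSTM cell (an intentional simplification for illustration):

```python
import numpy as np

def rnn_pass(xs, W, U, reverse=False):
    """One direction of a toy recurrent pass; a tanh RNN cell stands
    in for the LSTM cell to keep the sketch short."""
    seq = xs[::-1] if reverse else xs
    h = np.zeros(U.shape[0])
    outs = []
    for x in seq:
        h = np.tanh(W @ x + U @ h)
        outs.append(h)
    return outs[::-1] if reverse else outs

rng = np.random.default_rng(0)
xs = [rng.standard_normal(4) for _ in range(5)]   # 5 steps, dim-4 inputs
W = rng.standard_normal((3, 4))
U = rng.standard_normal((3, 3))

fwd = rnn_pass(xs, W, U)                  # forward pass
bwd = rnn_pass(xs, W, U, reverse=True)    # backward pass
# Bi-directional output: concatenate the two hidden states per step.
bi = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
print(len(bi), bi[0].shape)               # 5 (6,)
```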
  • step S150 includes:
  • the dynamic graph convolutional neural network is called, and the entity fusion characterization output is input to the dynamic graph convolutional neural network for calculation to obtain the non-essential attributes and the starting and ending positions of the non-essential attributes in the entity.
  • The difference between step S150 and the extraction of necessary attributes from the text representation output in step S140 is that, in step S150, the necessary attributes in the entity and their start and end positions are used as the input of the Bi-LSTM model to obtain the entity splicing representation output; the subsequent feature fusion and input to the dynamic graph convolutional neural network are exactly the same as in the specific embodiment of step S140, and only the initial processing of the first step differs.
  • When locating the entity, the entity positioning model encodes the entity type information into the input, that is, the identified attributes are bound to the entity type; similarly, when the attribute extraction model extracts non-essential attributes, the entity information is encoded into the input, binding the entity and the entity type to the attribute extraction.
  • The method improves the accuracy of data attribute extraction; there is no restriction on the data format of the text to be processed, and any structured or unstructured data can be input.
  • An embodiment of the present application also provides a text attribute extraction device based on a probability graph model, and the text attribute extraction device based on a probability graph model is used to implement any embodiment of the aforementioned text attribute extraction method based on the probability graph model.
  • FIG. 3 is a schematic block diagram of a text attribute extraction device based on a probability graph model provided by an embodiment of the present application.
  • the text attribute extraction device 100 based on the probability graph model can be configured in a server.
  • The text attribute extraction device 100 based on the probabilistic graph model includes: a text receiving unit 110, a text representation output acquisition unit 120, an entity type recognition unit 130, a necessary attribute extraction unit 140, and a non-essential attribute extraction unit 150.
  • The text receiving unit 110 is used to receive the text to be processed uploaded by the user terminal.
  • Specifically, when there is a text in the user terminal that requires attribute extraction, the user can operate the user terminal (a smart terminal such as a smartphone or tablet computer) to upload the text to be processed to the server, and attribute extraction is then performed on the text by the server.
  • For example, the text to be processed is: "Double breast glands are slightly thickened, light spots are slightly dense, glandular echo distribution is uneven, and the structure is slightly disordered. There are several hypoechoic nodules in the right breast; the larger ones measure about 19mm×14mm×30mm (inner upper) and 20mm×9mm (outer lower), with unclear boundaries and irregular shapes. There are several hypoechoic nodules in the left breast; the larger one measures about 8mm×4mm (outer upper), with a clear boundary. CDFI: no obvious abnormal blood flow signal."
  • The text representation output acquisition unit 120 is configured to call a pre-trained BERT neural network model and input the text to be processed into the BERT neural network model for computation to obtain a text representation output corresponding to the text to be processed, where the text representation output includes vector representations corresponding to multiple characters.
  • The full name of BERT is Bidirectional Encoder Representations from Transformers; it is a bidirectional language model based on the Transformer architecture. The BERT neural network model can extract the character-vector representation of the text more accurately.
  • the text representation output obtaining unit 120 includes:
  • a text splitting unit, used to split the text to be processed character by character to obtain a character set;
  • a character vector representation acquisition unit, used to input each character in the character set into the BERT neural network model for computation to obtain the vector representation corresponding to each character; and
  • a representation combining unit, used to combine the vector representations of the characters to obtain the text representation output corresponding to the text to be processed.
  • The text representation output of the BERT neural network model is essentially a combination of the vector representations of each character in the text.
  • Specifically, the text to be processed can be split character by character to obtain a character set composed of multiple characters.
  • Each character in the character set is input into the BERT neural network model for computation to obtain the vector representation corresponding to each character; for example, char-i denotes the vector representation of the i-th character.
  • the text representation output is a two-dimensional matrix [char-1, char-2, char-3,..., char-n].
  • the entity type recognition unit 130 is configured to call a pre-trained multi-task learning classification model, input the text representation output to the multi-task learning classification model for recognition, and obtain an entity type corresponding to the text representation output.
  • The multi-task learning classification model is a multi-classification model used in multi-task learning to determine which entity types are contained in the text to be processed. For example, when the two-dimensional matrix [char-1, char-2, char-3, ..., char-n] corresponding to the text representation output is used as the input of the multi-task learning classification model, an output vector [1 1] is obtained; from this output vector, the entity types corresponding to the text representation output can be determined statistically.
  • the entity type identification unit 130 is further configured to:
  • The text representation output is input into the multi-task learning classification model for recognition to obtain the entity recognition output vector corresponding to the text representation output; the number of entities is counted from the vector values of the entity recognition output vector, and the included entity types are obtained from that count.
  • For example, the text representation output corresponds to two entity types: "double breasts" and "double breast nodules". The first "1" in the output vector [1 1] indicates that the text contains the "double breasts" entity, and the second "1" indicates that it contains the "double breast nodules" entity.
  • The necessary attribute extraction unit 140 is used to sequentially perform, on the entity type corresponding to the text representation output, recursion, vector splicing, feature fusion, and necessary-attribute extraction through the called pre-stored entity embedding matrix and the pre-trained dynamic graph convolutional neural network, to obtain the necessary attributes in the entity and the start and end positions of the necessary attributes.
  • an entity type is selected from the entities identified in the entity type recognition, the embedding representation of that entity is obtained through the Entity Embedding matrix (the entity embedding matrix), and the embedding representation is spliced onto the text representation output of the BERT neural network model; after passing through one Transformer, the result is used as the input of the dynamic graph convolutional neural network.
  • the dynamic graph convolutional neural network passes the Transformer output through four dilated-convolution model layers and then connects a double-pointer sequence labeling model to learn the entity information. The dynamic graph convolutional neural network thus determines the unique entity mainly by learning the entity's labeled attributes from the input entity type information.
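The stack of four dilated-convolution layers mentioned above can be shown in miniature; everything below (the 3-tap kernel, the weights, and the dilation schedule 1-2-4-8) is an illustrative assumption rather than the actual DGCNN configuration of the application:

```python
# Toy 1-D dilated convolution stack; kernel taps sit at offsets of +/- dilation,
# so stacking layers with growing dilation widens the receptive field quickly.
def dilated_conv1d(seq, kernel, dilation):
    """'same'-padded 1-D convolution over a list of scalars with a 3-tap kernel."""
    n, out = len(seq), []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):        # tap k reads position i + (k-1)*dilation
            j = i + (k - 1) * dilation
            if 0 <= j < n:
                acc += w * seq[j]
        out.append(acc)
    return out

seq = [1.0] * 8
for d in (1, 2, 4, 8):                        # four layers with growing dilation
    seq = dilated_conv1d(seq, [0.25, 0.5, 0.25], d)
```

The point of the dilation is that four such layers cover a long character window without the serial dependence of an RNN, which is what the efficiency remarks later in this section rely on.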
  • the necessary attribute extraction unit 140 includes:
  • a recursive processing unit, configured to perform recursive processing on the entity type corresponding to the text representation output through the called pre-stored entity embedding matrix, to obtain the entity type representation output;
  • the first splicing unit is configured to splice the entity type characterization output to the vector characterization corresponding to each word in the text characterization output to obtain the splicing characterization output;
  • the first fusion unit is used to perform feature fusion on the splicing characterization output to obtain a fusion characterization output;
  • the first arithmetic unit is used to call the pre-trained dynamic graph convolutional neural network and input the fusion characterization output into it for calculation, to obtain the necessary attributes in the entity and the starting and ending positions of the necessary attributes.
  • for example, when either of the entity types "bilateral breast gland" or "bilateral breast nodules" is selected for processing, the entity type representation output corresponding to that entity type is obtained first, that is, the value of one row of the entity embedding matrix. Because only two entity types are to be processed, the matrix has two rows: the first row is the representation output of bilateral breast gland and the second row is the representation output of bilateral breast nodules. If bilateral breast nodules is selected, the second row of the matrix is obtained and denoted entity_type_vector.
  • the entity_type_vector is spliced onto each word in the two-dimensional matrix [char-1, char-2, char-3, ..., char-n] corresponding to the text representation output, so the vector representation of the i-th word changes from char-i to [char-i, entity_type_vector].
  • the splicing of entity_type_vector with [char-1, char-2, char-3, ..., char-n] is concatenation, not accumulation; its purpose is to fuse the representation information of each character with the entity type information to be processed, so that the learning task of the next layer becomes well defined.
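The splicing-by-concatenation described above (as opposed to element-wise accumulation) can be shown with plain lists; the vector sizes here are toy assumptions:

```python
def splice_entity_type(char_vectors, entity_type_vector):
    """Concatenate (not add) the entity-type vector onto every character vector,
    so each character carries the entity-type information downstream."""
    return [char_vec + entity_type_vector for char_vec in char_vectors]

chars = [[0.1, 0.2], [0.3, 0.4]]          # toy char-i representations
entity_type_vector = [1.0, 0.0]           # toy entity_type_vector
spliced = splice_entity_type(chars, entity_type_vector)
# each row becomes [char-i, entity_type_vector], doubling its width here
```

Concatenation keeps the character information and the entity-type information in separate dimensions, which is why the next layer can learn their interaction rather than receiving a pre-mixed sum.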
  • the splicing characterization output is then feature-fused to obtain the fusion characterization output; the purpose is to let the features learn from one another and capture the influence between features.
  • after feature fusion is completed on the splicing characterization output [[char-1, entity_type_vector], [char-2, entity_type_vector], [char-3, entity_type_vector], ..., [char-n, entity_type_vector]], each word in the resulting fusion characterization output is denoted t-vector-i.
  • a pre-trained Transformer network is called, and the splicing characterization output is input to the Transformer network for feature fusion, and a fusion characterization output is obtained.
  • the fusion characterization output [t-vector-1, t-vector-2, ..., t-vector-n] is input into the dynamic graph convolutional neural network (that is, the DGCNN model) to extract the necessary feature information and the starting and ending positions of the necessary attributes.
  • the starting and ending positions of the necessary attributes include the starting position array of the necessary attributes and the ending position array of the necessary attributes.
  • the starting position array of the necessary attributes: [0, 1, 0, 1, 0, 0, ..., 1];
  • the ending position array of the necessary attributes: [0, 1, 0, 1, 0, 0, ..., 1];
  • the length of the necessary-attribute starting position array or the necessary-attribute ending position array equals the length of the text to be processed; a position with value 1 in the starting position array marks the starting position of a necessary attribute (its position in the whole array). Once the positions of the necessary attributes are known, the necessary attributes can be extracted by locating them in the text.
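A hedged sketch of how attributes might be located from the two pointer arrays; the pairing rule (each start matched with the next end) is an assumption, since the application does not spell out the decoding step:

```python
def extract_spans(text, starts, ends):
    """Pair each start position (value 1) with the next end position (value 1)
    and slice the attribute out of the text; both arrays are as long as the text."""
    spans, open_start = [], None
    for i, (s, e) in enumerate(zip(starts, ends)):
        if s == 1 and open_start is None:
            open_start = i
        if e == 1 and open_start is not None:
            spans.append(text[open_start:i + 1])   # end index is inclusive
            open_start = None
    return spans
```

For example, on a toy text "abcdefg" with starts [0,1,0,0,1,0,0] and ends [0,0,1,0,0,1,0], this decoding yields the two spans "bc" and "ef".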
  • because the Transformer structure and the CNN structure (specifically the DGCNN model) are used when obtaining the necessary attributes and their starting and ending positions, the RNN structure is used only when computing the entity information representation; and compared with the length of the text, the necessary information of an entity is still very short, so the training and prediction efficiency of the model is higher than that of commonly used information extraction models.
  • moreover, the Transformer structure and the CNN structure can be trained in parallel on a GPU, unlike the serial mechanism of an RNN, so they are very fast; the speed of an RNN depends on the length of the text, and because the entity information is very short, the RNN used in the model is still very efficient.
  • the non-essential attribute extraction unit 150 is configured to pass the necessary attributes in the entity and their starting and ending positions through the called pre-trained Bi-LSTM model, performing entity representation vector extraction, vector splicing feature fusion, and non-essential attribute extraction in turn, to obtain the non-essential attributes in the entity and the starting and ending positions of the non-essential attributes.
  • Bi-LSTM is the abbreviation of Bi-directional Long Short-Term Memory, a combination of a forward LSTM and a backward LSTM, and is used to learn the representation information of the entity.
  • since the specific positions of the entities have been located and the specific entities are known, the necessary attributes in the entity and their starting and ending positions can be input into the Bi-LSTM model for calculation; vector splicing feature fusion and non-essential attribute extraction are then performed to obtain the non-essential attributes in the entity and their starting and ending positions.
  • through the Bi-LSTM model, the representation information of the entity can be accurately identified, so that the non-essential attributes can be screened out.
  • the non-essential attribute extraction unit 150 includes:
  • An entity representation vector acquiring unit configured to call a pre-trained Bi-LSTM model, and input the necessary attributes in the entity and the start and end positions of the necessary attributes into the Bi-LSTM model for calculation to obtain an entity representation vector;
  • the second splicing unit is used to splice the entity representation vector to the vector representation corresponding to each word in the text representation output to obtain the entity splicing representation output;
  • the second fusion unit is used to perform feature fusion on the entity splicing characterization output to obtain the entity fusion characterization output;
  • the second arithmetic unit is used to call the dynamic graph convolutional neural network and input the entity fusion characterization output into it for calculation, to obtain the non-essential attributes in the entity and the starting and ending positions of the non-essential attributes.
  • the difference from extracting the necessary attributes from the text representation output is that the non-essential attribute extraction unit 150 uses the necessary attributes in the entity and their starting and ending positions as the input of the Bi-LSTM model to obtain the entity splicing characterization output; the subsequent feature fusion and input into the dynamic graph convolutional neural network are exactly the same as in the specific embodiment of the necessary attribute extraction unit 140; only the initial processing of the first step differs.
  • when extracting the necessary attributes of an entity, the entity positioning model encodes the entity type information into the input information, that is, it binds the identified attributes to the entity type; similarly, when the attribute extraction model extracts non-essential attributes, the entity information is encoded into the input information, binding the entity and the entity type into the attribute extraction. The design thus follows the idea of a probabilistic graph realized with neural networks: both necessary attribute positioning and non-essential attribute extraction perform double-pointer training over the whole original text representation, a single entity is selected at random for extraction during training, and all entities are traversed during prediction, which solves the attribute sharing problem.
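The symmetry described above, conditioning on an entity-type vector for necessary attributes and on an entity representation vector for non-essential attributes, can be sketched as follows; mean-pooling stands in for the Bi-LSTM encoder purely for illustration, and all shapes are toy assumptions:

```python
# Sketch of the non-essential-attribute pipeline: it reuses the same
# "concatenate a conditioning vector onto every character" step as the
# necessary-attribute pipeline, only the conditioning vector differs.

def mean_pool(vectors):
    """Stand-in for the Bi-LSTM entity encoder: average the located attribute's
    character vectors into a single entity representation vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def condition_text(char_vectors, conditioning_vector):
    """Concatenate a conditioning vector onto every character vector."""
    return [cv + conditioning_vector for cv in char_vectors]

chars = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # toy text representation
entity_vector = mean_pool(chars[1:3])          # characters of the located attribute
spliced = condition_text(chars, entity_vector) # input for the next fusion stage
```

Reusing one conditioning mechanism with two different vectors is what lets the two extraction stages share the downstream fusion and DGCNN machinery.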
  • the device thus improves the accuracy of attribute extraction; moreover, there is no restriction on the data format of the text to be processed, and any structured or unstructured data can be input.
  • the above-mentioned text attribute extraction device based on the probability graph model can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 4.
  • FIG. 4 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the computer device 500 is a server, and the server may be an independent server or a server cluster composed of multiple servers.
  • the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • the non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • the processor 502 can execute a text attribute extraction method based on a probability graph model.
  • the processor 502 is used to provide calculation and control capabilities, and support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503.
  • the processor 502 can execute a text attribute extraction method based on a probability graph model.
  • the network interface 505 is used for network communication, such as providing data information transmission.
  • the structure shown in FIG. 4 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied.
  • the specific computer device 500 may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.
  • the processor 502 is configured to run a computer program 5032 stored in a memory to implement the method for extracting text attributes based on the probability graph model disclosed in the embodiment of the present application.
  • the embodiment of the computer device shown in FIG. 4 does not constitute a limitation on the specific configuration of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • for example, in some embodiments the computer device may include only a memory and a processor; in such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 4 and are not repeated here.
  • the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSPs), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • in another embodiment of the present application, a computer-readable storage medium is provided.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium stores a computer program, where the computer program is executed by a processor to implement the method for extracting text attributes based on the probability graph model disclosed in the embodiments of the present application.
  • the disclosed equipment, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation there may be other division methods, or units with the same function may be combined into one unit; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a magnetic disk, an optical disk, and other media that can store program code.


Abstract

A text attribute extraction method based on a probabilistic graphical model, an apparatus, a computer device, and a storage medium, relating to neural network technology in artificial intelligence. The method includes: inputting received text to be processed into a BERT neural network model to obtain the corresponding text representation output; inputting the text representation output into a multi-task learning classification model to obtain the corresponding entity types; performing recursion, vector splicing, feature fusion, and necessary attribute extraction on the entity types in turn to obtain the necessary attributes in the entity and the starting and ending positions of the necessary attributes; and performing entity representation vector extraction, vector splicing feature fusion, and non-essential attribute extraction on the necessary attributes in the entity and their starting and ending positions in turn to obtain the non-essential attributes in the entity and the starting and ending positions of the non-essential attributes. This improves the accuracy of attribute extraction, and there is no restriction on the data format of the text to be processed; any structured or unstructured data can be input.

Description

Text attribute extraction method and apparatus based on probabilistic graphical model, computer device, and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on July 31, 2020, with application number 202010761083.1 and invention title "Text attribute extraction method, apparatus, and computer device based on probabilistic graphical model", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of intelligent decision-making in artificial intelligence, and in particular to a text attribute extraction method, apparatus, computer device, and storage medium based on a probabilistic graphical model.
Background
Attribute extraction from text differs from relation extraction; its difficulty lies in identifying not only the attribute names of an entity but also its attribute values. The main current attribute extraction methods are rule-based attribute extraction, statistical-model-based attribute extraction, and pattern-based attribute extraction.
Rule-based attribute extraction usually targets semi-structured data such as web pages and tables, and handles unstructured data poorly.
The inventors realized that statistical-model-based attribute extraction is often implemented with relation extraction methods, treating the attribute value as another kind of entity and the attribute as a relation between entities. This supervised attribute extraction approach requires a large corpus, cannot solve the attribute sharing problem, and cannot handle the case where one entity name is distinguished into multiple entities because of different attributes.
Pattern-based attribute extraction mostly adopts a pattern discovery method based on dependency analysis; the discovery process loses the rich information around the entities in the pattern, and the extracted patterns are measured for compliance by a scoring mechanism, which easily causes attributes to be missed or extracted incorrectly.
Summary
The embodiments of this application provide a text attribute extraction method, apparatus, computer device, and storage medium based on a probabilistic graphical model, aiming to solve the problems in the prior art that rule-based, statistical-model-based, and pattern-based attribute extraction restrict the structure of the data to be extracted and that the accuracy of attribute extraction is not high.
In a first aspect, an embodiment of this application provides a text attribute extraction method based on a probabilistic graphical model, including:
receiving text to be processed uploaded by a user terminal;
calling a pre-trained BERT neural network model and inputting the text to be processed into the BERT neural network model for calculation to obtain the text representation output corresponding to the text to be processed, where the text representation output includes the vector representations corresponding to a plurality of characters;
calling a pre-trained multi-task learning classification model and inputting the text representation output into the multi-task learning classification model for recognition to obtain the entity types corresponding to the text representation output;
passing the entity types corresponding to the text representation output through a called pre-stored entity embedding matrix and a pre-trained dynamic graph convolutional neural network, performing recursion, vector splicing, feature fusion, and necessary attribute extraction in turn, to obtain the necessary attributes in the entity and the starting and ending positions of the necessary attributes; and
passing the necessary attributes in the entity and their starting and ending positions through a called pre-trained Bi-LSTM model, performing entity representation vector extraction, vector splicing feature fusion, and non-essential attribute extraction in turn, to obtain the non-essential attributes in the entity and the starting and ending positions of the non-essential attributes.
In a second aspect, an embodiment of this application provides a text attribute extraction apparatus based on a probabilistic graphical model, including:
a text receiving unit, configured to receive text to be processed uploaded by a user terminal;
a text representation output acquisition unit, configured to call a pre-trained BERT neural network model and input the text to be processed into the BERT neural network model for calculation to obtain the text representation output corresponding to the text to be processed, where the text representation output includes the vector representations corresponding to a plurality of characters;
an entity type recognition unit, configured to call a pre-trained multi-task learning classification model and input the text representation output into the multi-task learning classification model for recognition to obtain the entity types corresponding to the text representation output;
a necessary attribute extraction unit, configured to pass the entity types corresponding to the text representation output through a called pre-stored entity embedding matrix and a pre-trained dynamic graph convolutional neural network, performing recursion, vector splicing, feature fusion, and necessary attribute extraction in turn, to obtain the necessary attributes in the entity and the starting and ending positions of the necessary attributes; and
a non-essential attribute extraction unit, configured to pass the necessary attributes in the entity and their starting and ending positions through a called pre-trained Bi-LSTM model, performing entity representation vector extraction, vector splicing feature fusion, and non-essential attribute extraction in turn, to obtain the non-essential attributes in the entity and the starting and ending positions of the non-essential attributes.
In a third aspect, an embodiment of this application further provides a computer device including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the following steps when executing the computer program:
receiving text to be processed uploaded by a user terminal;
calling a pre-trained BERT neural network model and inputting the text to be processed into the BERT neural network model for calculation to obtain the text representation output corresponding to the text to be processed, where the text representation output includes the vector representations corresponding to a plurality of characters;
calling a pre-trained multi-task learning classification model and inputting the text representation output into the multi-task learning classification model for recognition to obtain the entity types corresponding to the text representation output;
passing the entity types corresponding to the text representation output through a called pre-stored entity embedding matrix and a pre-trained dynamic graph convolutional neural network, performing recursion, vector splicing, feature fusion, and necessary attribute extraction in turn, to obtain the necessary attributes in the entity and the starting and ending positions of the necessary attributes; and
passing the necessary attributes in the entity and their starting and ending positions through a called pre-trained Bi-LSTM model, performing entity representation vector extraction, vector splicing feature fusion, and non-essential attribute extraction in turn, to obtain the non-essential attributes in the entity and the starting and ending positions of the non-essential attributes.
In a fourth aspect, an embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following operations:
receiving text to be processed uploaded by a user terminal;
calling a pre-trained BERT neural network model and inputting the text to be processed into the BERT neural network model for calculation to obtain the text representation output corresponding to the text to be processed, where the text representation output includes the vector representations corresponding to a plurality of characters;
calling a pre-trained multi-task learning classification model and inputting the text representation output into the multi-task learning classification model for recognition to obtain the entity types corresponding to the text representation output;
passing the entity types corresponding to the text representation output through a called pre-stored entity embedding matrix and a pre-trained dynamic graph convolutional neural network, performing recursion, vector splicing, feature fusion, and necessary attribute extraction in turn, to obtain the necessary attributes in the entity and the starting and ending positions of the necessary attributes; and
passing the necessary attributes in the entity and their starting and ending positions through a called pre-trained Bi-LSTM model, performing entity representation vector extraction, vector splicing feature fusion, and non-essential attribute extraction in turn, to obtain the non-essential attributes in the entity and the starting and ending positions of the non-essential attributes.
The embodiments of this application provide a text attribute extraction method, apparatus, computer device, and storage medium based on a probabilistic graphical model, including: inputting the received text to be processed into a BERT neural network model to obtain the corresponding text representation output; inputting the text representation output into a multi-task learning classification model to obtain the corresponding entity types; performing recursion, vector splicing, feature fusion, and necessary attribute extraction on the entity types in turn to obtain the necessary attributes in the entity and their starting and ending positions; and performing entity representation vector extraction, vector splicing feature fusion, and non-essential attribute extraction on the necessary attributes in the entity and their starting and ending positions in turn to obtain the non-essential attributes in the entity and their starting and ending positions. This improves the accuracy of attribute extraction, and there is no restriction on the data format of the text to be processed; any structured or unstructured data can be input.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of this application; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a schematic diagram of an application scenario of the text attribute extraction method based on a probabilistic graphical model provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of the text attribute extraction method based on a probabilistic graphical model provided by an embodiment of this application;
FIG. 3 is a schematic block diagram of the text attribute extraction apparatus based on a probabilistic graphical model provided by an embodiment of this application;
FIG. 4 is a schematic block diagram of the computer device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative work fall within the protection scope of this application.
It should be understood that when used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or sets thereof.
It should also be understood that the terms used in this specification are only for the purpose of describing particular embodiments and are not intended to limit this application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should further be understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to FIG. 1 and FIG. 2, FIG. 1 is a schematic diagram of an application scenario of the text attribute extraction method based on a probabilistic graphical model provided by an embodiment of this application, and FIG. 2 is a schematic flowchart of that method. The method is applied in a server and is executed by application software installed in the server.
As shown in FIG. 2, the method includes steps S110 to S150.
S110: Receive text to be processed uploaded by a user terminal.
In this embodiment, when the user terminal has text to be processed that needs text attribute extraction, the user can operate the user terminal (a smart terminal used by the user, such as a smartphone or tablet) to upload the text to the server; there is no restriction on the data format of the text to be processed, and any structured or unstructured data can be input. The server then performs attribute extraction on the text. For example, the text to be processed is: "The bilateral breast glands are slightly thickened, the light spots are slightly dense, the glandular echo is unevenly distributed, and the structure is slightly disordered. Several hypoechoic nodules are seen in the right breast, the larger ones about 19mm14mm30mm (upper inner) and 20mm9mm (lower outer), with unclear boundaries and irregular shapes; several hypoechoic nodules are seen in the left breast, the larger one about 8mm4mm (upper outer), with a clear boundary. CDFI: no obviously abnormal blood flow signal."
S120: Call a pre-trained BERT neural network model and input the text to be processed into the BERT neural network model for calculation to obtain the text representation output corresponding to the text to be processed, where the text representation output includes the vector representations corresponding to a plurality of characters.
In this embodiment, BERT in the BERT neural network model stands for Bidirectional Encoder Representations from Transformers, a bidirectional language model based on the Transformer (the Transformer model is a translation model). Compared with the Word2Vec model, the BERT neural network model can extract the character vector representations of text more accurately.
In an embodiment, step S120 includes:
splitting the text to be processed by character to obtain a character set;
inputting each character in the character set into the BERT neural network model for calculation to obtain the vector representation corresponding to each character, and combining the vector representations of the characters to obtain the text representation output corresponding to the text to be processed.
In this embodiment, since the text representation output of the BERT neural network model is essentially the combination of the vector representations of the characters in the text, the text to be processed can be split by character into a character set, and each character in the set is input into the BERT neural network model for calculation to obtain its vector representation; for example, if char-i denotes the vector representation of the i-th character, the text representation output is a two-dimensional matrix [char-1, char-2, char-3, ..., char-n].
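The character-level pipeline of step S120 can be sketched as below; the `encode` function is a toy stand-in for the BERT model, producing one fixed-size vector per character (the dimension 4 and the hash-style values are arbitrary illustrative choices):

```python
def split_chars(text):
    """Split the text to be processed into its character set; each char-i is
    then fed to the encoder to obtain its vector representation."""
    return list(text)

def encode(chars, dim=4):
    """Toy stand-in for the BERT encoder: one fixed-size vector per character."""
    return [[float(ord(c) % 7)] * dim for c in chars]

chars = split_chars("双乳结节")
matrix = encode(chars)   # a 2-D matrix [char-1, char-2, ..., char-n]
```

The result has one row per character, which is exactly the [char-1, char-2, char-3, ..., char-n] matrix that the multi-task classifier and the splicing steps below consume.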
S130: Call a pre-trained multi-task learning classification model and input the text representation output into the multi-task learning classification model for recognition to obtain the entity types corresponding to the text representation output.
In this embodiment, the multi-task learning classification model is the Multi-Classification model, which is used for multi-task learning to determine which entity types the text to be processed contains. For example, when the two-dimensional matrix [char-1, char-2, char-3, ..., char-n] corresponding to the text representation output is used as the input of the multi-task learning classification model, an output vector [1 1] is obtained; from the output vector [1 1], the entity types corresponding to the text representation output can be determined.
In an embodiment, step S130 includes:
inputting the text representation output into the multi-task learning classification model for recognition to obtain the entity recognition output vector corresponding to the text representation output, counting the number of entities from the vector values equal to 1 in the entity recognition output vector, and obtaining the contained entity types according to the number of entities.
In this embodiment, for example, when the two-dimensional matrix [char-1, char-2, char-3, ..., char-n] corresponding to the text representation output is used as the input of the multi-task learning classification model, an output vector [1 1] is obtained; this output vector contains two values equal to 1, so it can be determined that the text representation output corresponds to two entity types. For example, "bilateral breast gland" and "bilateral breast nodules" are the two entity types corresponding to the text representation output, where the first "1" in the output vector [1 1] indicates that the bilateral breast gland entity is present and the second "1" indicates that the bilateral breast nodules entity is present. The multi-task learning classification model allows entity types to be recognized more accurately.
S140: Pass the entity types corresponding to the text representation output through the called pre-stored entity embedding matrix and the pre-trained dynamic graph convolutional neural network, performing recursion, vector splicing, feature fusion, and necessary attribute extraction in turn, to obtain the necessary attributes in the entity and the starting and ending positions of the necessary attributes.
In this embodiment, one entity type is selected from the entities identified in the entity type recognition, the embedding representation of that entity is obtained through the Entity Embedding matrix (the entity embedding matrix), and the embedding representation is spliced onto the text representation output of the BERT neural network model; after one Transformer, the result is used as the input of the dynamic graph convolutional neural network. The dynamic graph convolutional neural network then passes the Transformer output through four dilated-convolution model layers and connects a double-pointer sequence labeling model to learn the entity information. The dynamic graph convolutional neural network thus determines the unique entity mainly by learning the entity's labeled attributes from the input entity type information.
In an embodiment, step S140 includes:
performing recursive processing on the entity type corresponding to the text representation output through the called pre-stored entity embedding matrix to obtain the entity type representation output;
splicing the entity type representation output onto the vector representation corresponding to each character in the text representation output to obtain the splicing representation output;
performing feature fusion on the splicing representation output to obtain the fusion representation output;
calling the pre-trained dynamic graph convolutional neural network and inputting the fusion representation output into it for calculation, to obtain the necessary attributes in the entity and the starting and ending positions of the necessary attributes.
In this embodiment, for example, when either of the entity types "bilateral breast gland" or "bilateral breast nodules" is selected for processing, the entity type representation output corresponding to that entity type is obtained first, that is, the value of one row of the entity embedding matrix in the figure. Because only two entity types are to be processed, the matrix has two rows: the first row is the representation output of bilateral breast gland and the second row is the representation output of bilateral breast nodules. If bilateral breast nodules is selected, the second row of the matrix is obtained and denoted entity_type_vector.
The entity_type_vector is then spliced onto each character in the two-dimensional matrix [char-1, char-2, char-3, ..., char-n] corresponding to the text representation output, so the vector representation of the i-th character changes from char-i to [char-i, entity_type_vector]. The splicing of entity_type_vector with [char-1, char-2, char-3, ..., char-n] is concatenation, not accumulation; its purpose is to fuse the representation information of each character with the entity type information to be processed, so that the learning task of the next layer becomes well defined.
Feature fusion is then performed on the splicing representation output to obtain the fusion representation output; the purpose is to let the features learn from one another and capture the influence between features. After feature fusion is completed on the splicing representation output [[char-1, entity_type_vector], [char-2, entity_type_vector], [char-3, entity_type_vector], ..., [char-n, entity_type_vector]], each character in the resulting fusion representation output is denoted t-vector-i. In a specific implementation, a pre-trained Transformer network is called, and the splicing representation output is input into the Transformer network for feature fusion to obtain the fusion representation output.
Finally, the fusion representation output [t-vector-1, t-vector-2, ..., t-vector-n] is input into the dynamic graph convolutional neural network (that is, the DGCNN model) to extract the necessary feature information and the starting and ending positions of the necessary attributes, where the starting and ending positions of the necessary attributes include a necessary-attribute starting position array and a necessary-attribute ending position array.
For example, [t-vector-1, t-vector-2, ..., t-vector-n] is input into the dynamic graph convolutional neural network and its connected Dense layer (which can be understood as a fully connected layer; the Dense layer uses the sigmoid function to make a decision at each position, locating the entity and determining its necessary attributes);
the output is as follows:
necessary-attribute starting position array: [0, 1, 0, 1, 0, 0, ..., 1];
necessary-attribute ending position array: [0, 1, 0, 1, 0, 0, ..., 1];
where the length of the necessary-attribute starting position array or the necessary-attribute ending position array equals the length of the text to be processed, and a position with value 1 in the starting position array is the starting position of a necessary attribute (its position in the whole array). Once the positions of the necessary attributes are known, they can be extracted by locating them in the text.
Because the Transformer structure and the CNN structure (specifically the DGCNN model) are used when obtaining the necessary attributes in the entity and their starting and ending positions, and the RNN structure is used only when computing the entity information representation, while the necessary information of an entity is very short compared with the length of the text, the training and prediction efficiency of the model is higher than that of commonly used information extraction models.
Moreover, the Transformer structure and the CNN structure can be trained in parallel on a GPU, unlike the serial mechanism of an RNN, so they are very fast; the speed of an RNN depends on the length of the text, and because the entity information is very short, the RNN used in the model is still very efficient.
S150: Pass the necessary attributes in the entity and their starting and ending positions through the called pre-trained Bi-LSTM model, performing entity representation vector extraction, vector splicing feature fusion, and non-essential attribute extraction in turn, to obtain the non-essential attributes in the entity and the starting and ending positions of the non-essential attributes.
In this embodiment, after the necessary attributes in the entity and their starting and ending positions have been identified in the preceding steps, a number of specific entities have effectively been identified, and their representation information now needs to be learned. The Bi-LSTM model (Bi-LSTM is the abbreviation of Bi-directional Long Short-Term Memory, a combination of a forward LSTM and a backward LSTM) is used to learn the representation information of the entities. Since the specific positions of the entities have been located and the specific entities are known, the necessary attributes in the entity and their starting and ending positions can be input into the Bi-LSTM model for calculation, and vector splicing feature fusion and non-essential attribute extraction are then performed to obtain the non-essential attributes in the entity and their starting and ending positions. Through the Bi-LSTM model, the representation information of the entity can be accurately identified, so that the non-essential attributes can be screened out.
In an embodiment, step S150 includes:
calling the pre-trained Bi-LSTM model and inputting the necessary attributes in the entity and their starting and ending positions into the Bi-LSTM model for calculation to obtain the entity representation vector;
splicing the entity representation vector onto the vector representation corresponding to each character in the text representation output to obtain the entity splicing representation output;
performing feature fusion on the entity splicing representation output to obtain the entity fusion representation output;
calling the dynamic graph convolutional neural network and inputting the entity fusion representation output into it for calculation, to obtain the non-essential attributes in the entity and the starting and ending positions of the non-essential attributes.
In this embodiment, the difference from step S140, which extracts the necessary attributes from the text representation output, is that step S150 uses the necessary attributes in the entity and their starting and ending positions as the input of the Bi-LSTM model to obtain the entity splicing representation output; the subsequent feature fusion and input into the dynamic graph convolutional neural network are exactly the same as in the specific embodiment of step S140; only the initial processing of the first step differs.
When extracting the necessary attributes of an entity, the entity positioning model encodes the entity type information into the input information, that is, it binds the identified attributes to the entity type; similarly, when the attribute extraction model extracts non-essential attributes, the entity information is encoded into the input information, binding the entity and the entity type into the attribute extraction.
It can be seen that this application adopts the idea of a probabilistic graph, and the model is designed as a neural network probabilistic graph. Both necessary attribute positioning and non-essential attribute extraction perform double-pointer training over the whole original text representation; during training a single entity is selected at random for extraction, and during prediction all entities are traversed for extraction, which solves the attribute sharing problem.
The method improves the accuracy of attribute extraction, and there is no restriction on the data format of the text to be processed; any structured or unstructured data can be input.
An embodiment of this application further provides a text attribute extraction apparatus based on a probabilistic graphical model, which is used to execute any embodiment of the foregoing text attribute extraction method based on a probabilistic graphical model. Specifically, referring to FIG. 3, FIG. 3 is a schematic block diagram of the text attribute extraction apparatus based on a probabilistic graphical model provided by an embodiment of this application. The text attribute extraction apparatus 100 based on a probabilistic graphical model can be configured in a server.
As shown in FIG. 3, the text attribute extraction apparatus 100 based on a probabilistic graphical model includes: a text receiving unit 110, a text representation output acquisition unit 120, an entity type recognition unit 130, a necessary attribute extraction unit 140, and a non-essential attribute extraction unit 150.
The text receiving unit 110 receives text to be processed uploaded by a user terminal.
In this embodiment, when the user terminal has text to be processed that needs text attribute extraction, the user can operate the user terminal (a smart terminal used by the user, such as a smartphone or tablet) to upload the text to the server; there is no restriction on the data format of the text to be processed, and any structured or unstructured data can be input. The server then performs attribute extraction on the text. For example, the text to be processed is: "The bilateral breast glands are slightly thickened, the light spots are slightly dense, the glandular echo is unevenly distributed, and the structure is slightly disordered. Several hypoechoic nodules are seen in the right breast, the larger ones about 19mm14mm30mm (upper inner) and 20mm9mm (lower outer), with unclear boundaries and irregular shapes; several hypoechoic nodules are seen in the left breast, the larger one about 8mm4mm (upper outer), with a clear boundary. CDFI: no obviously abnormal blood flow signal."
The text representation output acquisition unit 120 is configured to call a pre-trained BERT neural network model and input the text to be processed into the BERT neural network model for calculation to obtain the text representation output corresponding to the text to be processed, where the text representation output includes the vector representations corresponding to a plurality of characters.
In this embodiment, BERT in the BERT neural network model stands for Bidirectional Encoder Representations from Transformers, a bidirectional language model based on the Transformer (the Transformer model is a translation model). Compared with the Word2Vec model, the BERT neural network model can extract the character vector representations of text more accurately.
In an embodiment, the text representation output acquisition unit 120 includes:
a text splitting unit, configured to split the text to be processed by character to obtain a character set;
a character vector representation acquisition unit, configured to input each character in the character set into the BERT neural network model for calculation to obtain the vector representation corresponding to each character, and to combine the vector representations of the characters into the text representation output corresponding to the text to be processed.
In this embodiment, since the text representation output of the BERT neural network model is essentially the combination of the vector representations of the characters in the text, the text to be processed can be split by character into a character set, and each character in the set is input into the BERT neural network model for calculation to obtain its vector representation; for example, if char-i denotes the vector representation of the i-th character, the text representation output is a two-dimensional matrix [char-1, char-2, char-3, ..., char-n].
The entity type recognition unit 130 is configured to call a pre-trained multi-task learning classification model and input the text representation output into the multi-task learning classification model for recognition to obtain the entity types corresponding to the text representation output.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art can easily think of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall all be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (20)

  1. A text attribute extraction method based on a probabilistic graphical model, comprising:
    receiving text to be processed uploaded by a user terminal;
    calling a pre-trained BERT neural network model and inputting the text to be processed into the BERT neural network model for calculation to obtain a text representation output corresponding to the text to be processed, wherein the text representation output comprises vector representations respectively corresponding to a plurality of characters;
    calling a pre-trained multi-task learning classification model and inputting the text representation output into the multi-task learning classification model for recognition to obtain entity types corresponding to the text representation output;
    passing the entity types corresponding to the text representation output through a called pre-stored entity embedding matrix and a pre-trained dynamic graph convolutional neural network, performing recursion, vector splicing, feature fusion, and necessary attribute extraction in turn, to obtain necessary attributes in the entity and starting and ending positions of the necessary attributes; and
    passing the necessary attributes in the entity and the starting and ending positions of the necessary attributes through a called pre-trained Bi-LSTM model, performing entity representation vector extraction, vector splicing feature fusion, and non-essential attribute extraction in turn, to obtain non-essential attributes in the entity and starting and ending positions of the non-essential attributes.
  2. The text attribute extraction method based on a probabilistic graphical model according to claim 1, wherein inputting the text to be processed into the BERT neural network model for calculation to obtain the text representation output corresponding to the text to be processed comprises:
    splitting the text to be processed by character to obtain a character set;
    inputting each character in the character set into the BERT neural network model for calculation to obtain the vector representation corresponding to each character, and combining the vector representations of the characters to obtain the text representation output corresponding to the text to be processed.
  3. The text attribute extraction method based on a probabilistic graphical model according to claim 1, wherein inputting the text representation output into the multi-task learning classification model for recognition to obtain the entity types corresponding to the text representation output comprises:
    inputting the text representation output into the multi-task learning classification model for recognition to obtain an entity recognition output vector corresponding to the text representation output, counting the number of entities from the vector values equal to 1 in the entity recognition output vector, and obtaining the contained entity types according to the number of entities.
  4. The text attribute extraction method based on a probabilistic graph model according to claim 1, wherein the performing recursion, vector concatenation, feature fusion, and essential attribute extraction in sequence on the entity type corresponding to the text representation output by means of the invoked pre-stored entity embedding matrix and the pre-trained dynamic graph convolutional neural network, to obtain the essential attributes of the entity and the start and end positions of the essential attributes, comprises:
    performing recursive processing on the entity type corresponding to the text representation output by means of the invoked pre-stored entity embedding matrix to obtain an entity type representation output;
    concatenating the entity type representation output to the vector representation corresponding to each character in the text representation output to obtain a concatenated representation output;
    performing feature fusion on the concatenated representation output to obtain a fused representation output;
    invoking the pre-trained dynamic graph convolutional neural network, and inputting the fused representation output into the dynamic graph convolutional neural network for computation to obtain the essential attributes of the entity and the start and end positions of the essential attributes.
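The concatenation step of claim 4 can be sketched with plain lists. This illustrates only the data movement, under the assumption that the entity-type representation is a single row of the entity embedding matrix appended to every character vector; the toy values are hypothetical.

```python
def concat_type_to_chars(text_rep, type_vec):
    # Claim 4, concatenation step: append the entity-type representation to
    # the vector representation of every character, yielding the
    # concatenated representation output.
    return [char_vec + type_vec for char_vec in text_rep]

text_rep = [[0.1, 0.2], [0.3, 0.4]]  # two characters, toy 2-dim vectors
type_vec = [1.0, 0.0]                # toy row of the entity embedding matrix
concatenated = concat_type_to_chars(text_rep, type_vec)
```

Each character vector grows by the entity-type dimension, so every position in the sequence carries the same entity-type signal into the fusion step.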
  5. The text attribute extraction method based on a probabilistic graph model according to claim 1, wherein the performing feature fusion on the concatenated representation output to obtain a fused representation output comprises:
    invoking a pre-trained Transformer network, and inputting the concatenated representation output into the Transformer network for feature fusion to obtain the fused representation output.
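The core mixing operation that a Transformer fusion layer performs can be sketched as single-head self-attention without learned projections. This is a minimal illustration of the mechanism only, not the claimed pre-trained network, which would additionally have learned query/key/value projections, multiple heads, and feed-forward layers.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention_fuse(seq):
    # Each output vector is an attention-weighted mix of all input vectors:
    # the score of position j for query i is the scaled dot product q_i·k_j.
    dim = len(seq[0])
    fused = []
    for q in seq:
        scores = softmax(
            [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in seq]
        )
        fused.append(
            [sum(w * v[i] for w, v in zip(scores, seq)) for i in range(dim)]
        )
    return fused

fused = self_attention_fuse([[1.0, 0.0], [0.0, 1.0]])
```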
  6. The text attribute extraction method based on a probabilistic graph model according to claim 1, wherein the performing entity representation vector extraction, vector concatenation feature fusion, and non-essential attribute extraction in sequence on the essential attributes of the entity and the start and end positions of the essential attributes by invoking the pre-trained Bi-LSTM model, to obtain the non-essential attributes of the entity and the start and end positions of the non-essential attributes, comprises:
    invoking the pre-trained Bi-LSTM model, and inputting the essential attributes of the entity and the start and end positions of the essential attributes into the Bi-LSTM model for computation to obtain an entity representation vector;
    concatenating the entity representation vector to the vector representation corresponding to each character in the text representation output to obtain an entity concatenated representation output;
    performing feature fusion on the entity concatenated representation output to obtain an entity fused representation output;
    invoking the dynamic graph convolutional neural network, and inputting the entity fused representation output into the dynamic graph convolutional neural network for computation to obtain the non-essential attributes of the entity and the start and end positions of the non-essential attributes.
  7. The text attribute extraction method based on a probabilistic graph model according to claim 1, wherein the to-be-processed text is structured data text or unstructured data text.
  8. The text attribute extraction method based on a probabilistic graph model according to claim 1, wherein the invoking the pre-trained dynamic graph convolutional neural network, and inputting the fused representation output into the dynamic graph convolutional neural network for computation to obtain the essential attributes of the entity and the start and end positions of the essential attributes, comprises:
    inputting the fused representation output into the dynamic graph convolutional neural network to extract essential feature information and the start and end positions of the essential attributes, wherein the start and end positions of the essential attributes comprise an essential attribute start position array and an essential attribute end position array.
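Claim 8 represents each attribute's location as a pair of arrays: one of start positions and one of end positions. One common way to recover spans from such arrays is to pair each start with the nearest end at or after it; the greedy pairing below is an assumption for illustration, not a decoding rule stated in the claims.

```python
def decode_spans(start_positions, end_positions):
    # Greedily pair each start position with the nearest end position that
    # is at or after it, recovering (start, end) attribute spans from the
    # start-position array and end-position array.
    spans = []
    ends = iter(end_positions)
    e = next(ends, None)
    for s in start_positions:
        while e is not None and e < s:
            e = next(ends, None)
        if e is not None:
            spans.append((s, e))
    return spans

# Starts at characters 1 and 5, ends at characters 3 and 7.
spans = decode_spans([1, 5], [3, 7])
```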
  9. A text attribute extraction apparatus based on a probabilistic graph model, comprising:
    a text receiving unit, configured to receive to-be-processed text uploaded by a user terminal;
    a text representation output acquisition unit, configured to invoke a pre-trained BERT neural network model and input the to-be-processed text into the BERT neural network model for computation to obtain a text representation output corresponding to the to-be-processed text, wherein the text representation output comprises vector representations respectively corresponding to a plurality of characters;
    an entity type recognition unit, configured to invoke a pre-trained multi-task learning classification model and input the text representation output into the multi-task learning classification model for recognition to obtain an entity type corresponding to the text representation output;
    an essential attribute extraction unit, configured to perform recursion, vector concatenation, feature fusion, and essential attribute extraction in sequence on the entity type corresponding to the text representation output by means of an invoked pre-stored entity embedding matrix and a pre-trained dynamic graph convolutional neural network, to obtain essential attributes of an entity and start and end positions of the essential attributes; and
    a non-essential attribute extraction unit, configured to perform entity representation vector extraction, vector concatenation feature fusion, and non-essential attribute extraction in sequence on the essential attributes of the entity and the start and end positions of the essential attributes by invoking a pre-trained Bi-LSTM model, to obtain non-essential attributes of the entity and start and end positions of the non-essential attributes.
  10. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    receiving to-be-processed text uploaded by a user terminal;
    invoking a pre-trained BERT neural network model, and inputting the to-be-processed text into the BERT neural network model for computation to obtain a text representation output corresponding to the to-be-processed text, wherein the text representation output comprises vector representations respectively corresponding to a plurality of characters;
    invoking a pre-trained multi-task learning classification model, and inputting the text representation output into the multi-task learning classification model for recognition to obtain an entity type corresponding to the text representation output;
    performing recursion, vector concatenation, feature fusion, and essential attribute extraction in sequence on the entity type corresponding to the text representation output by means of an invoked pre-stored entity embedding matrix and a pre-trained dynamic graph convolutional neural network, to obtain essential attributes of an entity and start and end positions of the essential attributes; and
    performing entity representation vector extraction, vector concatenation feature fusion, and non-essential attribute extraction in sequence on the essential attributes of the entity and the start and end positions of the essential attributes by invoking a pre-trained Bi-LSTM model, to obtain non-essential attributes of the entity and start and end positions of the non-essential attributes.
  11. The computer device according to claim 10, wherein the inputting the to-be-processed text into the BERT neural network model for computation to obtain a text representation output corresponding to the to-be-processed text comprises:
    splitting the to-be-processed text by character to obtain a character set;
    inputting each character in the character set into the BERT neural network model for computation to obtain a vector representation corresponding to each character in the character set, and combining the vector representations of the characters to obtain the text representation output corresponding to the to-be-processed text.
  12. The computer device according to claim 10, wherein the inputting the text representation output into the multi-task learning classification model for recognition to obtain an entity type corresponding to the text representation output comprises:
    inputting the text representation output into the multi-task learning classification model for recognition to obtain an entity recognition output vector corresponding to the text representation output, and obtaining the number of entities by counting the vector values equal to 1 in the entity recognition output vector, so as to correspondingly obtain the included entity types according to the number of entities.
  13. The computer device according to claim 10, wherein the performing recursion, vector concatenation, feature fusion, and essential attribute extraction in sequence on the entity type corresponding to the text representation output by means of the invoked pre-stored entity embedding matrix and the pre-trained dynamic graph convolutional neural network, to obtain the essential attributes of the entity and the start and end positions of the essential attributes, comprises:
    performing recursive processing on the entity type corresponding to the text representation output by means of the invoked pre-stored entity embedding matrix to obtain an entity type representation output;
    concatenating the entity type representation output to the vector representation corresponding to each character in the text representation output to obtain a concatenated representation output;
    performing feature fusion on the concatenated representation output to obtain a fused representation output;
    invoking the pre-trained dynamic graph convolutional neural network, and inputting the fused representation output into the dynamic graph convolutional neural network for computation to obtain the essential attributes of the entity and the start and end positions of the essential attributes.
  14. The computer device according to claim 10, wherein the performing feature fusion on the concatenated representation output to obtain a fused representation output comprises:
    invoking a pre-trained Transformer network, and inputting the concatenated representation output into the Transformer network for feature fusion to obtain the fused representation output.
  15. The computer device according to claim 10, wherein the performing entity representation vector extraction, vector concatenation feature fusion, and non-essential attribute extraction in sequence on the essential attributes of the entity and the start and end positions of the essential attributes by invoking the pre-trained Bi-LSTM model, to obtain the non-essential attributes of the entity and the start and end positions of the non-essential attributes, comprises:
    invoking the pre-trained Bi-LSTM model, and inputting the essential attributes of the entity and the start and end positions of the essential attributes into the Bi-LSTM model for computation to obtain an entity representation vector;
    concatenating the entity representation vector to the vector representation corresponding to each character in the text representation output to obtain an entity concatenated representation output;
    performing feature fusion on the entity concatenated representation output to obtain an entity fused representation output;
    invoking the dynamic graph convolutional neural network, and inputting the entity fused representation output into the dynamic graph convolutional neural network for computation to obtain the non-essential attributes of the entity and the start and end positions of the non-essential attributes.
  16. The computer device according to claim 10, wherein the to-be-processed text is structured data text or unstructured data text.
  17. The computer device according to claim 10, wherein the invoking the pre-trained dynamic graph convolutional neural network, and inputting the fused representation output into the dynamic graph convolutional neural network for computation to obtain the essential attributes of the entity and the start and end positions of the essential attributes, comprises:
    inputting the fused representation output into the dynamic graph convolutional neural network to extract essential feature information and the start and end positions of the essential attributes, wherein the start and end positions of the essential attributes comprise an essential attribute start position array and an essential attribute end position array.
  18. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following operations:
    receiving to-be-processed text uploaded by a user terminal;
    invoking a pre-trained BERT neural network model, and inputting the to-be-processed text into the BERT neural network model for computation to obtain a text representation output corresponding to the to-be-processed text, wherein the text representation output comprises vector representations respectively corresponding to a plurality of characters;
    invoking a pre-trained multi-task learning classification model, and inputting the text representation output into the multi-task learning classification model for recognition to obtain an entity type corresponding to the text representation output;
    performing recursion, vector concatenation, feature fusion, and essential attribute extraction in sequence on the entity type corresponding to the text representation output by means of an invoked pre-stored entity embedding matrix and a pre-trained dynamic graph convolutional neural network, to obtain essential attributes of an entity and start and end positions of the essential attributes; and
    performing entity representation vector extraction, vector concatenation feature fusion, and non-essential attribute extraction in sequence on the essential attributes of the entity and the start and end positions of the essential attributes by invoking a pre-trained Bi-LSTM model, to obtain non-essential attributes of the entity and start and end positions of the non-essential attributes.
  19. The computer-readable storage medium according to claim 18, wherein the inputting the to-be-processed text into the BERT neural network model for computation to obtain a text representation output corresponding to the to-be-processed text comprises:
    splitting the to-be-processed text by character to obtain a character set;
    inputting each character in the character set into the BERT neural network model for computation to obtain a vector representation corresponding to each character in the character set, and combining the vector representations of the characters to obtain the text representation output corresponding to the to-be-processed text.
  20. The computer-readable storage medium according to claim 18, wherein the inputting the text representation output into the multi-task learning classification model for recognition to obtain an entity type corresponding to the text representation output comprises:
    inputting the text representation output into the multi-task learning classification model for recognition to obtain an entity recognition output vector corresponding to the text representation output, and obtaining the number of entities by counting the vector values equal to 1 in the entity recognition output vector, so as to correspondingly obtain the included entity types according to the number of entities.
PCT/CN2020/119137 2020-07-31 2020-09-30 Probabilistic graph model-based text attribute extraction method and apparatus, computer device, and storage medium WO2021135477A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010761083.1 2020-07-31
CN202010761083.1A CN111914559B (zh) 2020-07-31 Probabilistic graph model-based text attribute extraction method and apparatus, and computer device

Publications (1)

Publication Number Publication Date
WO2021135477A1 true WO2021135477A1 (zh) 2021-07-08

Family ID=73288031

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119137 WO2021135477A1 (zh) 2020-07-31 2020-09-30 Probabilistic graph model-based text attribute extraction method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN111914559B (zh)
WO (1) WO2021135477A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468288A (zh) * 2021-07-23 2021-10-01 平安国际智慧城市科技股份有限公司 Artificial intelligence-based content extraction method for text courseware, and related device
CN114020910A (zh) * 2021-11-03 2022-02-08 北京中科凡语科技有限公司 TextCNN-based medical text feature extraction method and apparatus
CN114298052A (zh) * 2022-01-04 2022-04-08 中国人民解放军国防科技大学 Probabilistic graph-based entity joint annotation relation extraction method and system
CN114548099A (zh) * 2022-02-25 2022-05-27 桂林电子科技大学 Multi-task framework-based joint extraction and detection method for aspect terms and aspect categories
CN114898155A (zh) * 2022-05-18 2022-08-12 平安科技(深圳)有限公司 Vehicle damage assessment method, apparatus, device, and storage medium
CN116485729A (zh) * 2023-04-03 2023-07-25 兰州大学 Transformer-based multi-level bridge defect detection method

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434510B (zh) * 2020-11-24 2024-03-29 北京字节跳动网络技术有限公司 Information processing method and apparatus, electronic device, and storage medium
CN112613316B (zh) * 2020-12-31 2023-06-20 北京师范大学 Method and system for generating an ancient Chinese annotation model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180121618A1 (en) * 2016-11-02 2018-05-03 Cota Inc. System and method for extracting oncological information of prognostic significance from natural language
CN110728153A (zh) * 2019-10-15 2020-01-24 天津理工大学 Model fusion-based multi-category sentiment classification method
CN111078886A (zh) * 2019-12-18 2020-04-28 成都迪普曼林信息技术有限公司 DMCNN-based special event extraction system
CN111401061A (zh) * 2020-03-19 2020-07-10 昆明理工大学 Method for identifying opinion sentences in case-related news based on BERT and BiLSTM-Attention

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038183B (zh) * 2017-12-08 2020-11-24 北京百度网讯科技有限公司 Structured entity recording method and apparatus, server, and storage medium
CN110795543B (zh) * 2019-09-03 2023-09-22 腾讯科技(深圳)有限公司 Deep learning-based unstructured data extraction method and apparatus, and storage medium
CN111046186A (zh) * 2019-10-30 2020-04-21 平安科技(深圳)有限公司 Entity alignment method, apparatus, and device for knowledge graphs, and storage medium
CN111160008B (zh) * 2019-12-18 2022-03-25 华南理工大学 Joint entity-relation extraction method and system
CN111401058B (zh) * 2020-03-12 2023-05-02 广州大学 Attribute value extraction method and apparatus based on a named entity recognition tool
CN111460149B (zh) * 2020-03-27 2023-07-25 科大讯飞股份有限公司 Text classification method, related device, and readable storage medium



Also Published As

Publication number Publication date
CN111914559B (zh) 2023-04-07
CN111914559A (zh) 2020-11-10

Similar Documents

Publication Publication Date Title
WO2021135477A1 (zh) Probabilistic graph model-based text attribute extraction method and apparatus, computer device, and storage medium
US11651163B2 Multi-turn dialogue response generation with persona modeling
WO2020019686A1 (zh) Session interaction method and apparatus
US10510336B2 Method, apparatus, and system for conflict detection and resolution for competing intent classifiers in modular conversation system
WO2021068352A1 (zh) Method and apparatus for automatically constructing FAQ question-answer pairs, computer device, and storage medium
WO2020143320A1 (zh) Text word vector acquisition method and apparatus, computer device, and storage medium
CN112084789B (zh) Text processing method, apparatus, device, and storage medium
TW202020691A (zh) Feature word determination method and apparatus, and server
JP7457125B2 (ja) Translation method and apparatus, electronic device, and computer program
CN112650854B (zh) Multi-knowledge-graph-based intelligent reply method and apparatus, and computer device
CN112487168A (zh) Knowledge graph-based semantic question answering method and apparatus, computer device, and storage medium
WO2024098533A1 (zh) Bidirectional image-text search method, apparatus, and device, and non-volatile readable storage medium
CN109460220A (zh) Message predefined code generation method and apparatus, electronic device, and storage medium
WO2021120779A1 (zh) Human-machine dialogue-based user profile construction method and system, terminal, and storage medium
WO2023045184A1 (zh) Text category recognition method and apparatus, computer device, and medium
CN107977357A (zh) User feedback-based error correction method, apparatus, and device
WO2019114618A1 (zh) Deep neural network training method and apparatus, and computer device
WO2022105121A1 (zh) Distillation method, apparatus, and device applied to a BERT model, and storage medium
CN114328980A (zh) Knowledge graph construction method and apparatus combining RPA and AI, terminal, and storage medium
CN115410717A (zh) Model training method, data retrieval method, image data retrieval method, and apparatus
JP7309811B2 (ja) Data annotation method and apparatus, electronic device, and storage medium
CN107832447A (zh) User feedback error correction method, apparatus, and device for a mobile terminal
CN117371428A (zh) Large language model-based text processing method and apparatus
CN116798053A (zh) Icon generation method and apparatus
CN115186738A (zh) Model training method, apparatus, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 20910266; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 20910266; Country of ref document: EP; Kind code of ref document: A1