CN115879468A - Text element extraction method, device and equipment based on natural language understanding - Google Patents

Text element extraction method, device and equipment based on natural language understanding

Info

Publication number
CN115879468A
Authority
CN
China
Prior art keywords
hyponym
target
sample
text
special symbol
Prior art date
Legal status
Granted
Application number
CN202211734143.6A
Other languages
Chinese (zh)
Other versions
CN115879468B (en)
Inventor
陈佳颖
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211734143.6A
Publication of CN115879468A
Application granted
Publication of CN115879468B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The present disclosure provides a text element extraction method based on natural language understanding, a neural network training method, a device, and equipment. It relates to the field of artificial intelligence, in particular to natural language processing and deep learning technology, and can be applied to smart city and smart government scenarios. The text element extraction method comprises the following steps: determining a target hyponym in a target text; constructing a target input, the target input comprising at least the target text; processing the target input by using a pre-training model to obtain an intermediate feature, wherein the intermediate feature represents semantic information of the target text and represents at least one of the semantic information of the target hyponym and the position of the target hyponym in the target text; and processing the intermediate feature by using a hypernym determination sub-network to obtain the hypernym corresponding to the target hyponym.

Description

Text element extraction method, device and equipment based on natural language understanding
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular to natural language processing and deep learning techniques applicable to smart city and smart government scenarios, and more particularly to a method for extracting text elements based on natural language understanding, a method for training a neural network, a device for extracting text elements based on natural language understanding, a device for training a neural network, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Artificial intelligence is the subject of research that makes computers simulate certain human mental processes and intelligent behaviors (such as learning, reasoning, thinking, planning, etc.), both at the hardware level and at the software level. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and the like.
When dealing with a large volume of text in the same field, the relevant personnel must read through the entire content and then manually screen out the core text segments of concern, such as the present illness history, symptoms, and prescriptions in a medical case. However, even after these core elements are extracted manually, the textual descriptions of an element such as "symptoms" are highly varied, and it is difficult for the personnel to relate the "symptoms" across thousands of cases. If the varied detailed descriptions of "symptoms" could instead be generalized to broader subject words (i.e., hypernyms), the personnel could easily screen or search for cases with the same "symptoms" based on those hypernyms.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
The present disclosure provides a natural language understanding-based text element extraction method, a neural network training method, a natural language understanding-based text element extraction device, a neural network training device, an electronic device, a computer-readable storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided a text element extraction method based on natural language understanding, using a neural network that includes a pre-training model for natural language processing and a hypernym determination sub-network. The method comprises the following steps: determining a target hyponym in a target text; constructing a target input, the target input comprising at least the target text; processing the target input by using the pre-training model to obtain an intermediate feature, wherein the intermediate feature represents semantic information of the target text and represents at least one of the semantic information of the target hyponym and the position of the target hyponym in the target text; and processing the intermediate feature by using the hypernym determination sub-network to obtain the hypernym corresponding to the target hyponym.
According to another aspect of the present disclosure, there is provided a training method for a neural network that includes a pre-training model for natural language processing and a hypernym determination sub-network. The method comprises the following steps: acquiring a sample text, a sample hyponym in the sample text, and the real hypernym of the sample hyponym; constructing a sample input, the sample input comprising at least the sample text; processing the sample input by using the pre-training model to obtain a sample intermediate feature, wherein the sample intermediate feature represents semantic information of the sample text and represents at least one of the semantic information of the sample hyponym and the position of the sample hyponym in the sample text; processing the sample intermediate feature by using the hypernym determination sub-network to obtain a predicted hypernym corresponding to the sample hyponym; and adjusting parameters of the neural network based on the real hypernym and the predicted hypernym to obtain the trained neural network.
According to another aspect of the present disclosure, there is provided a text element extraction apparatus based on natural language understanding, using a neural network that includes a pre-training model for natural language processing and a hypernym determination sub-network, the apparatus including: a determining unit configured to determine a target hyponym in a target text; a first construction unit configured to construct a target input, the target input including at least the target text; a first processing unit configured to process the target input by using the pre-training model to obtain an intermediate feature, wherein the intermediate feature represents semantic information of the target text and represents at least one of the semantic information of the target hyponym and the position of the target hyponym in the target text; and a second processing unit configured to process the intermediate feature using the hypernym determination sub-network to obtain the hypernym corresponding to the target hyponym.
According to another aspect of the present disclosure, there is provided a training apparatus for a neural network that includes a pre-training model for natural language processing and a hypernym determination sub-network, the apparatus including: an acquisition unit configured to acquire a sample text, a sample hyponym in the sample text, and the real hypernym of the sample hyponym; a second construction unit configured to construct a sample input, the sample input comprising at least the sample text; a third processing unit configured to process the sample input by using the pre-training model to obtain a sample intermediate feature, wherein the sample intermediate feature represents semantic information of the sample text and represents at least one of the semantic information of the sample hyponym and the position of the sample hyponym in the sample text; a fourth processing unit configured to process the sample intermediate feature by using the hypernym determination sub-network to obtain a predicted hypernym corresponding to the sample hyponym; and a parameter adjusting unit configured to adjust parameters of the neural network based on the real hypernym and the predicted hypernym to obtain the trained neural network.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above method.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program, wherein the computer program implements the above method when executed by a processor.
According to one or more embodiments of the present disclosure, by constructing a model input of a context text including a hyponym, semantic information of the context text can be utilized to better judge which hypernym the hyponym belongs to, thereby improving accuracy of a result output by the model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the disclosure and, together with the description, serve to explain exemplary implementations of those embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a text element extraction method according to an example embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a neural network, according to an example embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a neural network, according to an example embodiment of the present disclosure;
FIG. 5 shows a flow chart of a method of training a neural network according to an exemplary embodiment of the present disclosure;
fig. 6 is a block diagram illustrating a structure of a text element extracting apparatus according to an exemplary embodiment of the present disclosure;
FIG. 7 shows a block diagram of a training apparatus for a neural network according to an exemplary embodiment of the present disclosure; and
FIG. 8 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
Hypernym determination methods in the related art can be classified into three types: methods based on pattern matching, methods based on statistical values, and methods based on neural network models (e.g., prompt-based generative models). However, the first type cannot capture semantic information and requires manually customized templates, which limits its applicability; the second type likewise cannot capture semantic information and requires a manually set threshold, which is difficult to choose; the third type makes it hard to design a suitable prompt and struggles to determine the corresponding hypernym for a hyponym whose meaning is ambiguous.
To address the above problems, the present disclosure constructs a model input from the context text that includes the hyponym, so that the semantic information of the context text can be used to better determine which hypernym the hyponym belongs to, thereby improving the accuracy of the model's output.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable the method of text element extraction to be performed.
In some embodiments, the server 120 may also provide other services or software applications, which may include non-virtual environments and virtual environments. In certain embodiments, these services may be provided as web-based services or cloud services, for example provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software-as-a-service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with the server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
A user may use client devices 101, 102, 103, 104, 105, and/or 106 for human-computer interaction. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include a variety of handheld gaming devices, Internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the Internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, WiFi), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
In some embodiments, the server 120 may be a server of a distributed system, or a server incorporating a blockchain. The server 120 may also be a cloud server, or a smart cloud computing server or smart cloud host with artificial intelligence technology. A cloud server is a host product in the cloud computing service system that addresses the drawbacks of difficult management and weak business scalability in conventional physical host and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The databases 130 may reside in various locations. For example, the data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The databases 130 may be of different types. In certain embodiments, the database used by the server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data in response to commands.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or regular stores supported by a file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
According to an aspect of the present disclosure, there is provided a text element extraction method based on natural language understanding. The neural network includes a pre-trained model for natural language processing and a hypernym determination sub-network. As shown in fig. 2, the text element extraction method includes: step S201, determining a target hyponym in a target text; step S202, constructing target input, wherein the target input at least comprises a target text; step S203, processing the target input by using a pre-training model to obtain an intermediate feature, wherein the intermediate feature represents semantic information of a target text and represents at least one of the semantic information of a target hyponym and the position of the target hyponym in the target text; and step S204, utilizing the hypernym determination sub-network to process the intermediate features so as to obtain the hypernym corresponding to the target hyponym.
Therefore, by constructing the model input of the context text comprising the hyponym, the semantic information of the context text can be utilized to better judge which hypernym the hyponym belongs to, and the accuracy of the result output by the model is improved.
The text in the present disclosure may come from scenarios in various fields, such as medical scenarios, government scenarios, and the like, which are not limited herein. In addition, the method of the present disclosure determines the hypernym corresponding to the target hyponym within a closed domain of hypernyms.
In some embodiments, a plurality of hyponyms may be included in the target text. In step S201, one or more hyponyms may be determined in the target text using the extraction model, and a target hyponym may be determined among the hyponyms. In some embodiments, the extracted hyponyms may be used as target hyponyms one by one and processed by the method of the present disclosure to obtain hypernyms corresponding to the hyponyms.
In constructing the target input of the neural network, it is necessary to take the context information of the target hyponym (i.e., the target text including the target hyponym) as a part of the target input, so that the neural network can output a more accurate hypernym based on the context information of the target hyponym.
As above, the neural network for text element extraction in the present disclosure includes a pre-training model for natural language processing and a hypernym determination sub-network. The pre-training model for natural language processing may be a large-scale model such as ERNIE, BERT, or RoBERTa, or may be a model trained by a person skilled in the art according to requirements, which is not limited herein. The hypernym determination sub-network may be formed of, for example, a dense layer, and has the capability of outputting the hypernym corresponding to the target hyponym based on a feature vector representing information related to the target hyponym.
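By way of illustration only, such a network could be assembled from an off-the-shelf pre-trained encoder and a single dense layer. The following is a minimal sketch, not the patented implementation; the class name HypernymNet, the encoder choice bert-base-chinese, and the hypernym count of 50 are assumptions:

from torch import nn
from transformers import AutoModel

class HypernymNet(nn.Module):
    def __init__(self, encoder_name="bert-base-chinese", num_hypernyms=50):
        super().__init__()
        # Pre-training model for natural language processing.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Hypernym determination sub-network: one dense layer mapping the
        # intermediate feature to a score per predefined hypernym.
        self.hypernym_head = nn.Linear(hidden, num_hypernyms)

    def forward(self, input_ids, attention_mask):
        hidden_states = self.encoder(
            input_ids, attention_mask=attention_mask
        ).last_hidden_state
        cls_feature = hidden_states[:, 0]  # embedding of the sentence head [CLS]
        return self.hypernym_head(cls_feature)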
According to some embodiments, step S202 of constructing the target input may include: splicing the target hyponym and the target text to obtain the target input. Splicing the target hyponym with the target text enables the pre-training model to output, based on the target input, an intermediate feature that represents the semantic information of the target text and the semantic information of the target hyponym.
According to some embodiments, the target input may include a sentence head special symbol, such as [CLS]. The pre-training model may be, for example, a model based on a self-attention mechanism (e.g., a Transformer structure), such that each token of the input (which may include participles, i.e., word-segmentation units, and special symbols) is processed and feature vectors corresponding one-to-one to the special symbols and participles are output.
In some embodiments, the target input may also include segmentation special symbols between sentences and at the end of a sentence, e.g., [SEP]. The target input may be expressed, for example, as:

T = [CLS] h_1 h_2 ... [SEP] t_1 t_2 ... t_{n-1} t_n [SEP]

where h_i denotes each participle of the target hyponym and t_i denotes each participle of the target text, with the special symbols [CLS] and [SEP] inserted at the corresponding positions.
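As a concrete illustration, a standard sentence-pair tokenizer already performs this splicing and symbol insertion. The sketch below is an assumption about preprocessing, not the patent's exact pipeline, and the medical strings are made-up examples:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")

target_hyponym = "咳嗽伴白痰"                    # hypothetical target hyponym
target_text = "患者三天前受凉后出现咳嗽伴白痰"    # hypothetical target text

# Sentence-pair encoding splices the two segments and inserts the special
# symbols at the corresponding positions: [CLS] h_1 ... [SEP] t_1 ... t_n [SEP]
encoded = tokenizer(target_hyponym, target_text, return_tensors="pt")
print(tokenizer.decode(encoded["input_ids"][0]))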
Step S203 of processing the target input by using the pre-training model to obtain the intermediate feature may include: processing the sentence head special symbol included in the target input, at least one first participle included in the target hyponym, and at least one second participle included in the target text by using the pre-training model based on a self-attention mechanism to obtain the embedding feature of the sentence head special symbol, where the intermediate feature comprises the embedding feature of the sentence head special symbol. In this way, using the sentence head special symbol makes it convenient to obtain a feature vector that serves as a comprehensive semantic representation of the spliced text and to feed it into the subsequent hypernym determination sub-network.
According to some embodiments, step S203 of processing the target input by using the pre-training model to obtain the intermediate feature may include: processing at least one first participle included in the target hyponym and at least one second participle included in the target text by using the pre-training model based on a self-attention mechanism to obtain the respective embedding features of the at least one first participle and the at least one second participle; and fusing the respective embedding features of the at least one first participle with the respective embedding features of the at least one second participle to obtain the intermediate feature. Fusing the embedding features corresponding to the participles yields feature vectors that more directly represent the semantic information of the target text and the target hyponym, and lets the model attend more closely to participle-level granularity.
In some embodiments, the respective embedding features of the at least one first participle and the at least one second participle may be added to obtain the intermediate feature. In addition, the embedding features corresponding to the special symbols may be fused on top of the embedding features corresponding to the participles. It is understood that other fusion methods may also be used, such as weighted summation, processing with a multi-layer perceptron, concatenation, computing a Hadamard product, or any combination thereof, which is not limited herein.
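A minimal sketch of these fusion options follows; pooling by summation over tokens and the weights in the weighted variant are illustrative assumptions, not prescribed by the disclosure:

import torch

def fuse_embeddings(first_embs, second_embs, mode="add"):
    # first_embs: (k, d) hyponym participle embeddings;
    # second_embs: (n, d) target text participle embeddings.
    a = first_embs.sum(dim=0)   # pool each side over its tokens (assumption)
    b = second_embs.sum(dim=0)
    if mode == "add":           # plain addition
        return a + b
    if mode == "weighted":      # weighted summation (weights assumed)
        return 0.6 * a + 0.4 * b
    if mode == "concat":        # concatenation
        return torch.cat([a, b], dim=-1)
    if mode == "hadamard":      # Hadamard (element-wise) product
        return a * b
    raise ValueError(f"unknown mode: {mode}")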
Fig. 3 shows a schematic diagram of a neural network according to an exemplary embodiment of the present disclosure. As shown in fig. 3, the target input can be constructed by splicing the target hyponym 302 and the target text 304 and inserting special symbols. The pre-training model 306 is used to process the target input, so as to obtain embedded features corresponding to each token, and then the intermediate features obtained based on the embedded features are input into the hypernym determination sub-network 308, so as to obtain the hypernym 310 corresponding to the target hyponym.
According to some embodiments, the target input may also be constructed in other ways. Step S202 of constructing the target input may include: inserting a first hyponym special symbol at a first position in the target text, the first position indicating the starting position of the target hyponym in the target text; and inserting a second hyponym special symbol at a second position in the target text, the second position indicating the end position of the target hyponym in the target text. Step S203 of processing the target input by using the pre-training model to obtain the intermediate feature then includes: processing at least one second participle included in the target text, the first hyponym special symbol, and the second hyponym special symbol by using the pre-training model based on a self-attention mechanism to obtain the embedding feature of the first hyponym special symbol and the embedding feature of the second hyponym special symbol, where the intermediate feature includes the embedding feature of the first hyponym special symbol and the embedding feature of the second hyponym special symbol.
In this way, the target hyponym is kept within the target text and its start and end positions are marked with the hyponym special symbols, so that the positional relation between the target hyponym and the target text is captured and the context information of the target hyponym can be better exploited to obtain an accurate hypernym.
In some embodiments, the first hyponym special symbol and the second hyponym special symbol may both be represented as [HYP], and the constructed target input may be:

T = [CLS] t_1 t_2 ... [HYP] t_i t_{i+1} ... [HYP] ... t_{n-1} t_n [SEP]

where t_i denotes each participle of the target text, including the participles of the target hyponym, with the special symbols [CLS], [SEP], and [HYP] inserted at the corresponding positions.
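For illustration, this marker insertion could be done at the character level as below. Registering [HYP] as an additional special token (and then resizing the encoder's embedding table) is an assumption about how such a symbol would be handled, and the example string and span offsets are made up:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
tokenizer.add_special_tokens({"additional_special_tokens": ["[HYP]"]})
# After adding the token, the encoder would need:
# model.resize_token_embeddings(len(tokenizer))

def insert_hyp_markers(text, start, end):
    # First [HYP] at the starting position, second [HYP] after the end
    # position of the hyponym span within the target text.
    return text[:start] + "[HYP]" + text[start:end] + "[HYP]" + text[end:]

marked = insert_hyp_markers("患者受凉后出现咳嗽伴白痰", start=7, end=12)
encoded = tokenizer(marked, return_tensors="pt")  # adds [CLS]/[SEP] automatically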
The hyponym special symbols represent the position information of the hyponym and also capture its semantic information to a certain extent, so the hypernym determination sub-network can directly process the intermediate feature, which is based on the embedding feature of the first hyponym special symbol and the embedding feature of the second hyponym special symbol, to obtain the hypernym.
In some embodiments, the embedding feature of the first hyponym special symbol and the embedding feature of the second hyponym special symbol may also be fused to obtain a hyponym fusion feature. In one exemplary embodiment, the two embedding features may be added point-to-point. Denoting the vector of the leading [HYP] as E_HYP1 = [e_11, e_12, e_13, ..., e_1n] and the vector of the trailing [HYP] as E_HYP2 = [e_21, e_22, e_23, ..., e_2n], the hyponym fusion vector is

E_fuse = E_HYP1 ⊕ E_HYP2

where ⊕ denotes point-wise addition. It is understood that the two embedded vectors may also be fused with other vector operations, such as the dot product.
According to some embodiments, the target input may include a sentence head special symbol. Processing the at least one second participle included in the target text, the first hyponym special symbol, and the second hyponym special symbol by using the pre-training model based on the self-attention mechanism to obtain the embedding features of the first and second hyponym special symbols may include: processing the sentence head special symbol, the at least one second participle included in the target text, the first hyponym special symbol, and the second hyponym special symbol by using the pre-training model based on the self-attention mechanism to obtain the embedding features of the sentence head special symbol, the first hyponym special symbol, and the second hyponym special symbol. Step S203 of processing the target input by using the pre-training model to obtain the intermediate feature may then further include: fusing the embedding feature of the sentence head special symbol, the embedding feature of the first hyponym special symbol, and the embedding feature of the second hyponym special symbol to obtain the intermediate feature. In this way, the sentence head special symbol provides a vector representation of the whole target input, and fusing this representation with the embedding features of the two hyponym special symbols further strengthens the resulting intermediate feature, so that the downstream hypernym determination sub-network can generate more accurate results.
According to some embodiments, fusing the embedding characteristic of the sentence start special symbol, the embedding characteristic of the first hyponym special symbol, and the embedding characteristic of the second hyponym special symbol to obtain the intermediate characteristic may include: fusing the embedding characteristics of the first hyponym special symbol and the embedding characteristics of the second hyponym special symbol to obtain hyponym fusion characteristics; and processing the embedding characteristic and the hyponym fusion characteristic of the special symbol of the sentence head based on a cross attention mechanism to obtain an intermediate characteristic.
In this way, by first computing the hyponym fusion feature and then further fusing it with the vector representation of the whole target input via a cross attention mechanism, the resulting intermediate feature better combines the vector representation of the hyponym with that of the target text containing it, making the determination of the hyponym's hypernym more accurate.
In some embodiments, processing the embedding feature of the sentence head special symbol and the hyponym fusion feature based on the cross attention mechanism can be expressed as:

O_attention = softmax(E_CLS · E_fuse^T / √d) · E_fuse

where O_attention denotes the resulting intermediate feature, E_CLS is the embedding feature of the sentence head special symbol, E_fuse is the hyponym fusion feature, and d is the dimension of the embedded features output by the pre-training model.
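A minimal sketch of such scaled dot-product cross-attention follows. Treating the two [HYP] embeddings as the key/value sequence is one plausible reading shown for illustration only; the disclosure fuses them into E_fuse first, in which case the single-key softmax weight is trivially 1. All tensors and the dimension 768 are made-up placeholders:

import math
import torch

def cross_attention(query, keys):
    # Scaled dot-product cross-attention, O = softmax(Q·K^T / sqrt(d))·K,
    # with the keys also serving as the values.
    # query: (batch, d); keys: (batch, m, d)
    d = query.size(-1)
    scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1) / math.sqrt(d)
    weights = torch.softmax(scores, dim=-1)                   # (batch, m)
    return torch.bmm(weights.unsqueeze(1), keys).squeeze(1)   # (batch, d)

batch, d = 2, 768
e_cls, e_hyp1, e_hyp2 = (torch.randn(batch, d) for _ in range(3))
o_attention = cross_attention(e_cls, torch.stack([e_hyp1, e_hyp2], dim=1))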
Fig. 4 shows a schematic diagram of a neural network according to another exemplary embodiment of the present disclosure. As shown in fig. 4, the target input may be constructed by retaining the target hyponym 402 in the target text 404 and inserting special symbols. The pre-training model 406 processes the target input to obtain the embedding features corresponding to each token; the embedding feature of the sentence head special symbol and the embedding features of the two hyponym special symbols are selected and fused using the attention layer 408, and the intermediate feature obtained after fusion is input into the hypernym determination sub-network 410 to obtain the hypernym 412 corresponding to the target hyponym.
According to some embodiments, step S204 of processing the intermediate feature by using the hypernym determination sub-network to obtain the hypernym corresponding to the target hyponym may include: processing the intermediate feature by using the hypernym determination sub-network to obtain the confidence that the target hyponym belongs to each of a plurality of preset hypernyms; and determining, among the plurality of preset hypernyms, the hypernym corresponding to the target hyponym based on those confidences. In this way, an accurate hypernym corresponding to the target hyponym can be obtained.
In some embodiments, the hypernym determination sub-network may be represented as:

P = W · O_attention + B

where W is the weight parameter of the hypernym determination sub-network, B is the bias, and P = [p_1, p_2, p_3, ..., p_m] gives the probabilities of the m hypernym classes.
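As a sketch, the sub-network reduces to a single linear layer over the intermediate feature. Normalizing the scores to confidences with a softmax is an interpretation of the formula above, and the hypernym list and the dimension 768 are made-up assumptions:

import torch

preset_hypernyms = ["症状", "处方", "现病史"]  # hypothetical closed hypernym set

head = torch.nn.Linear(768, len(preset_hypernyms))  # holds W and B
o_attention = torch.randn(1, 768)                   # intermediate feature
scores = head(o_attention)                          # P = W·O_attention + B
confidences = torch.softmax(scores, dim=-1)         # p_1 ... p_m
predicted = preset_hypernyms[int(confidences.argmax(dim=-1))]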
According to another aspect of the present disclosure, there is provided a training method for a neural network that includes a pre-training model for natural language processing and a hypernym determination sub-network. As shown in fig. 5, the method includes: step S501, acquiring a sample text, a sample hyponym in the sample text, and the real hypernym of the sample hyponym; step S502, constructing a sample input, the sample input comprising at least the sample text; step S503, processing the sample input by using the pre-training model to obtain a sample intermediate feature, wherein the sample intermediate feature represents semantic information of the sample text and represents at least one of the semantic information of the sample hyponym and the position of the sample hyponym in the sample text; step S504, processing the sample intermediate feature by using the hypernym determination sub-network to obtain a predicted hypernym corresponding to the sample hyponym; and step S505, adjusting parameters of the neural network based on the real hypernym and the predicted hypernym to obtain the trained neural network. It is understood that the operations of steps S501 to S504 in fig. 5 are similar to the operations of steps S201 to S204 in fig. 2, and are not repeated herein. Training the neural network through these steps gives the trained neural network the capability of outputting accurate hypernyms.
In some embodiments, a person skilled in the art may adjust parameters of the neural network in various ways, for example, a loss function may be predetermined, and a loss value representing a difference between a real hypernym and a predicted hypernym may be calculated by using the loss function, so as to adjust the parameters of the neural network based on the loss value. In addition, in step S505, parameters of the pre-training model and/or hypernym determination sub-network may be adjusted.
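A minimal training-step sketch follows, reusing the illustrative HypernymNet from the earlier sketch; cross-entropy is assumed here as the predetermined loss function, a choice the disclosure leaves open:

import torch
import torch.nn.functional as F

model = HypernymNet()  # illustrative network defined above
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(input_ids, attention_mask, true_hypernym_ids):
    logits = model(input_ids, attention_mask)           # predicted hypernyms
    loss = F.cross_entropy(logits, true_hypernym_ids)   # vs. real hypernyms
    optimizer.zero_grad()
    loss.backward()   # adjusts parameters of the encoder and/or the head
    optimizer.step()
    return loss.item()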
According to another aspect of the present disclosure, there is provided an apparatus for natural language understanding-based text element extraction, using a neural network that includes a pre-training model for natural language processing and a hypernym determination sub-network. As shown in fig. 6, the apparatus 600 includes: a determining unit 610 configured to determine a target hyponym in a target text; a first construction unit 620 configured to construct a target input, the target input including at least the target text; a first processing unit 630 configured to process the target input using the pre-training model to obtain an intermediate feature, wherein the intermediate feature represents semantic information of the target text and represents at least one of the semantic information of the target hyponym and the position of the target hyponym in the target text; and a second processing unit 640 configured to process the intermediate feature using the hypernym determination sub-network to obtain the hypernym corresponding to the target hyponym. It is understood that the operations of the units 610 to 640 in the apparatus 600 are similar to the operations of steps S201 to S204 in fig. 2, and are not described in detail herein.
According to another aspect of the present disclosure, there is provided a training apparatus of a neural network including a pre-training model for natural language processing and a hypernym determination sub-network. As shown in fig. 7, the apparatus 700 includes: an obtaining unit 710 configured to obtain a sample text, a sample hyponym in the sample text, and a real hypernym of the sample hyponym; a second construction unit 720 configured to construct a sample input, the sample input comprising at least sample text; a third processing unit 730 configured to process the sample input using the pre-training model to obtain a sample intermediate feature, wherein the sample intermediate feature represents semantic information of the sample text, and represents at least one of semantic information of a sample hyponym and a position of the sample hyponym in the sample text; a fourth processing unit 740 configured to process the sample intermediate features by using the hypernym determination sub-network to obtain predicted hypernyms corresponding to the sample hyponyms; and a parameter adjusting unit 750 configured to adjust parameters of the neural network based on the real hypernym and the predicted hypernym to obtain a trained neural network.
It is understood that the operations of the units 710 to 750 in the apparatus 700 are similar to the operations of steps S501 to S505 in fig. 5, and are not described in detail herein.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
According to an embodiment of the present disclosure, there is also provided an electronic device, a readable storage medium, and a computer program product.
Referring to fig. 8, a block diagram of an electronic device 800, which may be a server or a client of the present disclosure and is an example of a hardware device applicable to aspects of the present disclosure, will now be described. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, an output unit 807, a storage unit 808, and a communication unit 809. The input unit 806 may be any type of device capable of inputting information to the device 800; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote control. The output unit 807 can be any type of device capable of presenting information and can include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 808 may include, but is not limited to, a magnetic disk or an optical disk. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning network algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 801 executes the respective methods and processes described above, such as the text element extraction method. For example, in some embodiments, the text element extraction method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, it may perform one or more steps of the text element extraction method described above. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the text element extraction method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system and addresses the drawbacks of difficult management and weak business scalability in traditional physical host and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical aspects of the present disclosure can be achieved.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems and apparatus are merely exemplary embodiments or examples and that the scope of the present invention is not limited by these embodiments or examples, but only by the claims as issued and their equivalents. Various elements in the embodiments or examples may be omitted or may be replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, the various elements in the embodiments or examples may be combined in various ways. It is important that as technology evolves, many of the elements described herein may be replaced with equivalent elements that appear after the present disclosure.

Claims (14)

1. A method for text element extraction based on natural language understanding, a neural network including a pre-trained model for natural language processing and a hypernym determination sub-network, the method comprising:
determining a target hyponym in a target text;
constructing a target input, the target input comprising at least the target text;
processing the target input by using the pre-training model to obtain an intermediate feature, wherein the intermediate feature represents semantic information of the target text and at least one of the semantic information of the target hyponym and a position of the target hyponym in the target text; and
utilizing the hypernym determination sub-network to process the intermediate features so as to obtain the hypernym corresponding to the target hyponym.
2. The method of claim 1, wherein constructing a target input comprises:
and splicing the target hyponym and the target text to obtain the target input.
3. The method of claim 2, wherein the target input includes a sentence start special symbol, and processing the target input using the pre-trained model to derive the intermediate features comprises:
processing the sentence head special symbol included in the target input, at least one first participle included in the target hyponym and at least one second participle included in the target text by utilizing the pre-training model based on a self-attention mechanism to obtain the embedded feature of the sentence head special symbol, wherein the intermediate feature comprises the embedded feature of the sentence head special symbol.
4. The method of claim 2, wherein processing the target input using the pre-trained model to obtain the intermediate feature comprises:
processing, using the pre-trained model and based on a self-attention mechanism, at least one first participle included in the target hyponym and at least one second participle included in the target text, to obtain an embedded feature of each of the at least one first participle and an embedded feature of each of the at least one second participle; and
fusing the embedded features of the at least one first participle with the embedded features of the at least one second participle to obtain the intermediate feature.
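One plausible reading of the fusion in claim 4, with mean-pooling and concatenation as assumed fusion operations:

import torch

def fuse(first_embs: torch.Tensor, second_embs: torch.Tensor) -> torch.Tensor:
    # first_embs: (n_hyponym_participles, hidden); second_embs: (n_text_participles, hidden)
    return torch.cat([first_embs.mean(dim=0), second_embs.mean(dim=0)])

intermediate = fuse(torch.randn(3, 768), torch.randn(12, 768))  # shape (1536,)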
5. The method of claim 1, wherein constructing the target input comprises:
inserting a first hyponym special symbol at a first position in the target text, the first position indicating a start position of the target hyponym in the target text; and
inserting a second hyponym special symbol at a second position in the target text, the second position indicating an end position of the target hyponym in the target text,
wherein processing the target input using the pre-trained model to obtain the intermediate feature comprises:
processing, using the pre-trained model and based on a self-attention mechanism, at least one second participle included in the target text, the first hyponym special symbol, and the second hyponym special symbol, to obtain an embedded feature of the first hyponym special symbol and an embedded feature of the second hyponym special symbol, wherein the intermediate feature comprises the embedded feature of the first hyponym special symbol and the embedded feature of the second hyponym special symbol.
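The marker insertion of claim 5 can be sketched as plain string manipulation; the marker strings "[H1]" and "[H2]" are illustrative placeholders, not the patent's actual special symbols:

def insert_hyponym_markers(text: str, start: int, end: int) -> str:
    # start/end delimit the hyponym inside the text (end exclusive)
    return text[:start] + "[H1] " + text[start:end] + " [H2]" + text[end:]

print(insert_hyponym_markers("An apple is a kind of fruit.", 3, 8))
# An [H1] apple [H2] is a kind of fruit.

After the marked text is encoded, the embeddings at the two marker positions provide the intermediate feature.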
6. The method of claim 5, wherein the target input comprises a sentence-start special symbol, and processing, using the pre-trained model and based on a self-attention mechanism, the at least one second participle included in the target text, the first hyponym special symbol, and the second hyponym special symbol to obtain the embedded feature of the first hyponym special symbol and the embedded feature of the second hyponym special symbol comprises:
processing, using the pre-trained model and based on a self-attention mechanism, the sentence-start special symbol, the at least one second participle included in the target text, the first hyponym special symbol, and the second hyponym special symbol, to obtain an embedded feature of the sentence-start special symbol, the embedded feature of the first hyponym special symbol, and the embedded feature of the second hyponym special symbol,
wherein processing the target input using the pre-trained model to obtain the intermediate feature further comprises:
fusing the embedded feature of the sentence-start special symbol, the embedded feature of the first hyponym special symbol, and the embedded feature of the second hyponym special symbol to obtain the intermediate feature.
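A simple reading of the three-way fusion in claim 6, with concatenation as an assumed fusion operation (claim 7 below recites a cross-attention alternative):

import torch

cls_emb = torch.randn(768)  # sentence-start special symbol embedding
h1_emb = torch.randn(768)   # first hyponym special symbol embedding
h2_emb = torch.randn(768)   # second hyponym special symbol embedding

intermediate = torch.cat([cls_emb, h1_emb, h2_emb])  # shape (2304,)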
7. The method of claim 6, wherein fusing the embedded feature of the sentence-start special symbol, the embedded feature of the first hyponym special symbol, and the embedded feature of the second hyponym special symbol to obtain the intermediate feature comprises:
fusing the embedded feature of the first hyponym special symbol and the embedded feature of the second hyponym special symbol to obtain a hyponym fusion feature; and
processing the embedded feature of the sentence-start special symbol and the hyponym fusion feature based on a cross-attention mechanism to obtain the intermediate feature.
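A sketch of the cross-attention fusion in claim 7, using PyTorch's MultiheadAttention as an assumed realization; averaging the two marker embeddings into the hyponym fusion feature is likewise an assumption:

import torch
import torch.nn as nn

hidden = 768
cross_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)

cls_emb = torch.randn(1, 1, hidden)  # sentence-start special symbol embedding
h1_emb = torch.randn(1, 1, hidden)   # first hyponym special symbol embedding
h2_emb = torch.randn(1, 1, hidden)   # second hyponym special symbol embedding

hyponym_fused = (h1_emb + h2_emb) / 2  # hyponym fusion feature

# Query with the sentence-start embedding; key/value are the fused hyponym markers.
intermediate, _ = cross_attn(query=cls_emb, key=hyponym_fused, value=hyponym_fused)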
8. The method of any one of claims 1-7, wherein processing the intermediate feature using the hypernym determination sub-network to obtain the hypernym corresponding to the target hyponym comprises:
processing the intermediate feature using the hypernym determination sub-network to obtain, for each preset hypernym of a plurality of preset hypernyms, a confidence that the target hyponym belongs to that preset hypernym; and
determining, among the plurality of preset hypernyms and based on the confidences, the hypernym corresponding to the target hyponym.
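Claim 8 in miniature, with softmax as an assumed notion of confidence and argmax as the assumed determination rule; the label set is invented for illustration:

import torch

PRESET_HYPERNYMS = ["fruit", "animal", "vehicle", "tool"]  # assumed label set

logits = torch.tensor([2.7, 0.1, -1.3, 0.4])        # output of the sub-network
confidences = torch.softmax(logits, dim=-1)         # one confidence per preset hypernym
best = PRESET_HYPERNYMS[int(confidences.argmax())]
print(best, round(float(confidences.max()), 2))     # fruit 0.84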
9. A method of training a neural network, the neural network comprising a pre-trained model for natural language processing and a hypernym determination sub-network, the method comprising:
acquiring a sample text, a sample hyponym in the sample text, and a real hypernym of the sample hyponym;
constructing a sample input, the sample input comprising at least the sample text;
processing the sample input using the pre-trained model to obtain a sample intermediate feature, wherein the sample intermediate feature represents semantic information of the sample text and represents at least one of semantic information of the sample hyponym and a position of the sample hyponym in the sample text;
processing the sample intermediate feature using the hypernym determination sub-network to obtain a predicted hypernym corresponding to the sample hyponym; and
adjusting parameters of the neural network based on the real hypernym and the predicted hypernym to obtain a trained neural network.
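A minimal training step consistent with claim 9; the AdamW optimizer, cross-entropy loss, and stand-in encoder are assumptions for illustration:

import torch
import torch.nn as nn

n_hypernyms, hidden = 4, 128

encoder = nn.Embedding(1000, hidden)    # stand-in for the pre-trained model
head = nn.Linear(hidden, n_hypernyms)   # hypernym determination sub-network
opt = torch.optim.AdamW(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

sample_input = torch.randint(0, 1000, (8, 16))       # token ids (batch, seq)
real_hypernym = torch.randint(0, n_hypernyms, (8,))  # gold labels

sample_intermediate = encoder(sample_input).mean(dim=1)  # pooled intermediate feature
predicted = head(sample_intermediate)                    # predicted hypernym logits
loss = loss_fn(predicted, real_hypernym)

opt.zero_grad()
loss.backward()
opt.step()  # adjust the parameters of the neural network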
10. A text element extraction apparatus based on natural language understanding, wherein a neural network comprises a pre-trained model for natural language processing and a hypernym determination sub-network, the apparatus comprising:
a determination unit configured to determine a target hyponym in a target text;
a first construction unit configured to construct a target input, the target input comprising at least the target text;
a first processing unit configured to process the target input using the pre-trained model to obtain an intermediate feature, wherein the intermediate feature represents semantic information of the target text and represents at least one of semantic information of the target hyponym and a position of the target hyponym in the target text; and
a second processing unit configured to process the intermediate feature using the hypernym determination sub-network to obtain a hypernym corresponding to the target hyponym.
11. A training apparatus for a neural network, the neural network comprising a pre-trained model for natural language processing and a hypernym determination sub-network, the apparatus comprising:
an acquisition unit configured to acquire a sample text, a sample hyponym in the sample text, and a real hypernym of the sample hyponym;
a second construction unit configured to construct a sample input, the sample input comprising at least the sample text;
a third processing unit configured to process the sample input using the pre-trained model to obtain a sample intermediate feature, wherein the sample intermediate feature represents semantic information of the sample text and represents at least one of semantic information of the sample hyponym and a position of the sample hyponym in the sample text;
a fourth processing unit configured to process the sample intermediate feature using the hypernym determination sub-network to obtain a predicted hypernym corresponding to the sample hyponym; and
a parameter adjustment unit configured to adjust parameters of the neural network based on the real hypernym and the predicted hypernym to obtain a trained neural network.
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
13. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9.
14. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-9.
CN202211734143.6A 2022-12-30 2022-12-30 Text element extraction method, device and equipment based on natural language understanding Active CN115879468B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211734143.6A CN115879468B (en) 2022-12-30 2022-12-30 Text element extraction method, device and equipment based on natural language understanding

Publications (2)

Publication Number Publication Date
CN115879468A true CN115879468A (en) 2023-03-31
CN115879468B CN115879468B (en) 2023-11-14

Family

ID=85757743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211734143.6A Active CN115879468B (en) 2022-12-30 2022-12-30 Text element extraction method, device and equipment based on natural language understanding

Country Status (1)

Country Link
CN (1) CN115879468B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162770A (en) * 2018-10-22 2019-08-23 腾讯科技(深圳)有限公司 A kind of word extended method, device, equipment and medium
CN111611796A (en) * 2020-05-20 2020-09-01 腾讯科技(武汉)有限公司 Hypernym determination method and device for hyponym, electronic device and storage medium
US20210081500A1 (en) * 2019-09-18 2021-03-18 International Business Machines Corporation Hypernym detection using strict partial order networks

Also Published As

Publication number Publication date
CN115879468B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN113807440B (en) Method, apparatus, and medium for processing multimodal data using neural networks
CN113836333A (en) Training method of image-text matching model, method and device for realizing image-text retrieval
CN114611532B (en) Language model training method and device, and target translation error detection method and device
CN116303962B (en) Dialogue generation method, training method, device and equipment for deep learning model
CN113656587B (en) Text classification method, device, electronic equipment and storage medium
CN115879469B (en) Text data processing method, model training method, device and medium
CN116028605B (en) Logic expression generation method, model training method, device and medium
CN116541536B (en) Knowledge-enhanced content generation system, data generation method, device, and medium
CN114547252A (en) Text recognition method and device, electronic equipment and medium
CN112559715B (en) Attitude identification method, device, equipment and storage medium
CN114550313A (en) Image processing method, neural network, and training method, device, and medium thereof
CN116597454B (en) Image processing method, training method and device of image processing model
CN115862031B (en) Text processing method, neural network training method, device and equipment
CN115600646B (en) Language model training method, device, medium and equipment
CN115964462A (en) Dialogue content processing method, and training method and device of dialogue understanding model
CN113590782B (en) Training method of reasoning model, reasoning method and device
CN115578501A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113051926B (en) Text extraction method, apparatus and storage medium
CN115879468B (en) Text element extraction method, device and equipment based on natural language understanding
CN114429678A (en) Model training method and device, electronic device and medium
CN114117046B (en) Data processing method, device, electronic equipment and medium
CN115713071B (en) Training method for neural network for processing text and method for processing text
CN113836939B (en) Text-based data analysis method and device
CN114861660A (en) Training method for neural network for processing text and method for processing text
CN117709471A (en) Method, apparatus, device and medium for interpretation analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant