CN115879468B - Text element extraction method, device and equipment based on natural language understanding - Google Patents


Info

Publication number
CN115879468B
Authority
CN
China
Prior art keywords
hyponym
special symbol
target
sample
text
Prior art date
Legal status
Active
Application number
CN202211734143.6A
Other languages
Chinese (zh)
Other versions
CN115879468A (en)
Inventor
陈佳颖
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211734143.6A
Publication of CN115879468A
Application granted
Publication of CN115879468B

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The disclosure provides a text element extraction method based on natural language understanding, a training method of a neural network, and a corresponding device and equipment. It relates to the field of artificial intelligence, in particular to natural language processing and deep learning technology, and can be applied to smart city and smart government affairs scenarios. The text element extraction method comprises the following steps: determining a target hyponym in a target text; constructing a target input, wherein the target input at least comprises the target text; processing the target input by utilizing a pre-training model to obtain intermediate features, wherein the intermediate features represent semantic information of the target text and at least one of semantic information of the target hyponym and the position of the target hyponym in the target text; and processing the intermediate features by utilizing a hypernym determination sub-network to obtain the hypernym corresponding to the target hyponym.

Description

Text element extraction method, device and equipment based on natural language understanding
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to natural language processing and deep learning technology, which can be applied to smart cities and smart government scenes, and particularly relates to a text element extraction method based on natural language understanding, a training method of a neural network, a text element extraction device based on natural language understanding, a training device of the neural network, electronic equipment, a computer readable storage medium and a computer program product.
Background
Artificial intelligence is the discipline of studying how to make a computer mimic certain mental processes and intelligent behaviors of a person (e.g., learning, reasoning, thinking, planning, etc.), and it encompasses both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning technology, big data processing technology, knowledge graph technology, and the like.
When processing a large amount of text content in the same field, related personnel need to read through the text and manually screen out the core text segments of interest, such as present illness history, symptoms, and prescriptions in medical cases. However, even if these core elements are extracted manually, elements such as "symptoms" are described in diverse forms, and it is difficult for related personnel to find the connection between the "symptoms" of individual cases among thousands of cases. If the diverse detailed descriptions of "symptoms" could be generalized to broader subject terms (i.e., hypernyms), related personnel could more easily screen or search for cases with the same "symptoms" via those hypernyms.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been recognized in any prior art unless otherwise indicated.
Disclosure of Invention
The present disclosure provides a text element extraction method based on natural language understanding, a training method of a neural network, a text element extraction device based on natural language understanding, a training device of a neural network, an electronic device, a computer-readable storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided a text element extraction method based on natural language understanding, wherein a neural network includes a pre-training model for natural language processing and a hypernym determination sub-network. The method comprises the following steps: determining a target hyponym in a target text; constructing a target input, wherein the target input at least comprises the target text; processing the target input by utilizing the pre-training model to obtain intermediate features, wherein the intermediate features represent semantic information of the target text and at least one of semantic information of the target hyponym and the position of the target hyponym in the target text; and processing the intermediate features by utilizing the hypernym determination sub-network to obtain the hypernym corresponding to the target hyponym.
According to another aspect of the present disclosure, there is provided a training method of a neural network, the neural network including a pre-training model for natural language processing and a hypernym determination sub-network. The method comprises the following steps: acquiring a sample text, a sample hyponym in the sample text, and a real hypernym of the sample hyponym; constructing a sample input, the sample input including at least the sample text; processing the sample input by utilizing the pre-training model to obtain sample intermediate features, wherein the sample intermediate features characterize semantic information of the sample text and at least one of semantic information of the sample hyponym and the position of the sample hyponym in the sample text; processing the sample intermediate features by utilizing the hypernym determination sub-network to obtain a predicted hypernym corresponding to the sample hyponym; and adjusting parameters of the neural network based on the real hypernym and the predicted hypernym to obtain a trained neural network.
According to another aspect of the present disclosure, there is provided a text element extraction apparatus based on natural language understanding, wherein a neural network includes a pre-training model for natural language processing and a hypernym determination sub-network. The apparatus includes: a determining unit configured to determine a target hyponym in a target text; a first construction unit configured to construct a target input including at least the target text; a first processing unit configured to process the target input with the pre-training model to obtain intermediate features, wherein the intermediate features characterize semantic information of the target text and at least one of semantic information of the target hyponym and a position of the target hyponym in the target text; and a second processing unit configured to process the intermediate features by utilizing the hypernym determination sub-network to obtain the hypernym corresponding to the target hyponym.
According to another aspect of the present disclosure, there is provided a training apparatus of a neural network, the neural network including a pre-training model for natural language processing and a hypernym determination sub-network. The apparatus includes: an acquisition unit configured to acquire a sample text, a sample hyponym in the sample text, and a real hypernym of the sample hyponym; a second construction unit configured to construct a sample input, the sample input including at least the sample text; a third processing unit configured to process the sample input with the pre-training model to obtain sample intermediate features, wherein the sample intermediate features characterize semantic information of the sample text and at least one of semantic information of the sample hyponym and a position of the sample hyponym in the sample text; a fourth processing unit configured to process the sample intermediate features by utilizing the hypernym determination sub-network to obtain a predicted hypernym corresponding to the sample hyponym; and a parameter adjusting unit configured to adjust parameters of the neural network based on the real hypernym and the predicted hypernym to obtain a trained neural network.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above-described method.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when executed by a processor, implements the above-described method.
According to one or more embodiments of the present disclosure, by constructing a model input that includes the context text containing the hyponym, the semantic information of the context text can be used to better determine which hypernym the hyponym belongs to, thereby improving the accuracy of the result output by the model.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, in accordance with an embodiment of the present disclosure;
FIG. 2 illustrates a flow chart of a text element extraction method according to an exemplary embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of a neural network, according to an exemplary embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a neural network, according to an exemplary embodiment of the present disclosure;
FIG. 5 illustrates a flowchart of a method of training a neural network, according to an exemplary embodiment of the present disclosure;
fig. 6 illustrates a block diagram of a text element extraction apparatus according to an exemplary embodiment of the present disclosure;
FIG. 7 illustrates a block diagram of a training apparatus of a neural network, according to an exemplary embodiment of the present disclosure; and
fig. 8 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.
The terminology used in the description of the various illustrated examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.
The hypernym determination methods in the related art can be classified into three types: pattern matching-based methods, statistics-based methods, and neural network model-based methods (e.g., generative models based on prompts). However, the first type of method cannot capture semantic information and requires manually customized templates, so it is highly limited; the second type of method also cannot capture semantic information and requires a manually set threshold, yet choosing a suitable threshold is difficult; for the third type of method, the prompt is difficult to specify, and it is difficult to judge the corresponding hypernym for a hyponym with a vaguer meaning.
In order to solve the above problems, the present disclosure constructs a model input that includes the context text containing the hyponym, so that semantic information of the context text can be used to better judge which hypernym the hyponym belongs to, thereby improving the accuracy of the result output by the model.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, in accordance with an embodiment of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In embodiments of the present disclosure, server 120 may run one or more services or software applications that enable methods of text element extraction to be performed.
In some embodiments, server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, such as being provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof that are executable by one or more processors. A user operating client devices 101, 102, 103, 104, 105, and/or 106 may in turn utilize one or more client applications to interact with server 120 to utilize the services provided by these components. It should be appreciated that a variety of different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user may use client devices 101, 102, 103, 104, 105, and/or 106 for human-machine interaction. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that the present disclosure may support any number of client devices.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and the like. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablet computers, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head mounted displays (such as smart glasses) and other devices. The gaming system may include various handheld gaming devices, Internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. For example only, the one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-range servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture that involves virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices of the server). In various embodiments, server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems. Server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, etc.
In some implementations, server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
In some implementations, the server 120 may be a server of a distributed system or a server that incorporates a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. The cloud server is a host product in a cloud computing service system, which overcomes the defects of high management difficulty and weak service scalability of traditional physical host and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of databases 130 may be used to store information such as audio files and video files. Database 130 may reside in various locations. For example, the data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. Database 130 may be of different types. In some embodiments, the database used by server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data in response to commands.
In some embodiments, one or more of databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key value stores, object stores, or conventional stores supported by the file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
According to an aspect of the present disclosure, a text element extraction method based on natural language understanding is provided. The neural network includes a pre-training model for natural language processing and a hypernym determination sub-network. As shown in fig. 2, the text element extraction method includes: step S201, determining a target hyponym in a target text; step S202, constructing a target input, wherein the target input at least comprises the target text; step S203, processing the target input by utilizing the pre-training model to obtain intermediate features, wherein the intermediate features represent semantic information of the target text and at least one of semantic information of the target hyponym and the position of the target hyponym in the target text; and step S204, processing the intermediate features by utilizing the hypernym determination sub-network to obtain the hypernym corresponding to the target hyponym.
Therefore, by constructing a model input that includes the context text containing the hyponym, semantic information of the context text can be utilized to better judge which hypernym the hyponym belongs to, thereby improving the accuracy of the result output by the model.
The text in the present disclosure may be text from scenarios in various fields, such as medical scenarios and government scenarios, which are not limited herein. In addition, the method of the present disclosure determines the hypernym corresponding to the target hyponym within a closed domain of hypernyms (i.e., a preset set of hypernym categories).
In some embodiments, multiple hyponyms may be included in the target text. In step S201, one or more hyponyms may be determined in the target text using the extraction model, and the target hyponym may be determined among the hyponyms. In some embodiments, the extracted hyponyms may be used as target hyponyms one by one and processed using the method of the present disclosure to obtain hypernyms corresponding to the hyponyms.
In constructing the target input of the neural network, context information of the target hyponym (i.e., target text including the target hyponym) needs to be part of the target input to enable the neural network to output a more accurate hypernym based on the context information of the target hyponym.
As above, the neural network for text element extraction in the present disclosure includes a pre-training model for natural language processing and a hypernym determination sub-network. The pre-training model for natural language processing may include, for example, a large-scale model such as ERNIE, BERT, or RoBERTa, or may be obtained by those skilled in the art through self-training according to requirements, which is not limited herein. The hypernym determination sub-network may be constituted by, for example, a dense layer, and has the capability of outputting the hypernym corresponding to the target hyponym based on a feature vector representing information related to the target hyponym.
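As an illustration of this architecture, the following is a minimal sketch assuming a BERT-style encoder from the HuggingFace transformers library and a single dense layer as the hypernym determination sub-network; the model name, hidden size, number of hypernym categories, and the use of the [CLS] feature (the Fig. 3 variant) are illustrative assumptions rather than values specified by the disclosure.

```python
import torch
from torch import nn
from transformers import AutoModel

class TextElementExtractor(nn.Module):
    """Sketch: pre-training model (encoder) followed by a hypernym determination sub-network."""
    def __init__(self, pretrained_name: str = "bert-base-chinese", num_hypernyms: int = 50):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(pretrained_name)   # assumed BERT/ERNIE-style encoder
        hidden = self.encoder.config.hidden_size
        self.hypernym_head = nn.Linear(hidden, num_hypernyms)       # hypernym determination sub-network

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        out = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask,
                           token_type_ids=token_type_ids)
        cls_feature = out.last_hidden_state[:, 0]    # intermediate feature taken from the [CLS] position
        return self.hypernym_head(cls_feature)       # scores over the preset hypernym categories
```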
According to some embodiments, step S202 of constructing the target input may include: splicing (concatenating) the target hyponym and the target text to obtain the target input. In this way, because the target hyponym and the target text are concatenated, the pre-training model can output intermediate features representing the semantic information of the target text and the semantic information of the target hyponym based on the target input.
According to some embodiments, the target input may include a sentence head special symbol, such as [CLS]. The pre-training model may be, for example, a model based on a self-attention mechanism (e.g., a Transformer structure), such that each token of the input (which may be a word segment or a special symbol) is processed and feature vectors corresponding one-to-one to the symbols and word segments are output.
In some embodiments, the target input may also include separating special symbols, such as [SEP], between sentences and at the end of a sentence. The target input may be expressed, for example, as:

T = [CLS][h_1][h_2][...][SEP][t_1][t_2][...][t_{n-1}][t_n][SEP]

where h_i denotes each word segment of the target hyponym, t_i denotes each word segment of the target text, and the special symbols [CLS] and [SEP] are inserted at the corresponding positions.
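The following is a hedged sketch of how such a concatenated input could be built with a BERT-style tokenizer; the tokenizer name and the example medical sentence are assumptions, and encoding the hyponym and the text as a sentence pair naturally yields the [CLS] ... [SEP] ... [SEP] layout of the formula above.

```python
from transformers import AutoTokenizer

# Assumed BERT-style tokenizer; any tokenizer matching the chosen pre-training model would do.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")

target_hyponym = "持续性干咳"                         # a detailed description (the hyponym)
target_text = "患者入院前三天出现持续性干咳，伴低热。"    # context text containing the hyponym

# Sentence-pair encoding produces [CLS] h_1 ... h_k [SEP] t_1 ... t_n [SEP].
encoded = tokenizer(target_hyponym, target_text)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```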
Step S203, processing the target input by using the pre-training model to obtain the intermediate feature may include: and processing the special symbol of the sentence head, at least one first word included in the target hyponym and at least one second word included in the target text based on a self-attention mechanism by utilizing a pre-training model to obtain the embedded feature of the special symbol of the sentence head, wherein the intermediate feature comprises the embedded feature of the special symbol of the sentence head. Therefore, by using the special symbol of the sentence head, the feature vector which is the comprehensive semantic representation of the spliced text can be conveniently obtained to be input into a subsequent superword determining sub-network.
According to some embodiments, step S203 of processing the target input using the pre-training model to obtain the intermediate feature may include: processing at least one first word segment included in the target hyponym and at least one second word segment included in the target text based on a self-attention mechanism by utilizing a pre-training model to obtain respective embedded features of the at least one first word segment and respective embedded features of the at least one second word segment; and fusing the embedded features of the at least one first word and the embedded features of the at least one second word to obtain intermediate features. Therefore, the feature vector which more directly characterizes the semantic information of the target text and the target hyponym can be obtained by fusing the embedded features corresponding to each word, and the model can pay more attention to the granularity of the word.
In some embodiments, the embedded feature of each of the at least one first word and the embedded feature of each of the at least one second word may be added to obtain the intermediate feature. In addition, on the basis of the embedded features corresponding to the segmentation, the embedded features corresponding to the special symbols can be fused. It will be appreciated that other fusion means may be used to fuse the embedded features, such as weighted summation, processing with a multi-layer perceptron, stitching, computing Hadamard products, or any combination of these, without limitation.
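A small sketch of such token-level fusion is given below; the function and its mode names are assumptions for illustration, covering the addition, averaging, Hadamard-product, and concatenation variants mentioned above.

```python
import torch

def fuse_token_embeddings(token_embeddings: torch.Tensor, mode: str = "sum") -> torch.Tensor:
    """Fuse per-token embedded features of shape (seq_len, hidden) into one intermediate feature."""
    if mode == "sum":        # element-wise addition over all tokens
        return token_embeddings.sum(dim=0)
    if mode == "mean":       # a uniformly weighted (averaged) variant
        return token_embeddings.mean(dim=0)
    if mode == "hadamard":   # element-wise product over all tokens
        return token_embeddings.prod(dim=0)
    if mode == "concat":     # concatenate the first and last token embeddings (an assumed variant)
        return torch.cat([token_embeddings[0], token_embeddings[-1]], dim=-1)
    raise ValueError(f"unknown fusion mode: {mode}")
```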
Fig. 3 shows a schematic diagram of a neural network according to an exemplary embodiment of the present disclosure. As shown in fig. 3, a target input may be constructed by concatenating the target hyponym 302 and the target text 304 and inserting special symbols. Processing the target input by using the pre-training model 306 can obtain the embedded features corresponding to each token, and then inputting the intermediate features obtained based on the embedded features into the superword determining sub-network 308, so as to obtain the superword 310 corresponding to the target hyponym.
According to some embodiments, the target input may also be structured in other ways. Step S202, constructing the target input may include: inserting a first hyponym special symbol at a first position in the target text, wherein the first position indicates the starting position of the target hyponym in the target text; and inserting a second hyponym special symbol at a second location in the target text, the second location indicating a termination location of the target hyponym in the target text. Step S203, processing the target input by using the pre-training model to obtain intermediate features includes: and processing at least one second word segmentation, the first hyponym special symbol and the second hyponym special symbol included in the target text based on a self-attention mechanism by utilizing a pre-training model to obtain the embedded features of the first hyponym special symbol and the embedded features of the second hyponym special symbol, wherein the intermediate features comprise the embedded features of the first hyponym special symbol and the embedded features of the second hyponym special symbol.
Therefore, the target hyponym is reserved in the target text, and the starting position and the ending position of the target hyponym are marked by using the hyponym special symbol, so that the position relation between the target hyponym and the target text can be acquired, and the accurate hypernym can be obtained by better utilizing the context information of the target hyponym.
In some embodiments, the first hyponym special symbol and the second hyponym special symbol may both be represented as [HYP], and the constructed target input may be:

T = [CLS][t_1][t_2][...][HYP][t_i][t_{i+1}][...][HYP][...][t_{n-1}][t_n][SEP]

where t_i denotes each word segment of the target text (including the word segments of the target hyponym), and the special symbols [CLS], [SEP], and [HYP] are inserted at the corresponding positions.
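A minimal sketch of constructing such a marked input is shown below; the helper function, character offsets, and example sentence are assumptions for illustration, and [HYP] would additionally need to be registered as a special token of the tokenizer in use.

```python
def build_marked_input(target_text: str, start: int, end: int) -> str:
    """Insert [HYP] before the start position and after the end position (exclusive) of the hyponym."""
    return target_text[:start] + "[HYP]" + target_text[start:end] + "[HYP]" + target_text[end:]

# Example: the hyponym "持续性干咳" occupies characters 9..14 of the context sentence.
marked = build_marked_input("患者入院前三天出现持续性干咳，伴低热。", 9, 14)
# -> "患者入院前三天出现[HYP]持续性干咳[HYP]，伴低热。"
# With a HuggingFace tokenizer, [HYP] could be registered via
# tokenizer.add_special_tokens({"additional_special_tokens": ["[HYP]"]}).
```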
The hyponym special symbols can characterize the position information of the hyponym and, to a certain extent, acquire the semantic information of the hyponym, so the intermediate features based on the embedded feature of the first hyponym special symbol and the embedded feature of the second hyponym special symbol can be processed directly by the hypernym determination sub-network to obtain the hypernym.
In some embodiments, the embedded feature of the first hyponym special symbol and the embedded feature of the second hyponym special symbol may also be fused to obtain a hyponym fusion feature. In one exemplary embodiment, the embedded feature of the first hyponym special symbol and the embedded feature of the second hyponym special symbol may be added point-to-point (element-wise) to obtain the hyponym fusion feature. This process can be expressed as: the head [HYP] vector is E_HYP1 = [e_11, e_12, e_13, ..., e_1n], the tail [HYP] vector is E_HYP2 = [e_21, e_22, e_23, ..., e_2n], and the hyponym fusion vector is E_fuse = E_HYP1 + E_HYP2. It will be appreciated that the fusion of these two embedded vectors is not limited to element-wise addition and may also be performed by other vector operations.
According to some embodiments, the target input may include a sentence head special symbol. Processing, by the pre-training model, the at least one second word segment included in the target text, the first hyponym special symbol, and the second hyponym special symbol based on the self-attention mechanism to obtain the embedded feature of the first hyponym special symbol and the embedded feature of the second hyponym special symbol may include: processing the sentence head special symbol, the at least one second word segment included in the target text, the first hyponym special symbol, and the second hyponym special symbol based on the self-attention mechanism by utilizing the pre-training model to obtain the embedded feature of the sentence head special symbol, the embedded feature of the first hyponym special symbol, and the embedded feature of the second hyponym special symbol. Step S203 of processing the target input by using the pre-training model to obtain the intermediate features may further include: fusing the embedded feature of the sentence head special symbol, the embedded feature of the first hyponym special symbol, and the embedded feature of the second hyponym special symbol to obtain the intermediate features. In this way, the vector representation of the whole target input is obtained by using the sentence head special symbol, and this vector representation is fused with the embedded features of the two hyponym special symbols, so that the obtained intermediate features are further enhanced and the downstream hypernym determination sub-network can generate more accurate results.
According to some embodiments, fusing the embedded feature of the sentence head special symbol, the embedded feature of the first hyponym special symbol, and the embedded feature of the second hyponym special symbol to obtain the intermediate features may include: fusing the embedded feature of the first hyponym special symbol and the embedded feature of the second hyponym special symbol to obtain a hyponym fusion feature; and processing the embedded feature of the sentence head special symbol and the hyponym fusion feature based on a cross-attention mechanism to obtain the intermediate features.
In this way, by first computing the hyponym fusion feature and then using the cross-attention mechanism to further fuse it with the vector representation characterizing the whole target input, the finally obtained intermediate features better combine the vector representation of the hyponym with the vector representation of the target text in which the hyponym is located, making the judgment of the hypernym of the hyponym more accurate.
In some embodiments, processing the embedded feature of the sentence head special symbol and the hyponym fusion feature based on the cross-attention mechanism may be expressed as:

O_attention = softmax(E_CLS · E_fuse^T / √d) · E_fuse

where O_attention represents the resulting intermediate feature, E_CLS is the embedded feature of the sentence head special symbol, E_fuse is the hyponym fusion feature, and d is the dimension of the embedded features output by the pre-training model.
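The sketch below illustrates one way to realize this cross-attention step, assuming the standard scaled dot-product form with the [CLS] embedding as the query and the hyponym fusion feature as key and value; since the exact formula is reconstructed here, the function and shapes are assumptions.

```python
import math
import torch

def cross_attention(e_cls: torch.Tensor, e_fuse: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product cross-attention: [CLS] embedding as query, hyponym fusion feature
    as key/value. Both inputs have shape (d,); returns the intermediate feature O_attention."""
    d = e_cls.shape[-1]
    q = e_cls.unsqueeze(0)                            # (1, d) query
    k = v = e_fuse.unsqueeze(0)                       # (1, d) key/value
    scores = q @ k.transpose(0, 1) / math.sqrt(d)     # (1, 1) attention score
    weights = torch.softmax(scores, dim=-1)           # with a single key/value this weight is 1
    return (weights @ v).squeeze(0)                   # (d,) intermediate feature
```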
Fig. 4 shows a schematic diagram of a neural network according to another exemplary embodiment of the present disclosure. As shown in fig. 4, a target input may be constructed by retaining the target hyponym 402 in the target text 404 and inserting special symbols. Processing the target input by using the pre-training model 406 to obtain the embedded feature corresponding to each token, further selecting the embedded feature of the special symbol of the sentence head and the embedded features of the special symbols of the two hyponyms, performing feature fusion by using the attention layer 408, and inputting the intermediate features obtained after fusion into the hypernym determination sub-network 410 to obtain the hypernym 412 corresponding to the target hyponym.
According to some embodiments, step S204, processing the intermediate feature by using the hypernym determining sub-network to obtain the hypernym corresponding to the target hyponym may include: processing the intermediate features by utilizing the hypernym determining sub-network to obtain the confidence coefficient of each preset hypernym of the target hyponym belonging to the plurality of preset hypernyms; and determining the hypernym corresponding to the target hyponym from the plurality of preset hypernyms based on the confidence that the target hyponym belongs to each of the plurality of preset hypernyms. Thus, the hypernym corresponding to the target hyponym can be obtained accurately.
In some embodiments, the hypernym determination sub-network may be expressed as:
P = W · O_attention + B

where W is the weight parameter of the hypernym determination sub-network, B is the bias, and P = [p_1, p_2, p_3, ..., p_m] gives the probabilities of the m hypernym categories.
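A hedged sketch of this sub-network as a single dense layer is given below; the class name, softmax normalization, and sizes are illustrative assumptions (the disclosure only specifies the affine form P = W · O_attention + B over m preset hypernym categories).

```python
import torch
from torch import nn

class HypernymHead(nn.Module):
    """Hypernym determination sub-network: a dense layer mapping the intermediate feature
    to confidences over m preset hypernym categories."""
    def __init__(self, hidden_size: int, num_hypernyms: int):
        super().__init__()
        self.dense = nn.Linear(hidden_size, num_hypernyms)   # P = W * O_attention + B

    def forward(self, o_attention: torch.Tensor) -> torch.Tensor:
        logits = self.dense(o_attention)
        return torch.softmax(logits, dim=-1)   # confidence for each preset hypernym

# Usage: the preset hypernym with the highest confidence is taken as the result, e.g.
# probs = HypernymHead(768, 50)(o_attention); predicted_index = probs.argmax(dim=-1)
```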
According to another aspect of the present disclosure, a training method of a neural network is provided, the neural network including a pre-training model for natural language processing and a hypernym determination sub-network. As shown in fig. 5, the method includes: step S501, acquiring a sample text, a sample hyponym in the sample text, and a real hypernym of the sample hyponym; step S502, constructing a sample input, wherein the sample input at least comprises the sample text; step S503, processing the sample input by utilizing the pre-training model to obtain sample intermediate features, wherein the sample intermediate features represent semantic information of the sample text and at least one of semantic information of the sample hyponym and the position of the sample hyponym in the sample text; step S504, processing the sample intermediate features by utilizing the hypernym determination sub-network to obtain a predicted hypernym corresponding to the sample hyponym; and step S505, adjusting parameters of the neural network based on the real hypernym and the predicted hypernym to obtain a trained neural network. It is understood that the operations of step S501 to step S504 in fig. 5 are similar to those of step S201 to step S204 in fig. 2, and are not described herein again. By training the neural network through the above steps, the trained neural network can have the capability of outputting accurate hypernyms.
In some embodiments, one skilled in the art may adjust the parameters of the neural network in various ways; for example, a loss function may be predetermined, a loss value characterizing the difference between the real hypernym and the predicted hypernym may be calculated using the loss function, and the parameters of the neural network may be adjusted based on the loss value. In addition, in step S505, parameters of the pre-training model and/or the hypernym determination sub-network may be adjusted.
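The following is a hedged sketch of one such training step, reusing the TextElementExtractor sketch from above; the cross-entropy loss and AdamW optimizer are assumptions, since the disclosure only requires a loss value characterizing the difference between the real and predicted hypernyms.

```python
import torch
from torch import nn

model = TextElementExtractor()                     # sketch defined earlier; an assumption
criterion = nn.CrossEntropyLoss()                  # assumed loss choice
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(sample_batch: dict, real_hypernym_ids: torch.Tensor) -> float:
    model.train()
    optimizer.zero_grad()
    logits = model(**sample_batch)                 # predicted hypernym scores, shape (batch, m)
    loss = criterion(logits, real_hypernym_ids)    # real vs. predicted hypernym difference
    loss.backward()
    optimizer.step()                               # adjusts encoder and/or head parameters
    return loss.item()
```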
According to another aspect of the present disclosure, there is provided a text element extraction apparatus based on natural language understanding, wherein a neural network includes a pre-training model for natural language processing and a hypernym determination sub-network. As shown in fig. 6, the apparatus 600 includes: a determining unit 610 configured to determine a target hyponym in a target text; a first construction unit 620 configured to construct a target input including at least the target text; a first processing unit 630 configured to process the target input with the pre-training model to obtain intermediate features, wherein the intermediate features characterize semantic information of the target text and at least one of semantic information of the target hyponym and a position of the target hyponym in the target text; and a second processing unit 640 configured to process the intermediate features using the hypernym determination sub-network to obtain the hypernym corresponding to the target hyponym. It will be appreciated that the operations of the units 610-640 in the apparatus 600 are similar to those of the steps S201-S204 in fig. 2, and are not described herein again.
According to another aspect of the present disclosure, a training apparatus for a neural network is provided, the neural network including a pre-training model for natural language processing and a hypernym determination sub-network. As shown in fig. 7, the apparatus 700 includes: an obtaining unit 710 configured to obtain a sample text, a sample hyponym in the sample text, and a real hypernym of the sample hyponym; a second construction unit 720 configured to construct a sample input, the sample input comprising at least the sample text; a third processing unit 730 configured to process the sample input with the pre-training model to obtain sample intermediate features, wherein the sample intermediate features characterize semantic information of the sample text and at least one of semantic information of the sample hyponym and a position of the sample hyponym in the sample text; a fourth processing unit 740 configured to process the sample intermediate features by using the hypernym determination sub-network to obtain a predicted hypernym corresponding to the sample hyponym; and a parameter adjusting unit 750 configured to adjust parameters of the neural network based on the real hypernym and the predicted hypernym to obtain a trained neural network.
It is understood that the operations of the units 710 to 750 in the apparatus 700 are similar to those of the steps S501 to S505 in fig. 5, and are not described herein again.
In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other processing of the personal information of users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, there is also provided an electronic device, a readable storage medium and a computer program product.
Referring to fig. 8, a block diagram of an electronic device 800 that may be a server or a client of the present disclosure, which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in the device 800 are connected to the I/O interface 805, including: an input unit 806, an output unit 807, the storage unit 808, and a communication unit 809. The input unit 806 may be any type of device capable of inputting information to the device 800; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 807 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 808 may include, but is not limited to, magnetic disks and optical disks. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunication networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as Bluetooth™ devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning network algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The calculation unit 801 performs the respective methods and processes described above, for example, a text element extraction method. For example, in some embodiments, the text element extraction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When a computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the text element extraction method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the text element extraction method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service ("Virtual Private Server" or simply "VPS") are overcome. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the claims following the grant and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalent elements thereof. Furthermore, the steps may be performed in a different order than described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. It is important that as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the disclosure.

Claims (9)

1. A text element extraction method based on natural language understanding, wherein a neural network includes a pre-training model for natural language processing and a hypernym determination sub-network, the method comprising:
Determining a target hyponym in a target text;
constructing a target input, comprising:
inserting a first hyponym special symbol at a first position in the target text, wherein the first position indicates the starting position of the target hyponym in the target text; and
inserting a second hyponym special symbol at a second location in the target text, the second location indicating a termination location of the target hyponym in the target text, wherein the target input includes at least the target text;
processing the target input using the pre-training model to obtain intermediate features, including:
processing at least one second segmentation word, the first hyponym special symbol and the second hyponym special symbol included in the target text based on a self-attention mechanism by utilizing the pre-training model to obtain embedded features of the first hyponym special symbol and embedded features of the second hyponym special symbol, wherein the intermediate features comprise the embedded features of the first hyponym special symbol and the embedded features of the second hyponym special symbol, represent semantic information of the target text, and represent at least one of semantic information of the target hyponym and positions of the target hyponym in the target text; and
And processing the intermediate features by using the hypernym determining sub-network to obtain the hypernym corresponding to the target hyponym.
2. The method of claim 1, wherein the target input includes a sentence head special symbol, and processing at least one second segmentation word, the first hyponym special symbol, and the second hyponym special symbol included in the target text based on a self-attention mechanism using the pre-training model to obtain an embedded feature of the first hyponym special symbol and an embedded feature of the second hyponym special symbol comprises:
processing the sentence head special symbol, at least one second segmentation word included in the target text, the first hyponym special symbol and the second hyponym special symbol based on a self-attention mechanism by utilizing the pre-training model to obtain an embedded feature of the sentence head special symbol, an embedded feature of the first hyponym special symbol and an embedded feature of the second hyponym special symbol,
wherein processing the target input using the pre-training model to obtain intermediate features further comprises:
and fusing the embedded features of the sentence head special symbol, the embedded features of the first hyponym special symbol and the embedded features of the second hyponym special symbol to obtain the intermediate feature.
3. The method of claim 2, wherein fusing the embedded feature of the sentence head special symbol, the embedded feature of the first hyponym special symbol, and the embedded feature of the second hyponym special symbol to obtain the intermediate feature comprises:
fusing the embedded features of the first hyponym special symbol and the embedded features of the second hyponym special symbol to obtain hyponym fusion features; and
and processing the embedded features of the sentence head special symbol and the hyponym fusion features based on a cross attention mechanism to obtain the intermediate features.
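Claims 2 and 3 fuse the two hyponym-symbol embeddings and then let the sentence head embedding (e.g., a [CLS]-style token) attend to the fused vector. A small PyTorch sketch of one way to realize this is given below; the linear fusion layer, the multi-head attention module, and the hidden size of 768 are illustrative assumptions rather than choices specified in the claims.

import torch
import torch.nn as nn

class HyponymFusion(nn.Module):
    def __init__(self, hidden: int = 768, heads: int = 8):
        super().__init__()
        self.fuse = nn.Linear(2 * hidden, hidden)                     # fuse the two symbol embeddings
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)

    def forward(self, h_cls, h_start, h_end):
        # h_cls, h_start, h_end: (batch, hidden)
        hypo = self.fuse(torch.cat([h_start, h_end], dim=-1))         # hyponym fusion feature
        # Cross attention: the sentence-head embedding queries the hyponym fusion feature.
        out, _ = self.cross_attn(h_cls.unsqueeze(1), hypo.unsqueeze(1), hypo.unsqueeze(1))
        return out.squeeze(1)                                         # intermediate feature

fusion = HyponymFusion()
intermediate = fusion(torch.randn(4, 768), torch.randn(4, 768), torch.randn(4, 768))  # (4, 768)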
4. A method according to any one of claims 1-3, wherein processing the intermediate feature with the hypernym determination sub-network to obtain a hypernym corresponding to the target hyponym comprises:
processing the intermediate features using the hypernym determination sub-network to obtain, for each preset hypernym of a plurality of preset hypernyms, a confidence that the target hyponym belongs to that preset hypernym; and
determining, from the plurality of preset hypernyms, the hypernym corresponding to the target hyponym based on the confidence that the target hyponym belongs to each preset hypernym.
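Claim 4 casts hypernym determination as scoring the target hyponym against a fixed inventory of preset hypernyms and selecting the most confident one. A minimal sketch follows, assuming the sub-network is a linear classifier followed by a softmax; the preset hypernym list and the 768-dimensional input are made up for illustration.

import torch
import torch.nn as nn

PRESET_HYPERNYMS = ["水果", "动物", "城市", "公司"]       # hypothetical preset hypernyms

hypernym_head = nn.Linear(768, len(PRESET_HYPERNYMS))      # stand-in hypernym determination sub-network

def predict_hypernym(intermediate: torch.Tensor) -> str:
    logits = hypernym_head(intermediate)
    confidence = torch.softmax(logits, dim=-1)             # confidence per preset hypernym
    return PRESET_HYPERNYMS[int(confidence.argmax())]

print(predict_hypernym(torch.randn(768)))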
5. A method of training a neural network comprising a pre-training model for natural language processing and a hypernym determination sub-network, the method comprising:
acquiring a sample text, a sample hyponym in the sample text and a real hypernym of the sample hyponym;
constructing a sample input, comprising:
inserting a first hyponym special symbol at a first position in the sample text, the first position indicating a starting position of the sample hyponym in the sample text; and
inserting a second hyponym special symbol at a second location in the sample text, the second location indicating a termination location of the sample hyponym in the sample text, wherein the sample input includes at least the sample text;
processing the sample input using the pre-training model to obtain sample intermediate features, including:
processing, using the pre-training model and based on a self-attention mechanism, at least one second segmentation word included in the sample text, the first hyponym special symbol, and the second hyponym special symbol, to obtain an embedded feature of the first hyponym special symbol and an embedded feature of the second hyponym special symbol, wherein the sample intermediate features comprise the embedded feature of the first hyponym special symbol and the embedded feature of the second hyponym special symbol, represent semantic information of the sample text, and represent at least one of semantic information of the sample hyponym and the position of the sample hyponym in the sample text;
processing the sample intermediate features using the hypernym determination sub-network to obtain a predicted hypernym corresponding to the sample hyponym; and
adjusting parameters of the neural network based on the real hypernym and the predicted hypernym to obtain a trained neural network.
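Claim 5 adjusts the network parameters from the gap between the predicted and the real hypernym. The sketch below shows one conventional way to do this with a cross-entropy loss and the Adam optimizer; both the loss and the optimizer are assumptions, since the claim does not fix either.

import torch
import torch.nn as nn

def training_step(network, optimizer, sample_intermediate, real_hypernym_id):
    logits = network(sample_intermediate)                       # predicted hypernym scores
    loss = nn.functional.cross_entropy(logits, real_hypernym_id)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                            # adjust the network parameters
    return loss.item()

net = nn.Linear(768, 4)                                         # stand-in hypernym sub-network
opt = torch.optim.Adam(net.parameters(), lr=1e-5)
loss = training_step(net, opt, torch.randn(8, 768), torch.randint(0, 4, (8,)))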
6. A text element extraction apparatus based on natural language understanding, using a neural network comprising a pre-training model for natural language processing and a hypernym determination sub-network, the apparatus comprising:
a determining unit configured to determine a target hyponym in a target text;
a first construction unit configured to construct a target input, comprising:
inserting a first hyponym special symbol at a first position in the target text, wherein the first position indicates the starting position of the target hyponym in the target text; and
inserting a second hyponym special symbol at a second location in the target text, the second location indicating a termination location of the target hyponym in the target text, wherein the target input includes at least the target text;
a first processing unit configured to process the target input using the pre-training model to obtain an intermediate feature, comprising:
processing, using the pre-training model and based on a self-attention mechanism, at least one second segmentation word included in the target text, the first hyponym special symbol, and the second hyponym special symbol, to obtain an embedded feature of the first hyponym special symbol and an embedded feature of the second hyponym special symbol, wherein the intermediate features comprise the embedded feature of the first hyponym special symbol and the embedded feature of the second hyponym special symbol, represent semantic information of the target text, and represent at least one of semantic information of the target hyponym and the position of the target hyponym in the target text; and
a second processing unit configured to process the intermediate features using the hypernym determination sub-network to obtain the hypernym corresponding to the target hyponym.
7. A training apparatus for a neural network comprising a pre-training model for natural language processing and a hypernym determination sub-network, the apparatus comprising:
an acquisition unit configured to acquire a sample text, a sample hyponym in the sample text, and a real hypernym of the sample hyponym;
A second construction unit configured to construct a sample input, comprising:
inserting a first hyponym special symbol at a first position in the sample text, the first position indicating a starting position of the sample hyponym in the sample text; and
inserting a second hyponym special symbol at a second location in the sample text, the second location indicating a termination location of the sample hyponym in the sample text, wherein the sample input includes at least the sample text;
a third processing unit configured to process the sample input using the pre-training model to obtain sample intermediate features, comprising:
processing, using the pre-training model and based on a self-attention mechanism, at least one second segmentation word included in the sample text, the first hyponym special symbol, and the second hyponym special symbol, to obtain an embedded feature of the first hyponym special symbol and an embedded feature of the second hyponym special symbol, wherein the sample intermediate features comprise the embedded feature of the first hyponym special symbol and the embedded feature of the second hyponym special symbol, represent semantic information of the sample text, and represent at least one of semantic information of the sample hyponym and the position of the sample hyponym in the sample text;
a fourth processing unit configured to process the sample intermediate features using the hypernym determination sub-network to obtain a predicted hypernym corresponding to the sample hyponym; and
a parameter adjustment unit configured to adjust parameters of the neural network based on the real hypernym and the predicted hypernym to obtain a trained neural network.
8. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
9. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202211734143.6A 2022-12-30 2022-12-30 Text element extraction method, device and equipment based on natural language understanding Active CN115879468B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211734143.6A CN115879468B (en) 2022-12-30 2022-12-30 Text element extraction method, device and equipment based on natural language understanding

Publications (2)

Publication Number Publication Date
CN115879468A (en) 2023-03-31
CN115879468B (en) 2023-11-14

Family

ID=85757743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211734143.6A Active CN115879468B (en) 2022-12-30 2022-12-30 Text element extraction method, device and equipment based on natural language understanding

Country Status (1)

Country Link
CN (1) CN115879468B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162770A (en) * 2018-10-22 2019-08-23 腾讯科技(深圳)有限公司 Word expansion method, apparatus, device and medium
CN111611796A (en) * 2020-05-20 2020-09-01 腾讯科技(武汉)有限公司 Hypernym determination method and device for hyponym, electronic device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11068665B2 (en) * 2019-09-18 2021-07-20 International Business Machines Corporation Hypernym detection using strict partial order networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant