CN116956927A - Method and system for identifying named entities of bankruptcy document - Google Patents

Method and system for identifying named entities of bankruptcy document

Info

Publication number
CN116956927A
Authority
CN
China
Prior art keywords
text
bankruptcy
sequence data
model
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310949107.XA
Other languages
Chinese (zh)
Inventor
赵飞
闫丰
杜建业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Odetta Data Technology Co ltd
Original Assignee
Beijing Odetta Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Odetta Data Technology Co ltd filed Critical Beijing Odetta Data Technology Co ltd
Priority to CN202310949107.XA priority Critical patent/CN116956927A/en
Publication of CN116956927A publication Critical patent/CN116956927A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of natural language processing and discloses a method and a system for identifying named entities in bankruptcy documents, wherein the method comprises the following steps: performing word encoding on the bankruptcy document with a pre-trained BERT language model and extracting text features to generate word vectors; bidirectionally encoding the generated word vectors to obtain text tag sequence data; optimally decoding the text tag sequence data to obtain an optimal text tag sequence; and determining the tag category to which each character belongs according to the optimal text tag sequence. By adding a BERT pre-trained language model as the feature representation layer, the application preserves text semantic information more completely, improves the model's bidirectional contextual feature extraction, makes fuller use of semantic information, better resolves the boundary division of named entities, and improves the model's entity recognition rate.

Description

Method and system for identifying named entities of bankruptcy document
Technical Field
The application relates to the technical field of natural language processing, in particular to a method and a system for identifying a named entity of a bankruptcy document.
Background
Early named entity recognition was based primarily on dictionary- and rule-based methods, which rely on rule templates manually constructed by linguists, are error-prone, and can only process simple text data rather than complex unstructured data. Machine learning-based methods therefore became increasingly popular, chiefly the hidden Markov model (Hidden Markov Model, HMM), the maximum entropy model (Maximum Entropy model, ME), the support vector machine (Support Vector Machine, SVM), and the conditional random field (Conditional Random Field, CRF).
In recent years, with the development of hardware capability and the advent of distributed word representations, neural networks have become models that effectively handle many natural language processing (Natural Language Processing, NLP) tasks. Bengio et al. first proposed constructing a language model with a neural network; the distributed representation of words elegantly mitigated the effect of data sparsity on statistical modeling while also overcoming the curse of dimensionality in model parameters.
Because polysemy and synonymy are common among Chinese words and characters, many researchers have used word-embedding models such as Word2Vec to train and learn distributed representations of word vectors in order to improve the accuracy of Chinese entity recognition. However, Word2Vec and similar pre-trained models focus mainly on features between words or characters and neglect a word's context, which limits their recognition capability and still leaves them unable to represent polysemous words.
Research on named entity recognition at home and abroad is indeed mature, but entity recognition in the field of bankruptcy documents differs from the general domain and has its own domain-specific characteristics. On the one hand, bankruptcy document web page data is mostly semi-structured text presented as bulletins; there is no fixed standard format, though the organization of the content shows similar features. On the other hand, the texts vary widely in length, contain a large amount of company information and various abbreviations, use highly specialized terminology, and exhibit polysemy and synonymy, so traditional methods achieve unsatisfactory entity recognition accuracy and coverage. Research shows that work in the bankruptcy-document field remains scarce, so the named entity recognition task in this domain still holds great research value and room for improvement.
Disclosure of Invention
The application provides a method and a system for identifying a named entity of a bankruptcy document, which are used for solving the technical problems in the prior art.
According to a first aspect of the application, a method for identifying named entities of bankruptcy documents is provided.
The method for identifying the named entities of the bankruptcy document comprises the following steps:
performing word coding on the bankruptcy document through the BERT language model obtained through pre-training, and extracting text features to generate word vectors;
performing bidirectional coding on the generated word vector to obtain text tag sequence data; performing optimal decoding on the text label sequence data to obtain an optimal text label sequence;
and determining the label category to which each character belongs according to the optimal text label sequence.
In addition, the method for identifying the named entities of the bankruptcy documents further comprises the following steps: performing masking language training on the BERT model to obtain a BERT language model; and when the BERT model is subjected to masking language training, 15% of words in text sentences are randomly masked, and then the words in masking positions are predicted by adopting an unsupervised learning method.
The structure of the BERT model is a Transformer structure.
In addition, bi-directionally encoding the generated word vector to obtain text tag sequence data includes: and taking the generated word vector as an input vector, inputting the input vector into a two-way long-short-term memory network layer for two-way coding, and obtaining text tag sequence data.
In addition, performing optimal decoding on the text tag sequence data to obtain an optimal text tag sequence includes: and decoding the text label sequence data through a CRF neural network model to obtain an optimal text label sequence.
According to a second aspect of the present application, there is provided a bankruptcy document named entity recognition system.
The bankruptcy document named entity recognition system comprises:
the text feature extraction module is used for carrying out word coding on the bankruptcy document through the BERT language model obtained through pre-training, extracting text features and generating word vectors;
the text label determining module is used for carrying out bidirectional coding on the generated word vector to obtain text label sequence data; performing optimal decoding on the text label sequence data to obtain an optimal text label sequence;
and the text character recognition module is used for determining the label category to which each character belongs according to the optimal text label sequence.
In addition, the bankruptcy document named entity recognition system further comprises: the model training module is used for carrying out masking language training on the BERT model to obtain a BERT language model; and when the BERT model is subjected to masking language training, 15% of words in text sentences are randomly masked, and then the words in masking positions are predicted by adopting an unsupervised learning method.
The structure of the BERT model is a Transformer structure.
In addition, when the text label determining module carries out bidirectional coding on the generated word vector to obtain text label sequence data, the generated word vector is used as an input vector and is input into a bidirectional long-short-term memory network layer to carry out bidirectional coding to obtain the text label sequence data.
In addition, when the text label determining module optimally decodes the text label sequence data to obtain an optimal text label sequence, the text label sequence data is decoded through a CRF neural network model to obtain an optimal text label sequence.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
according to the application, by adding the BERT pre-training language model as the feature expression layer, text semantic information is stored more completely, the context bidirectional feature extraction capability of the model is improved, the semantic information is utilized more fully, the problem of boundary division of named entities is solved better, the recognition rate of the model to the entities is improved, and the overall named entity recognition accuracy of the model reaches 92.45%.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow diagram illustrating a method of identifying named entities of a bankruptcy document according to an exemplary embodiment;
FIG. 2 is a block diagram illustrating a system for identifying named entities of bankruptcy documents in accordance with an exemplary embodiment;
FIG. 3 is a diagram of an overall model framework of BERT-BiLSTM+CRF, shown in accordance with an exemplary embodiment;
FIG. 4 is an input vector representation of a BERT shown in accordance with an exemplary embodiment;
FIG. 5 is a diagram of a Transformer structure shown in accordance with an exemplary embodiment;
FIG. 6 is a block diagram of an LSTM cell shown in accordance with an exemplary embodiment;
fig. 7 is a schematic diagram of a computer device, according to an example embodiment.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments herein to enable those skilled in the art to practice them. Portions and features of some embodiments may be included in, or substituted for, those of others. The scope of the embodiments herein includes the full scope of the claims, as well as all available equivalents of the claims. The terms "first," "second," and the like herein are used merely to distinguish one element from another element and do not require or imply any actual relationship or order between the elements. Indeed the first element could also be termed a second element and vice versa. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a structure, apparatus, or device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such structure, apparatus, or device. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a structure, apparatus or device comprising the element. Various embodiments are described herein in a progressive manner, each embodiment focusing on differences from other embodiments, and identical and similar parts between the various embodiments are sufficient to be seen with each other.
The terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like herein refer to an orientation or positional relationship based on that shown in the drawings, merely for ease of description herein and to simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operate in a particular orientation, and thus are not to be construed as limiting the application. In the description herein, unless otherwise specified and limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, and may be, for example, mechanically or electrically coupled, may be in communication with each other within two elements, may be directly coupled, or may be indirectly coupled through an intermediary, as would be apparent to one of ordinary skill in the art.
Herein, unless otherwise indicated, the term "plurality" means two or more.
Herein, the character "/" indicates that the front and rear objects are an or relationship. For example, A/B represents: a or B.
Herein, the term "and/or" is an association relation describing an object, meaning that three relations may exist. For example, a and/or B, represent: a or B, or, A and B.
It should be understood that, although the steps in the flowchart are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, nor does the order in which the sub-steps or stages are performed necessarily performed in sequence, but may be performed alternately or alternately with at least a portion of other steps or other steps.
The various modules in the apparatus or system of the present application may be implemented in whole or in part in software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
Embodiments of the application and features of the embodiments may be combined with each other without conflict.
FIG. 1 illustrates one embodiment of a method of the present application for identifying named entities of bankruptcy documents.
In this alternative embodiment, the method for identifying named entities of the bankruptcy document includes:
step S101, performing word coding on a bankruptcy document through a BERT language model obtained through pre-training, and extracting text features to generate a word vector;
step S103, performing bidirectional coding on the generated word vector to obtain text tag sequence data; performing optimal decoding on the text label sequence data to obtain an optimal text label sequence;
step S105, determining the label category to which each character belongs according to the optimal text label sequence.
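Steps S101-S105 can be sketched as a pipeline of three stages. The stub functions below are hypothetical stand-ins introduced only for illustration; a real system would wrap a BERT encoder, a BiLSTM layer and a CRF decoder behind the same interfaces.

```python
from typing import List, Tuple

# Hypothetical stand-ins for the three stages in steps S101-S105; the
# names and return shapes are assumptions made for this sketch only.
def bert_encode(text: str) -> List[Tuple[str, List[float]]]:
    return [(ch, [0.0]) for ch in text]           # char -> word vector

def bilstm_tag_scores(vectors) -> List[List[float]]:
    return [[1.0, 0.0] for _ in vectors]          # per-char tag scores

def crf_decode(scores) -> List[int]:
    return [max(range(len(s)), key=s.__getitem__) for s in scores]

def recognize_entities(text: str) -> List[Tuple[str, int]]:
    vectors = bert_encode(text)                   # step S101
    scores = bilstm_tag_scores(vectors)           # step S103 (encode)
    labels = crf_decode(scores)                   # step S103 (decode)
    return list(zip([c for c, _ in vectors], labels))  # step S105

print(recognize_entities("破产"))  # [('破', 0), ('产', 0)]
```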
FIG. 2 illustrates one embodiment of a bankruptcy document named entity recognition system of the present application.
In this alternative embodiment, the bankruptcy document named entity recognition system includes:
the text feature extraction module 201 is configured to perform word encoding on the bankruptcy document through the BERT language model obtained through pre-training, and extract text features to generate a word vector;
the text tag determining module 203 is configured to perform bidirectional encoding on the generated word vector to obtain text tag sequence data; performing optimal decoding on the text label sequence data to obtain an optimal text label sequence;
the text character recognition module 205 is configured to determine, according to the optimal text label sequence, a label class to which each character belongs.
In practical application, as shown in fig. 3-5, the BERT language model is obtained by the following training method, specifically:
the input part of Bert is a linear sequence, two sentences are split by a separator, and the forefront and last two marks are added. Each word has Position Embeddings (position embedding), token embedded and Segment Embeddings (separation embedding), and three embedding corresponding to the word are overlapped to form the Bert input;
the Bert pre-training task adopts a Masked Language Model (mask language model) pre-training method, randomly masks 15% of words in one sentence, and then adopts an unsupervised learning method to predict the words at the mask position so as to achieve the training of the bidirectional features;
the BERT model architecture is based on multi-layer bi-directional transform decoding, and adopts a transducer structure (gesture-based motion recognition model). A transducer is an encoder-decoder structure formed by a stack of several encoders and decoders. The model structure is as follows:
the left part is an encoder, which consists of Multi-Head Attention and a full connection and is used for converting input corpus into feature vectors;
the right part is the decoder whose inputs are the output of the encoder and the predicted result, consisting of Masked Multi-Head Attention, multi-Head Attention and a full concatenation for outputting the conditional probability of the final result.
When the generated word vector is bidirectionally encoded to obtain text label sequence data, the generated word vector is used as an input vector and is input into a bidirectional long-short-term memory network layer to be bidirectionally encoded to obtain the text label sequence data.
As shown in fig. 2 and 6, the specific steps are as follows:
Firstly, the forget gate determines which part of the previous information to discard; the forget gate selectively forgets information in the cell state. It takes the hidden-layer output h_{t-1} at time t-1 and the current input x_t as inputs, and outputs the final value f_t through the activation function sigmoid. f_t ranges over [0, 1] and acts on the cell state c_{t-1} at time t-1, where 0 means discard entirely and 1 means retain entirely. The calculation formula of f_t is: f_t = σ(W_f · [h_{t-1}, x_t] + b_f), in which W_f and b_f represent the weights and bias connecting the two layers.
The input gate stores the information to be updated, selectively recording new information into the cell state; the sigmoid layer decides which values to update, and the tanh layer creates a candidate vector c̃_t that will be added to the cell state. i_t, c̃_t and c_t are computed as:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
c_t = f_t * c_{t-1} + i_t * c̃_t

wherein i_t, the output of the input gate, determines the values to be updated; f_t is the probability of forgetting the previous layer's cell state, ranging over [0, 1], with 0 meaning discard entirely and 1 meaning retain entirely; c̃_t denotes the temporary state containing the new candidate values. W_i represents the weight coefficient applied to [h_{t-1}, x_t] in the input gate; W_c the weight coefficient in the candidate extraction; b_i the bias of the input gate; b_c the bias in the candidate extraction; h denotes the hidden layer; σ denotes the activation function Sigmoid.
Finally, an output gate decides what value to output; the output gate determines the final output and how much of the cell state this layer needs to filter. Specifically: first, the sigmoid function is applied to obtain the output factor o_t; then the current cell state is normalized through the tanh function and multiplied by o_t to obtain the hidden-layer output h_t at the current time:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(c_t)

in which W_o represents the weight coefficient applied to [h_{t-1}, x_t] in the output gate; σ is the activation function Sigmoid; b_o represents the bias of the output gate; h denotes the hidden layer.
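The gate equations above can be exercised in one self-contained numpy step. The stacked weight layout `W` (forget, input, candidate, output pre-activations in one matrix) and the toy sizes are assumptions made for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following the gate equations above.
    W maps the concatenation [h_{t-1}; x_t] to the stacked
    f, i, c-tilde, o pre-activations."""
    hx = np.concatenate([h_prev, x_t])
    z = W @ hx + b
    H = h_prev.size
    f_t = sigmoid(z[0:H])            # forget gate
    i_t = sigmoid(z[H:2*H])          # input gate
    c_tilde = np.tanh(z[2*H:3*H])    # candidate cell state
    o_t = sigmoid(z[3*H:4*H])        # output gate
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
H, D = 4, 3
W = rng.normal(size=(4*H, H + D))
b = np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=D), h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```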
In an actual natural-language sentence, key information may appear at the beginning or at the end, so for named entity recognition the forward LSTM (Long Short-Term Memory network) must be complemented by a reverse LSTM, i.e. a BiLSTM (bidirectional long short-term memory network). One LSTM network computes forward hidden features, the other computes backward hidden features, and the two outputs are spliced to form the bidirectional LSTM network, which better captures longer-distance bidirectional semantic dependencies.
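The splicing of the two passes can be sketched as follows; for brevity a plain tanh recurrent cell stands in for each LSTM direction, since only the forward/backward concatenation is being illustrated, and all sizes are toy assumptions.

```python
import numpy as np

def run_rnn(xs, W, U, b):
    """A simple tanh recurrent cell standing in for one LSTM direction."""
    h = np.zeros(U.shape[0])
    out = []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
        out.append(h)
    return np.stack(out)

rng = np.random.default_rng(0)
T, D, H = 5, 3, 4
xs = rng.normal(size=(T, D))
Wf, Uf, bf = rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H)
Wb, Ub, bb = rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H)

fwd = run_rnn(xs, Wf, Uf, bf)                    # left-to-right pass
bwd = run_rnn(xs[::-1], Wb, Ub, bb)[::-1]        # right-to-left pass, re-aligned
bilstm_out = np.concatenate([fwd, bwd], axis=1)  # (T, 2H) per-token features
print(bilstm_out.shape)  # (5, 8)
```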
And when the text label sequence data is optimally decoded to obtain an optimal text label sequence, decoding the text label sequence data through a CRF neural network model to obtain the optimal text label sequence.
Specifically, the CRF neural network model can learn the front-back dependencies of a sentence and add constraint conditions to ensure that the final prediction result is valid. For each input sequence x = (x_1, x_2, …, x_n) the algorithm first obtains a predicted tag sequence y = (y_1, y_2, …, y_n), and the scoring function of the sequence is defined as:

score(x, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}

The first term is determined by the CRF transition matrix A, in which A_{y_i, y_{i+1}} represents the transition score from the tag y_i to the tag y_{i+1}; the second term is determined by the matrix P output by the LSTM, in which P_{i, y_i} is the softmax probability of the tag y_i at position i. Because start and end positions are added, the dimension of the transition probability matrix grows by 2.
given a training sample sequence x, probability normalization is carried out on the score of the correct sequence y:in (1) the->Representing true tag values, Y represents all possible tag sets, and the numerator represents the correct tag sequence, denominatorRepresenting all possible annotation sequences. Then, a loss function is defined:searching proper parameters through algorithms such as gradient descent and the like to minimize a loss function, and predicting an optimal solution of the model after training is completed:
FIG. 7 illustrates one embodiment of a computer device of the present application. The computer device may be a server including a processor, memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store static and dynamic information data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the steps of the above method embodiments.
It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment, a computer device is also provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor performing the steps of the above-described method embodiments when the computer program is executed.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration, and not limitation, RAM can take a variety of forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), and the like.
The present application is not limited to the structure that has been described above and shown in the drawings, and various modifications and changes can be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. The method for identifying the named entity of the bankruptcy document is characterized by comprising the following steps of:
performing word coding on the bankruptcy document through the BERT language model obtained through pre-training, and extracting text features to generate word vectors;
performing bidirectional coding on the generated word vector to obtain text tag sequence data; performing optimal decoding on the text label sequence data to obtain an optimal text label sequence;
and determining the label category to which each character belongs according to the optimal text label sequence.
2. The method for identifying named entities of bankruptcy documents according to claim 1, further comprising:
performing masking language training on the BERT model to obtain a BERT language model;
and when the BERT model is subjected to masking language training, 15% of words in text sentences are randomly masked, and then the words in masking positions are predicted by adopting an unsupervised learning method.
3. The method for identifying the named entities of the bankruptcy document according to claim 2, wherein the structure of the BERT model is a Transformer structure.
4. The method for identifying named entities of bankruptcy documents according to claim 1, wherein bi-directionally encoding the generated word vectors to obtain text tag sequence data comprises:
and taking the generated word vector as an input vector, inputting the input vector into a two-way long-short-term memory network layer for two-way coding, and obtaining text tag sequence data.
5. The method for identifying a named entity of a bankruptcy document according to claim 4, wherein optimally decoding the text tag sequence data to obtain an optimal text tag sequence comprises:
and decoding the text label sequence data through a CRF neural network model to obtain an optimal text label sequence.
6. A system for identifying named entities of bankruptcy documents, comprising:
the text feature extraction module is used for carrying out word coding on the bankruptcy document through the BERT language model obtained through pre-training, extracting text features and generating word vectors;
the text label determining module is used for carrying out bidirectional coding on the generated word vector to obtain text label sequence data; performing optimal decoding on the text label sequence data to obtain an optimal text label sequence;
and the text character recognition module is used for determining the label category to which each character belongs according to the optimal text label sequence.
7. The bankruptcy document named entity recognition system of claim 6, further comprising: the model training module is used for carrying out masking language training on the BERT model to obtain a BERT language model; and when the BERT model is subjected to masking language training, 15% of words in text sentences are randomly masked, and then the words in masking positions are predicted by adopting an unsupervised learning method.
8. The system of claim 7, wherein the structure of the BERT model is a Transformer structure.
9. The system of claim 6, wherein the text tag determination module, when bidirectionally encoding the generated word vectors to obtain text tag sequence data, takes the generated word vectors as input vectors and inputs them into a bidirectional long short-term memory (BiLSTM) network layer for bidirectional encoding to obtain the text tag sequence data.
10. The system of claim 9, wherein the text tag determination module, when optimally decoding the text tag sequence data to obtain an optimal text tag sequence, decodes the text tag sequence data through a conditional random field (CRF) neural network model to obtain the optimal text tag sequence.
CN202310949107.XA 2023-07-31 2023-07-31 Method and system for identifying named entities of bankruptcy document Pending CN116956927A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310949107.XA CN116956927A (en) 2023-07-31 2023-07-31 Method and system for identifying named entities of bankruptcy document

Publications (1)

Publication Number Publication Date
CN116956927A true CN116956927A (en) 2023-10-27

Family

ID=88461634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310949107.XA Pending CN116956927A (en) 2023-07-31 2023-07-31 Method and system for identifying named entities of bankruptcy document

Country Status (1)

Country Link
CN (1) CN116956927A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516256A * 2019-08-30 2019-11-29 的卢技术有限公司 A Chinese named entity extraction method and system
CN114048745A (en) * 2021-11-05 2022-02-15 新智道枢(上海)科技有限公司 Method and system for recognizing named entities of digital police service warning situation addresses
CN114564959A (en) * 2022-01-14 2022-05-31 北京交通大学 Method and system for identifying fine-grained named entities of Chinese clinical phenotype
CN115238697A (en) * 2022-07-26 2022-10-25 贵州数联铭品科技有限公司 Judicial named entity recognition method based on natural language processing

Similar Documents

Publication Publication Date Title
CN111783462B (en) Chinese named entity recognition model and method based on double neural network fusion
CN110135457B (en) Event trigger word extraction method and system based on self-encoder fusion document information
CN111480197B (en) Speech recognition system
CN112712804B (en) Speech recognition method, system, medium, computer device, terminal and application
CN111310471B (en) Travel named entity identification method based on BBLC model
US20190189111A1 (en) Method and Apparatus for Multi-Lingual End-to-End Speech Recognition
CN110866401A (en) Chinese electronic medical record named entity identification method and system based on attention mechanism
CN110263325B (en) Chinese word segmentation system
CN111062217B (en) Language information processing method and device, storage medium and electronic equipment
Hori et al. Dialog state tracking with attention-based sequence-to-sequence learning
CN111079432B (en) Text detection method and device, electronic equipment and storage medium
CN113177412A (en) Named entity identification method and system based on bert, electronic equipment and storage medium
CN116127953B (en) Chinese spelling error correction method, device and medium based on contrast learning
CN115859164A (en) Method and system for identifying and classifying building entities based on prompt
CN114036950A (en) Medical text named entity recognition method and system
CN115688784A (en) Chinese named entity recognition method fusing character and word characteristics
Ye et al. Chinese named entity recognition based on character-word vector fusion
CN115658898A (en) Chinese and English book entity relation extraction method, system and equipment
CN113191150B (en) Multi-feature fusion Chinese medical text named entity identification method
CN115809666B (en) Named entity recognition method integrating dictionary information and attention mechanism
CN116362242A (en) Small sample slot value extraction method, device, equipment and storage medium
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
CN115240712A (en) Multi-mode-based emotion classification method, device, equipment and storage medium
CN116956927A (en) Method and system for identifying named entities of bankruptcy document
CN113012685B (en) Audio recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination