CN112949309A - Enterprise association relation extraction method and device, storage medium and electronic device - Google Patents

Info

Publication number
CN112949309A
Authority
CN
China
Prior art keywords
enterprise
financial
name
company
entity company
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110218014.0A
Other languages
Chinese (zh)
Inventor
马小龙
祝世虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Everbright Bank Co Ltd
Original Assignee
China Everbright Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Everbright Bank Co Ltd
Priority to CN202110218014.0A
Publication of CN112949309A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the invention provide a method and device for extracting enterprise association relationships, a storage medium, and an electronic device. The method comprises: receiving enterprise text data, and obtaining, from the enterprise text data, the name of a main entity company and the association relationship between the main entity company and a guest entity company; obtaining the name of the guest entity company according to a financial lexicon; and extracting the enterprise association relationship from the obtained main entity company name, guest entity company name, and association relationship to obtain an association relationship triple. Because the enterprise text data is quantified and the enterprise association relationships in the text are analyzed efficiently, the invention solves the problem in the related art that manually screening enterprise association relationships is inefficient, thereby improving text data processing efficiency and enabling rapid extraction of enterprise association relationships.

Description

Enterprise association relation extraction method and device, storage medium and electronic device
Technical Field
The embodiment of the invention relates to the technical field of text processing, in particular to a method and a device for extracting an enterprise association relation, a storage medium and an electronic device.
Background
The internet is full of public-opinion information about enterprises. At present, entity relationship classification methods are mainly based on the analysis of plain text. In the field of enterprise text association relationship analysis, public-opinion information suffers from problems such as uncontrollable sources and diverse descriptions, so recognition performance is poor and the extracted relationships cannot be used in a real production environment. How to find meaningful enterprise association relationships in massive volumes of enterprise text is an urgent problem and a current research focus.
When a text file containing enterprise association relationships is analyzed, the related approach is to screen and analyze the text file manually. Because the volume of text data is huge and manual item-by-item processing is time-consuming and labor-intensive, analysis efficiency is low and processing is very slow.
No effective solution has yet been proposed for the low efficiency of manually screening enterprise association relationships in the related art.
Disclosure of Invention
Embodiments of the invention provide a method and device for extracting enterprise association relationships, a storage medium, and an electronic device, to at least solve the problem in the related art that manually screening enterprise association relationships is inefficient.
According to an embodiment of the present invention, an enterprise association relation extraction method is provided, including: receiving enterprise text data, and obtaining, from the enterprise text data, the name of a main entity company and the association relationship between the main entity company and a guest entity company; obtaining the name of the guest entity company according to a financial lexicon; and extracting the enterprise association relationship from the obtained main entity company name, guest entity company name, and association relationship to obtain an association relationship triple.
In an exemplary embodiment, receiving the enterprise text data and obtaining the main entity company name and the association relationship between the main entity company and the guest entity company may include: receiving enterprise text data and performing named entity recognition on it to obtain the main entity company name; and performing part-of-speech recognition on the enterprise text data to obtain the association relationship between the main entity company and the guest entity company.
In an exemplary embodiment, obtaining the guest entity company name according to the financial lexicon may include: performing transfer learning on the financial lexicon to obtain word vectors of the financial lexicon; and performing named entity recognition on the word vectors of the financial lexicon to obtain the guest entity company name.
In an exemplary embodiment, the method may further include: obtaining the main entity company name according to the financial lexicon.
In an exemplary embodiment, before obtaining the guest entity company name according to the financial lexicon, the method may further include: obtaining the financial lexicon from at least one of the following: enterprise financial report texts, financial encyclopedias, financial news corpora, finance and economics books, and finance and economics microblogs and forums.
According to another embodiment of the present invention, an enterprise association relation extraction apparatus is provided, including: a first obtaining module, configured to receive enterprise text data and obtain, from the enterprise text data, the name of a main entity company and the association relationship between the main entity company and a guest entity company; a second obtaining module, configured to obtain the name of the guest entity company according to a financial lexicon; and an extraction module, configured to extract the enterprise association relationship from the obtained main entity company name, guest entity company name, and association relationship to obtain an association relationship triple.
In an exemplary embodiment, the first obtaining module may further include: a first identification unit, configured to receive enterprise text data and perform named entity recognition on it to obtain the main entity company name; and a second identification unit, configured to perform part-of-speech recognition on the enterprise text data to obtain the association relationship between the main entity company and the guest entity company.
In an exemplary embodiment, the second obtaining module may further include: a transfer learning unit, configured to perform transfer learning on the financial lexicon to obtain word vectors of the financial lexicon; and a third identification unit, configured to perform named entity recognition on the word vectors of the financial lexicon to obtain the guest entity company name.
In an exemplary embodiment, the apparatus may further include: an obtaining module, configured to obtain, before the guest entity company name is obtained according to the financial lexicon, the financial lexicon from at least one of the following: enterprise financial report texts, financial encyclopedias, financial news corpora, finance and economics books, and finance and economics microblogs and forums.
According to a further embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the invention, the enterprise text data is quantified and the enterprise association relationships in the text are analyzed efficiently, so the problem in the related art that manually screening enterprise association relationships is inefficient can be solved, improving text data processing efficiency and enabling rapid extraction of enterprise association relationships.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a computer terminal for an enterprise association relation extraction method according to an embodiment of the present invention;
FIG. 2 is a flowchart of an enterprise association relation extraction method according to an embodiment of the present invention;
FIG. 3 is a block diagram of an enterprise association relation extraction apparatus according to an embodiment of the present invention;
FIG. 4 is a block diagram of an enterprise association relation extraction apparatus according to an alternative embodiment of the present invention;
FIG. 5 is a flowchart of an enterprise association relation extraction method based on transfer learning and named entity recognition according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a transfer learning algorithm in an enterprise association relation extraction method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a named entity recognition algorithm in an enterprise association relation extraction method according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In order to better understand the technical solutions of the embodiments and the alternative embodiments of the present invention, the following description is made on possible application scenarios in the embodiments and the alternative embodiments of the present invention, but is not limited to the application of the following scenarios.
The method embodiments provided in the embodiments of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking an example of the operation on a computer terminal, fig. 1 is a hardware structure block diagram of a computer terminal of an enterprise association relation extraction method according to an embodiment of the present invention. As shown in fig. 1, the computer terminal may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, wherein the computer terminal may further include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the computer terminal. For example, the computer terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as a computer program corresponding to the enterprise association relation extraction method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to a computer terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which is used for communicating with the internet wirelessly.
In this embodiment, an enterprise association relation extracting method running on the computer terminal is provided, and fig. 2 is a flowchart of the enterprise association relation extracting method according to the embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:
Step S201, receiving enterprise text data, and obtaining, from the enterprise text data, the name of a main entity company and the association relationship between the main entity company and a guest entity company.
Step S202, obtaining the name of the guest entity company according to the financial lexicon.
Step S203, extracting the enterprise association relationship from the obtained main entity company name, guest entity company name, and association relationship to obtain an association relationship triple.
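The three steps above can be sketched as a small pipeline. This is a minimal illustration, not the patent's implementation: the `RelationTriple` class, the `extract_triple` function, the two toy extractor functions, and the hypothetical lexicon are all assumptions introduced here, standing in for the named entity recognition, part-of-speech recognition, and financial lexicon described in the text.

```python
import re
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class RelationTriple:
    subject: str    # main entity company name (S)
    predicate: str  # association relationship (P)
    obj: str        # guest entity company name (O)


def extract_triple(
    text: str,
    find_main_and_relation: Callable[[str], Tuple[str, str]],
    find_guest: Callable[[str], str],
) -> RelationTriple:
    main_name, relation = find_main_and_relation(text)      # step S201
    guest_name = find_guest(text)                            # step S202
    return RelationTriple(main_name, relation, guest_name)  # step S203


# Hypothetical stand-ins for the real recognizers and the financial lexicon.
HYPOTHETICAL_LEXICON = ("Company B", "Company R")


def toy_main_and_relation(text: str) -> Tuple[str, str]:
    # Toy rule: the first "Company X" mention is the main entity.
    main_name = re.search(r"Company [A-Z]", text).group(0)
    relation = "Prepayment" if "prepayment" in text.lower() else "Unknown"
    return main_name, relation


def toy_guest(text: str) -> str:
    # Toy rule: the first lexicon entry found in the text is the guest entity.
    return next(name for name in HYPOTHETICAL_LEXICON if name in text)


triple = extract_triple(
    "Company D made a prepayment to Company B.",
    toy_main_and_relation,
    toy_guest,
)
print(triple)
```

Passing the extractors in as functions keeps the sketch independent of any particular recognizer, so a trained model could replace the toy rules without changing the pipeline.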
In this embodiment, step S201 may include: receiving enterprise text data and performing Named Entity Recognition (NER) on it to obtain the main entity company name; and performing part-of-speech recognition on the enterprise text data to obtain the association relationship between the main entity company and the guest entity company.
In this embodiment, specifically, named entity recognition is an important basic tool in application fields such as information extraction, question answering, syntactic analysis, and machine translation, and plays an important role in putting natural language processing technology into practical use. Generally speaking, the task of named entity recognition is to identify, in the text to be processed, named entities of three major categories (entity, time, and number) and seven minor categories (person name, organization name, place name, time, date, currency, and percentage).
In this embodiment, step S202 may include: performing transfer learning on the financial lexicon to obtain word vectors of the financial lexicon; and performing named entity recognition on the word vectors of the financial lexicon to obtain the guest entity company name.
In this embodiment, the method may further include: obtaining the main entity company name according to the financial lexicon.
Before step S202 in this embodiment, the method may further include: obtaining the financial lexicon from at least one of the following: enterprise financial report texts, financial encyclopedias, financial news corpora, finance and economics books, and finance and economics microblogs and forums.
Through the above steps, the enterprise text data is quantified and the enterprise association relationships in the text are analyzed efficiently, so the problem in the related art that manually screening enterprise association relationships is inefficient can be solved, improving text data processing efficiency and enabling rapid extraction of enterprise association relationships.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, an enterprise association relation extraction apparatus is further provided. The apparatus is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated. As used below, the terms "module" and "unit" may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a block diagram of an enterprise association relation extraction apparatus according to an embodiment of the present invention, and as shown in fig. 3, the apparatus includes a first obtaining module 10, a second obtaining module 20, and an extraction module 30.
The first obtaining module 10 is configured to receive enterprise text data and obtain, from the enterprise text data, the name of a main entity company and the association relationship between the main entity company and a guest entity company.
The second obtaining module 20 is configured to obtain the name of the guest entity company according to the financial lexicon.
The extraction module 30 is configured to extract the enterprise association relationship from the obtained main entity company name, guest entity company name, and association relationship to obtain an association relationship triple.
Fig. 4 is a block diagram of an enterprise association relation extraction apparatus according to an alternative embodiment of the present invention. As shown in fig. 4, in addition to all the modules shown in fig. 3, the apparatus may further include an obtaining module 40; the first obtaining module 10 may further include a first identification unit 11 and a second identification unit 12; and the second obtaining module 20 may further include a transfer learning unit 21 and a third identification unit 22.
The first identification unit 11 is configured to receive enterprise text data and perform named entity recognition on it to obtain the main entity company name.
The second identification unit 12 is configured to perform part-of-speech recognition on the enterprise text data to obtain the association relationship between the main entity company and the guest entity company.
The transfer learning unit 21 is configured to perform transfer learning on the financial lexicon to obtain word vectors of the financial lexicon.
The third identification unit 22 is configured to perform named entity recognition on the word vectors of the financial lexicon to obtain the guest entity company name.
The obtaining module 40 is configured to obtain, before the guest entity company name is obtained according to the financial lexicon, the financial lexicon from at least one of the following: enterprise financial report texts, financial encyclopedias, financial news corpora, finance and economics books, and finance and economics microblogs and forums.
In this embodiment, the enterprise association relation extraction apparatus may be further configured to obtain the main entity company name according to the financial lexicon.
It should be noted that the above modules may be implemented by software or hardware. In the latter case, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.
In order to facilitate understanding of the technical solutions provided by the present invention, the following detailed description will be made with reference to embodiments of specific scenarios.
In this embodiment, a transfer learning method is used: word vectors pre-trained on a large financial corpus are used to optimize an existing named entity recognition technique (including but not limited to LSTM + CRF) for extracting company names, and a part-of-speech analysis algorithm is used to extract the association relationship. This yields an association relationship triple (S, P, O), where S is the subject representing the main entity, O is the object representing the guest entity, and P is the predicate representing the relationship between the two entities, thereby achieving rapid extraction and analysis of enterprise association relationships.
In this embodiment, specifically, LSTM + CRF is currently the most common scheme for named entity recognition.
In named entity recognition, sequence tagging with an LSTM (Long Short-Term Memory network) alone is easy to understand: it amounts to classifying each token in the text sequence to obtain its tag.
However, labeling with an LSTM alone has a problem: the predicted tag sequence often does not follow the patterns people normally use, that is, the tag arrangement produced by the LSTM is rare (low-probability) in normal language expression. A CRF (conditional random field) layer is therefore added so that the model learns constraints on tag sequences similar to those found in normal usage.
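The effect of such constraints can be illustrated with a small decoding sketch. It assumes a BIO tagging scheme and hand-set scores, neither of which is specified in the source; in a trained CRF layer the transition scores would be learned rather than written by hand. Per-token classification picks the best tag at each position independently, while Viterbi decoding with transition scores picks the best whole sequence.

```python
from typing import Dict, List, Tuple

TAGS = ["O", "B-ORG", "I-ORG"]  # assumed BIO tag set for company names


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    # best[t][tag]: best score of any tag path ending in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in TAGS:
            prev = max(
                TAGS,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda k: best[-1][k])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Hypothetical per-token scores, as a tagger without a CRF layer might emit.
emissions = [
    {"O": 2.0, "B-ORG": 0.0, "I-ORG": 0.5},
    {"O": 0.5, "B-ORG": 1.0, "I-ORG": 1.2},
    {"O": 0.2, "B-ORG": 0.3, "I-ORG": 1.5},
]
# Hand-set penalty: an entity may not continue (I-ORG) directly after O.
transitions = {("O", "I-ORG"): -10.0}

greedy = [max(TAGS, key=scores.get) for scores in emissions]
decoded = viterbi(emissions, transitions)
print(greedy)   # ['O', 'I-ORG', 'I-ORG'], starts an entity with I-ORG
print(decoded)  # ['O', 'B-ORG', 'I-ORG'], respects the constraint
```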
In this embodiment, named entity recognition is a typical task of natural language processing, and aims to mark which word (or words) is/are an entity name and its entity type (such as a person name, a place name, an organization name, etc.).
Fig. 5 is a flowchart of an enterprise association relation extraction method based on transfer learning and named entity recognition according to an embodiment of the present invention. As shown in fig. 5, the method includes the following steps:
Step S501, the input enterprise text data is preprocessed; the main entity company can generally be obtained at this stage. Table 1 shows prepayment data in the enterprise text after data preprocessing according to an embodiment of the present invention.
[Table 1 appears as images (BDA0002954670580000051, BDA0002954670580000061, BDA0002954670580000071) in the original publication and is not reproduced here.]
TABLE 1
Step S502, based on a massive financial text corpus, word vectors of the financial lexicon (mainly a lexicon of company names) are obtained using a transfer learning algorithm (including but not limited to text feature extraction algorithms, pre-trained model algorithms, and the like).
In this embodiment, specifically, current research on text feature extraction algorithms mainly focuses on the choice of text representation model and of feature word selection algorithm. The basic units used to represent text are usually called the features or feature items of the text. Feature items should have the following properties: (1) they truly identify the text content; (2) they can distinguish the target text from other texts; (3) they are not too numerous; (4) they are easy to separate out. In addition, there are generally four ways to select features: (1) transforming the original features into a smaller number of new features by mapping or transformation; (2) selecting the most representative features from the original features; (3) selecting the most influential features according to expert knowledge; (4) selecting features by a mathematical method, which is relatively accurate, is less subject to interference from human factors, and is particularly suitable for automatic text classification and mining systems.
In this embodiment, specifically, most pre-trained model algorithms work as follows: given some word embeddings and an optional encoder (e.g., an LSTM), contextual word embeddings are obtained and a pooling operation is defined over them to produce a sentence embedding; based on this choice, the resulting sentence embedding is then used directly for a supervised classification task or for generating a target sequence.
In this embodiment, fig. 6 is a schematic diagram of a transfer learning algorithm in the enterprise association relation extraction method according to the embodiment of the present invention. As shown in fig. 6, the transfer learning algorithm passes a financial sentence, or a financial lexicon including the financial sentence, through the input layer, hidden layer, and output layer of a neural network language model in sequence to obtain pre-trained word vectors. The pre-trained word vectors are the word vectors of the financial lexicon. Of course, the embodiment is not limited to the specific transfer learning algorithm shown in fig. 6; other transfer learning algorithms capable of obtaining word vectors of the financial lexicon are also possible and contemplated.
Step S503, based on the input word vectors, a named entity recognition algorithm (including but not limited to LSTM/IDCNN + CRF) is used to obtain the guest entity company name (it may also assist in obtaining the main entity company name).
In this embodiment, specifically, a neural network is a computational model that imitates the way the human brain processes information: neurons are organized into a network, and training the network yields an artificial intelligence system with a specific capability. Convolution and deconvolution are operations that neural networks apply to images: for a picture, convolution extracts its features, and deconvolution regenerates a picture from those features.
In this embodiment, specifically, for sequence labeling, a disadvantage of the conventional CNN (Convolutional Neural Network) is that, after convolution, a neuron in the last layer may see only a small part of the original input data. For NER, every word in the input sentence may affect the label at the current position, which is the problem of long-distance dependence. Covering the entire input then requires adding more convolutional layers, resulting in a deeper network and more parameters. Preventing overfitting in turn requires more regularization such as Dropout, which introduces more hyperparameters, so the whole model becomes bulky and difficult to train.
The dilated CNN in IDCNN (Iterated Dilated CNN, iterated dilated convolutional neural network) adds a dilation width to the filter: when the filter is applied to the input matrix, it skips all input data that falls within the dilation width. The size of the filter itself remains unchanged, so the filter gathers data from a wider region of the input matrix, as if it had been "dilated".
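A dilated filter can be illustrated with a small one-dimensional sketch. The `dilated_conv1d` function below is a hypothetical illustration written for this passage, not the patent's implementation; like most deep learning libraries, it computes a cross-correlation without padding.

```python
from typing import List


def dilated_conv1d(
    inputs: List[float], kernel: List[float], dilation: int
) -> List[float]:
    # Distance between the first and last input positions the filter touches.
    span = (len(kernel) - 1) * dilation
    return [
        # Tap k reads position i + k * dilation, skipping the inputs between.
        sum(w * inputs[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(inputs) - span)
    ]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 1.0, 1.0]  # the filter keeps 3 weights in both cases

dense = dilated_conv1d(signal, kernel, dilation=1)    # reads i, i+1, i+2
dilated = dilated_conv1d(signal, kernel, dilation=2)  # reads i, i+2, i+4
print(dense)    # [6.0, 9.0, 12.0, 15.0, 18.0]
print(dilated)  # [9.0, 12.0, 15.0]
```

With `dilation=2` the same three weights span five input positions instead of three, which is the widening effect described above.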
In specific applications, the dilation width grows exponentially with the number of layers. As a result, the number of parameters grows only linearly with the number of layers while the receptive field grows exponentially, so the whole input can be covered quickly.
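This growth can be checked with a short calculation. The sketch assumes a filter of width 3 and a dilation that doubles at each layer (1, 2, 4, ...); the source says only that the dilation width grows exponentially, so the doubling schedule and the `receptive_field` helper are assumptions made for illustration.

```python
from typing import List


def receptive_field(kernel_width: int, dilations: List[int]) -> int:
    # Each stacked layer widens the receptive field by (kernel_width - 1) * dilation.
    return 1 + (kernel_width - 1) * sum(dilations)


KERNEL_WIDTH = 3
layer_counts = [1, 2, 3, 4]

# Dilation doubles per layer: [1], [1, 2], [1, 2, 4], [1, 2, 4, 8]
dilated_fields = [
    receptive_field(KERNEL_WIDTH, [2 ** layer for layer in range(n)])
    for n in layer_counts
]
# Ordinary convolution: dilation 1 at every layer
plain_fields = [receptive_field(KERNEL_WIDTH, [1] * n) for n in layer_counts]
# Weights per channel: one filter of KERNEL_WIDTH weights per layer
weights = [KERNEL_WIDTH * n for n in layer_counts]

print(dilated_fields)  # [3, 7, 15, 31]
print(plain_fields)    # [3, 5, 7, 9]
print(weights)         # [3, 6, 9, 12]
```

Both stacks use the same number of weights per layer, but only the dilated stack roughly doubles its coverage with each added layer.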
In this embodiment, specifically, attaching a CRF layer to the end of a network model such as LSTM or IDCNN is a very common approach to sequence labeling. The LSTM or IDCNN computes, for each word, the probability of each label, while the CRF layer introduces transition probabilities between labels in the sequence; finally, the loss is computed and fed back to the network.
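The loss mentioned above is commonly written as follows. This is the standard linear-chain CRF formulation used with LSTM-type encoders, given here as an assumption about what the passage refers to; the source does not state the formula. The symbols are introduced for illustration: $P_{t,y}$ is the score the LSTM or IDCNN assigns to label $y$ at position $t$, $A_{y',y}$ is the transition score from label $y'$ to label $y$, and $n$ is the sentence length.

```latex
% Score of a label sequence y = (y_1, ..., y_n) for an input sentence x
s(x, y) = \sum_{t=1}^{n} P_{t,\,y_t} + \sum_{t=2}^{n} A_{y_{t-1},\,y_t}

% Negative log-likelihood of the gold sequence y, normalized over all
% candidate label sequences \tilde{y}
\mathcal{L}(x, y) = -\,s(x, y) + \log \sum_{\tilde{y}} \exp\bigl(s(x, \tilde{y})\bigr)
```

Minimizing $\mathcal{L}$ raises the score of the gold sequence relative to all others, and its gradient flows back through both the transition scores and the encoder.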
In this embodiment, fig. 7 is a schematic diagram of the named entity recognition algorithm in the enterprise association relation extraction method according to the embodiment of the present invention. As shown in fig. 7, the algorithm feeds each word vector into a node of the LSTM layer and passes the output to the CRF layer, which yields the state value corresponding to each word vector. The state value is the name of an entity company (either the guest entity company or the host entity company). Of course, the embodiment is not limited to the named entity recognition algorithm shown in fig. 7; other named entity recognition algorithms capable of obtaining entity company names are also contemplated.
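One common way to turn per-word state values into company names is to tag each word and then merge consecutive tagged words. The sketch below assumes a BIO tagging scheme (`B-ORG`, `I-ORG`, `O`) and space-separated English tokens; neither is specified in the patent, and the function name `extract_entities` is hypothetical.

```python
def extract_entities(tokens, tags):
    """Merge tokens tagged B-ORG / I-ORG into company-name strings."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-ORG":
            # A new entity starts; close any entity still open.
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I-ORG" and current:
            current.append(token)
        else:
            # An O tag (or a stray I-ORG) ends the current entity.
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


tokens = ["Company", "D", "prepaid", "Company", "B"]   # illustrative sentence
tags = ["B-ORG", "I-ORG", "O", "B-ORG", "I-ORG"]       # assumed tagger output
names = extract_entities(tokens, tags)
```

For Chinese text the tokens would normally be joined without spaces; the separator here is only a convenience for the English example.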
Step S504, obtaining the association relation from the preprocessed enterprise text data through a part-of-speech recognition algorithm. Table 2 lists the association relations obtained from the enterprise text after part-of-speech recognition according to the embodiment of the present invention.
[Table 2 is reproduced as an image (BDA0002954670580000081) in the original publication.]
TABLE 2
Step S505, forming an association relation triple from the obtained main (host) entity company, association relation and guest entity company, thereby completing the enterprise association relation extraction task. Table 3 shows the enterprise association analysis results, covering prepayment and receivable relations, assembled from the main entity company, association relation and guest entity company extracted after named entity recognition in step S503 according to the embodiment of the present invention.
Serial number | Listed company | Association relation | Counterparty
1 | Company D | Prepayment | Company B
2 | Company J | Prepayment | Company R
3 | Company J | Prepayment | Company C
4 | Company Z | Receivable | Company T
5 | Company H | Receivable | Company U
6 | Company N | Receivable | Company M
7 | Company G | Receivable | Company E
8 | Company F | Receivable | Company I
9 | Company F | Receivable | Company O
10 | Company X | Receivable | Company S
11 | Company X | Receivable | Company A
12 | Company Y | Receivable | Company V
TABLE 3
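The triple-forming step S505 can be sketched as follows, using four rows taken from Table 3. The `Triple` type, its field names and the short relation label `"Receivable"` are assumptions made for illustration; the patent does not prescribe a data structure.

```python
from collections import defaultdict, namedtuple

# Hypothetical container for one (main entity, relation, guest entity) result.
Triple = namedtuple("Triple", ["host", "relation", "guest"])

# Rows 1, 8, 9 and 10 of Table 3; "Receivable" stands for the receivable relation.
rows = [
    ("Company D", "Prepayment", "Company B"),
    ("Company F", "Receivable", "Company I"),
    ("Company F", "Receivable", "Company O"),
    ("Company X", "Receivable", "Company S"),
]

triples = [Triple(*row) for row in rows]

# Group the triples by main entity company to see each company's counterparties.
by_host = defaultdict(list)
for triple in triples:
    by_host[triple.host].append((triple.relation, triple.guest))
```

Grouping by the main entity makes it easy to see, for example, that Company F has receivable relations with two counterparties.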
In summary, this embodiment uses transfer learning and named entity recognition to extract enterprise names and association relations automatically and to form association relation triples, which can improve the efficiency and accuracy of text data processing. The embodiment can quantify the text information in regulatory documents and efficiently analyze the enterprise association relations in the text.
Embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.
In an exemplary embodiment, the storage medium may be configured to store a computer program for performing the steps of:
S1, receiving enterprise text data, and acquiring the name of a main entity company and the association relation between the main entity company and a guest entity company according to the enterprise text data;
S2, acquiring the name of the guest entity company according to the financial word stock;
S3, extracting the enterprise association relation according to the obtained main entity company name, the guest entity company name and the association relation to obtain an association relation triple.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
In an exemplary embodiment, the processor may be configured to execute the following steps by a computer program:
S1, receiving enterprise text data, and acquiring the name of a main entity company and the association relation between the main entity company and a guest entity company according to the enterprise text data;
S2, acquiring the name of the guest entity company according to the financial word stock;
S3, extracting the enterprise association relation according to the obtained main entity company name, the guest entity company name and the association relation to obtain an association relation triple.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary embodiments, and details of this embodiment are not repeated herein.
It will be apparent to those skilled in the art that the modules or steps of the invention described above may be implemented with a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of computing devices, and they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that described herein; alternatively, the modules or steps may each be fabricated as a separate integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. An enterprise association relation extraction method is characterized by comprising the following steps:
receiving enterprise text data, and acquiring a name of a main entity company and an association relation between the main entity company and a guest entity company according to the enterprise text data;
acquiring the name of the guest entity company according to a financial word stock;
and extracting the enterprise association relation according to the obtained main entity company name, the guest entity company name and the association relation to obtain an association relation triple.
2. The method of claim 1, wherein receiving enterprise text data, and acquiring a name of a main entity company and an association relation between the main entity company and a guest entity company according to the enterprise text data comprises:
receiving enterprise text data, and performing named entity recognition on the enterprise text data to obtain the main entity company name;
and performing part-of-speech recognition on the enterprise text data to obtain the association relation between the main entity company and the guest entity company.
3. The method of claim 1, wherein acquiring the name of the guest entity company according to a financial word stock comprises:
performing transfer learning on the financial word stock to obtain word vectors of the financial word stock;
and carrying out named entity recognition on the word vectors of the financial word stock to obtain the name of the guest entity company.
4. The method of claim 1 or 3, further comprising:
and acquiring the name of the main entity company according to the financial word stock.
5. The method of claim 1, further comprising, prior to acquiring the name of the guest entity company according to a financial word stock:
obtaining the financial word stock according to at least one of the following: enterprise financial report texts, financial encyclopedias, financial news corpora, finance and economics books, and economics microblogs and forums.
6. An enterprise association relation extraction device, comprising:
a first acquisition module, configured to receive enterprise text data and acquire the name of a main entity company and the association relation between the main entity company and a guest entity company according to the enterprise text data;
a second acquisition module, configured to acquire the name of the guest entity company according to the financial word stock;
and an extraction module, configured to extract the enterprise association relation according to the obtained main entity company name, the guest entity company name and the association relation to obtain an association relation triple.
7. The apparatus of claim 6, wherein the first acquisition module further comprises:
a first recognition unit, configured to receive enterprise text data and perform named entity recognition on the enterprise text data to obtain the main entity company name;
and a second recognition unit, configured to perform part-of-speech recognition on the enterprise text data to obtain the association relation between the main entity company and the guest entity company.
8. The apparatus of claim 6, wherein the second acquisition module further comprises:
a transfer learning unit, configured to perform transfer learning on the financial word stock to obtain word vectors of the financial word stock;
and a third recognition unit, configured to perform named entity recognition on the word vectors of the financial word stock to obtain the name of the guest entity company.
9. The apparatus of claim 6, further comprising:
an obtaining module, configured to obtain the financial word stock according to at least one of the following before the name of the guest entity company is acquired according to the financial word stock: enterprise financial report texts, financial encyclopedias, financial news corpora, finance and economics books, and economics microblogs and forums.
10. A computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method as claimed in any of claims 1 to 5 are implemented when the computer program is executed by the processor.
CN202110218014.0A 2021-02-26 2021-02-26 Enterprise association relation extraction method and device, storage medium and electronic device Pending CN112949309A (en)

Publications (1)

Publication Number Publication Date
CN112949309A true CN112949309A (en) 2021-06-11

Family

ID=76246533

Country Status (1)

Country Link
CN (1) CN112949309A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082183A1 (en) * 2011-02-22 2018-03-22 Thomson Reuters Global Resources Machine learning-based relationship association and related discovery and search engines
CN108763507A (en) * 2018-05-30 2018-11-06 北京百度网讯科技有限公司 Enterprise's incidence relation method for digging and device
CN110489599A (en) * 2019-07-08 2019-11-22 深圳壹账通智能科技有限公司 Business connection map construction method, apparatus, computer equipment and storage medium
CN111951052A (en) * 2020-08-14 2020-11-17 中国工商银行股份有限公司 Method and device for acquiring potential customers based on knowledge graph
CN112395407A (en) * 2020-11-03 2021-02-23 杭州未名信科科技有限公司 Method and device for extracting enterprise entity relationship and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination