CN111783460A

CN111783460A - Enterprise abbreviation extraction method and device, computer equipment and storage medium

Info

Publication number: CN111783460A
Application number: CN202010542872.6A
Authority: CN
Inventors: 孙华蔚; 沈春泽; 李加庆; 周张泉
Original assignee: Suning Financial Technology Nanjing Co Ltd
Current assignee: Suning Financial Technology Nanjing Co Ltd
Priority date: 2020-06-15
Filing date: 2020-06-15
Publication date: 2020-10-16

Abstract

The invention discloses an enterprise abbreviation extraction method and device, computer equipment and a storage medium, in the technical field of text information processing. The method comprises: acquiring an enterprise name; matching the enterprise name against a plurality of preset entity dictionaries and a plurality of regular expressions in the matching priority order of the regular expressions, wherein none of the entity dictionaries contains name entities; acquiring the entity sequence matched from the enterprise name, and identifying the name entity of the enterprise name according to the entity category of each entity in the entity sequence; and verifying the name entity and, if the verification succeeds, determining the name entity as the abbreviation of the enterprise name. The invention can effectively improve the efficiency and accuracy of enterprise abbreviation extraction.

Description

Enterprise abbreviation extraction method and device, computer equipment and storage medium

Technical Field

The invention relates to the technical field of text information processing, in particular to an enterprise abbreviation extraction method, an enterprise abbreviation extraction device, computer equipment and a storage medium.

Background

With the rapid development of internet technology, a large amount of public opinion information about companies is generated on the network. For example, large volumes of financial news contain company names, and internet users mostly refer to companies by their abbreviations in web page text. To obtain comprehensive enterprise-related information from the internet in a timely and accurate way, enterprise abbreviations therefore need to be identified. An effective public opinion processing system must handle company names in these different forms, thereby supporting analysis, research and decision-making across businesses.

Existing enterprise abbreviation extraction mainly relies on statistical algorithms. These require a large manually annotated training corpus, which has a large feature scale, is costly to produce, and yields relatively low accuracy.

Disclosure of Invention

To solve the problems described in the background, the invention provides an enterprise abbreviation extraction method and device, computer equipment and a storage medium that can effectively improve the efficiency and accuracy of enterprise abbreviation extraction. The embodiments of the invention provide the following technical solutions:

In a first aspect, an enterprise abbreviation extraction method is provided, the method comprising:

acquiring an enterprise name;

matching the enterprise name against a plurality of preset entity dictionaries and a plurality of regular expressions in the matching priority order of the regular expressions, wherein none of the entity dictionaries contains name entities;

acquiring the entity sequence matched from the enterprise name, and identifying the name entity of the enterprise name according to the entity category of each entity in the entity sequence;

and verifying the name entity and, if the verification succeeds, determining the name entity as the abbreviation of the enterprise name.

Further, matching the enterprise name against the plurality of preset entity dictionaries and the plurality of regular expressions in the matching priority order of the regular expressions includes:

traversing the regular expressions in sequence according to their matching priority order;

during the traversal, if the currently traversed regular expression, combined with the entity dictionaries, successfully matches the enterprise name and yields an entity sequence, stopping the traversal; otherwise, continuing the traversal until a match succeeds.

Further, verifying the name entity includes:

acquiring the total number of characters of the name entity;

judging whether the total number of characters of the name entity is greater than a first preset number and smaller than a second preset number;

if so, the verification succeeds; otherwise, it fails.

Further, the method further comprises:

if the total number of characters of the name entity is judged to be not smaller than the second preset number, matching the name entity against the entity dictionaries and the regular expressions in the matching priority order of the regular expressions;

judging whether an entity sequence of the name entity is successfully matched;

if so, identifying the abbreviation of the enterprise name from the entity sequence of the name entity;

if not, screening the abbreviation of the enterprise name out of the name entity, and supplementing the remaining words of the name entity into the corresponding entity dictionaries based on a Bootstrapping algorithm.

Further, the method further comprises:

if the total number of characters of the name entity is judged to be not greater than the first preset number, splicing the name entity with the entity immediately before or after it in the entity sequence, and determining the splicing result as the abbreviation of the enterprise name.

Further, the method further comprises:

storing the enterprise name and its abbreviation correspondingly in a database.
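The description does not name a database or schema. Below is a minimal sketch that assumes a SQLite store and a hypothetical table `enterprise_abbreviation`. It saves the correspondence with Python's standard `sqlite3` module:

```python
import sqlite3


def save_abbreviation(conn: sqlite3.Connection, full_name: str, abbreviation: str) -> None:
    """Store an enterprise name and its abbreviation as one row (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enterprise_abbreviation ("
        "full_name TEXT PRIMARY KEY, abbreviation TEXT NOT NULL)"
    )
    # INSERT OR REPLACE keeps one row per full name if extraction is re-run.
    conn.execute(
        "INSERT OR REPLACE INTO enterprise_abbreviation (full_name, abbreviation) "
        "VALUES (?, ?)",
        (full_name, abbreviation),
    )
    conn.commit()
```

For example, `save_abbreviation(sqlite3.connect(":memory:"), full_name, abbreviation)` writes one row. The primary key on `full_name` keeps a single abbreviation per enterprise name.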

Further, the method further comprises the step of pre-constructing a plurality of entity dictionaries:

constructing an enterprise name sample library;

extracting region names, industry names and enterprise types from the enterprise names in the sample library through an N-Gram algorithm;

taking the region names, industry names and enterprise types as entities, and constructing a region dictionary, an industry dictionary and an enterprise type dictionary correspondingly;

and performing word segmentation on each enterprise name in the sample library through a word segmentation algorithm, and supplementing the region dictionary, the industry dictionary and the enterprise type dictionary according to the segmentation results.

In a second aspect, an enterprise abbreviation extraction apparatus is provided, the apparatus including:

an acquisition module, configured to acquire an enterprise name;

a first matching module, configured to match the enterprise name against a plurality of preset entity dictionaries and a plurality of regular expressions in the matching priority order of the regular expressions, wherein none of the entity dictionaries contains name entities;

an identification module, configured to acquire the entity sequence matched from the enterprise name and identify the name entity of the enterprise name according to the entity category of each entity in the entity sequence;

a verification module, configured to verify the name entity;

and a determining module, configured to determine the name entity as the abbreviation of the enterprise name when the verification module verifies it successfully.

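Further, the first matching module is specifically configured to:

traverse the regular expressions in sequence according to their matching priority order;

during the traversal, stop if the currently traversed regular expression, combined with the entity dictionaries, successfully matches the enterprise name to obtain an entity sequence, and otherwise continue the traversal until a match succeeds.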

Further, the verification module is specifically configured to:

acquire the total number of characters of the name entity;

judge whether this number is greater than the first preset number and smaller than the second preset number;

if so, the verification succeeds; otherwise, it fails.

Further, the apparatus further includes a second matching module, which is specifically configured to:

if the verification module judges that the total number of characters of the name entity is not smaller than the second preset number, match the name entity against the entity dictionaries and the regular expressions in the matching priority order of the regular expressions;

the determining module is further specifically configured to:

judge whether the second matching module successfully matches an entity sequence of the name entity;

if so, identify the abbreviation of the enterprise name from the entity sequence of the name entity; if not, screen the abbreviation of the enterprise name out of the name entity and supplement the remaining words of the name entity into the corresponding entity dictionaries based on a Bootstrapping algorithm.

Further, the determining module is specifically configured to:

if the verification module judges that the total number of characters of the name entity is not greater than the first preset number, splice the name entity with the entity immediately before or after it in the entity sequence, and determine the splicing result as the abbreviation of the enterprise name.

Further, the apparatus further includes a saving module, which is specifically configured to:

store the enterprise name and its abbreviation correspondingly in a database.

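Further, the apparatus further comprises a construction module, which is specifically configured to:

construct an enterprise name sample library;

extract region names, industry names and enterprise types from the enterprise names in the sample library through an N-Gram algorithm;

take the region names, industry names and enterprise types as entities and construct a region dictionary, an industry dictionary and an enterprise type dictionary correspondingly;

perform word segmentation on each enterprise name in the sample library through a word segmentation algorithm, and supplement the region, industry and enterprise type dictionaries according to the segmentation results.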

In a third aspect, computer equipment is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the enterprise abbreviation extraction method according to any embodiment of the first aspect.

In a fourth aspect, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the enterprise abbreviation extraction method according to any embodiment of the first aspect.

The technical solutions provided by the invention have at least the following beneficial effects:

The embodiments of the invention provide an enterprise abbreviation extraction method and device, computer equipment and a storage medium.

Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of an enterprise abbreviation extraction method according to an embodiment of the present invention;

FIG. 2 is a flowchart of constructing the entity dictionaries according to an embodiment of the present invention;

FIG. 3 is a block diagram of an enterprise abbreviation extraction apparatus according to an embodiment of the present invention;

FIG. 4 is a block diagram of computer equipment according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It is to be understood that, unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".

Furthermore, in the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.

As described in the background, existing enterprise abbreviation extraction mainly uses statistical algorithms. These require a large manually annotated training corpus with a large feature scale, at high cost and with low accuracy. The embodiments of the invention therefore provide an enterprise abbreviation extraction method that matches the enterprise name with dictionaries and regular expressions in the matching priority order of the expressions and extracts the corresponding abbreviation. The method accurately extracts the key name information both of companies whose names follow the standard form and of companies whose names do not. This provides a data basis for techniques such as enterprise entity recognition and public opinion risk analysis based on company names.

Example one

The embodiment of the invention provides an enterprise abbreviation extraction method, described here as applied to an enterprise abbreviation extraction apparatus. The apparatus can be configured in any computer equipment so that the equipment can execute the method. Referring to FIG. 1, the method may include the following steps:

101. Acquire the enterprise name.

The enterprise name may be in the four-segment normalized form <region><name><industry><type>. Its fields run from left to right, each pair of angle brackets denotes the entity category of a field, and the <name> field is the abbreviation of the enterprise name. The enterprise name may also be a company name in a non-normalized form. For example, it may contain only some of the entity categories of the standard form (such as <name><industry><type>), or several of its fields may belong to the same entity category.

Specifically, the enterprise name may be obtained from an enterprise name database published by the administration for industry and commerce; this embodiment does not specifically limit the source.

102. Match the enterprise name against a plurality of preset entity dictionaries and a plurality of regular expressions in the matching priority order of the regular expressions, wherein none of the entity dictionaries contains name entities.

The entity dictionaries can be built in advance from an enterprise name sample corpus and comprise a region dictionary, an industry dictionary and an enterprise type dictionary. Specifically, the region dictionary may be divided by administrative level into provincial, city-level and prefecture-level region dictionaries. The industry dictionary may be divided by the number of characters of its elements into two-character, three-character and four-character industry dictionaries.

Data exploration is carried out in advance on the enterprise name samples in the sample library. This identifies the non-standard forms in which enterprise names occur besides the four-segment standard form (<region><name><industry><type>). A plurality of regular expression field forms is then designed to match enterprise names in these different forms.

In a specific application, more than 80 regular expression field forms are designed in addition to the expression for the four-segment standard form. They include forms in which the positions of the standard entity categories are changed (for example, <name><region><industry><type>), forms in which several fields belong to the same entity category, and forms containing only some of the entity categories (for example, <name><industry><type>). Designing a wide range of field forms increases the coverage of enterprise name matching and helps ensure the accuracy of the subsequent abbreviation extraction.

Specifically, the regular expressions can be traversed in sequence according to their matching priority order. During the traversal, if the currently traversed regular expression, combined with the entity dictionaries, successfully matches the enterprise name and yields an entity sequence, the traversal stops; otherwise, it continues until a match succeeds.
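The traversal can be illustrated with a small Python sketch. The dictionaries, the four templates and their order are hypothetical stand-ins; the embodiment uses more than 80 field forms. Each non-name field becomes an alternation of its dictionary entries. Because there is no name dictionary, the <name> field is a lazy wildcard. The first template that fully matches the name stops the traversal and yields the entity sequence as (category, text) pairs.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionaries; real ones are built from the enterprise name sample library.
DICTIONARIES: Dict[str, List[str]] = {
    "region": ["上海", "北京"],                  # Shanghai, Beijing
    "industry": ["资产评估", "科技", "文化"],     # asset assessment, technology, culture
    "type": ["股份有限公司", "有限公司"],         # joint-stock limited company, limited company
}

# Hypothetical field templates, already listed in matching priority order (highest first).
TEMPLATES: List[List[str]] = [
    ["region", "name", "industry", "type"],
    ["name", "industry", "region", "type"],
    ["name", "region", "type"],
    ["name", "industry", "type"],
]


def field_regex(category: str) -> str:
    """Return a capturing group for one field of a template."""
    if category == "name":
        # No dictionary exists for the name field: it is whatever the other fields leave over.
        return "(.+?)"
    # Longest entries first, so the alternation tries the longest dictionary word first.
    words = sorted(DICTIONARIES[category], key=len, reverse=True)
    return "(" + "|".join(re.escape(w) for w in words) + ")"


def match_enterprise_name(name: str) -> Optional[List[Tuple[str, str]]]:
    """Traverse templates in priority order and return the first entity sequence found."""
    for template in TEMPLATES:
        pattern = "".join(field_regex(category) for category in template)
        m = re.fullmatch(pattern, name)
        if m:
            # Stop the traversal at the first successful match.
            return list(zip(template, m.groups()))
    return None  # no template matched
```

With these toy dictionaries, the cleaned name "XX资产评估上海有限公司" fails the first template and is matched by the second, giving the name entity "XX".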

The matching priority order of the regular expressions can be set as follows:

Matching starts from the regular expressions with the largest number of fields, and among these, the expressions in which the order of entity categories is changed are matched first. If no match succeeds, the number of entity categories is reduced step by step, and at each step the expressions lacking only the region field are matched first. When matching the industry entity category, expressions with a four-character industry element field take priority over those with a two-character industry element field. Matching proceeds in this order until an entity sequence is successfully matched with the enterprise name.
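One possible way to encode these rules is a sort key over field templates, sketched below in Python. The representation is an assumption: templates are tuples of category labels, and industry fields carry their element width as a suffix (hypothetical labels such as "industry4" and "industry2"). Sorting by the key orders templates as follows:

1. More fields first.
2. Then templates with a reordered category sequence.
3. Then templates lacking only <region>.
4. Then wider industry fields.

```python
from typing import List, Sequence, Tuple

CANONICAL = ("region", "name", "industry", "type")


def base_category(field: str) -> str:
    """Map width-specific industry fields such as "industry4" to plain "industry"."""
    return "industry" if field.startswith("industry") else field


def priority_key(template: Sequence[str]) -> Tuple[int, int, int, int]:
    """Smaller keys are matched first (an assumed encoding of the priority rules)."""
    cats = [base_category(f) for f in template]
    positions = [CANONICAL.index(c) for c in cats]
    reordered = any(a > b for a, b in zip(positions, positions[1:]))
    missing = set(CANONICAL) - set(cats)
    widths = [int(f[len("industry"):]) for f in template
              if f.startswith("industry") and f != "industry"]
    return (
        -len(template),                     # more fields first
        0 if reordered else 1,              # changed category order first
        0 if missing == {"region"} else 1,  # templates lacking only <region> first
        -max(widths, default=0),            # four-character industry before two-character
    )


def sort_templates(templates: List[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    return sorted(templates, key=priority_key)
```

The description does not spell out how these rules break ties against each other, so the order of the key components is a judgment call.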

For example, consider the enterprise name "XX Asset Assessment (Shanghai) Co., Ltd.". Under the matching priority order, the regular expression <name><industry><region><type> is matched first, yielding the correct abbreviation "XX". If the pattern <name><region><type> were matched first instead, the result would be "XX Asset Assessment", which is not sufficiently concise.

In this embodiment, the enterprise name is matched against the preset entity dictionaries and regular expressions in the matching priority order of the regular expressions. This improves the efficiency and accuracy of the matching algorithm.

103. Acquire the entity sequence matched from the enterprise name, and identify the name entity of the enterprise name according to the entity category of each entity in the entity sequence.

The order of the entities in the entity sequence corresponds to their positions in the enterprise name, and the entities do not overlap with one another in the enterprise name.

104. Verify the name entity and, if the verification succeeds, determine the name entity as the abbreviation of the enterprise name.

Specifically, the process may include:

acquiring the total number of characters of the name entity and judging whether it is greater than a first preset number and smaller than a second preset number; if so, the verification succeeds, otherwise it fails.

The first and second preset numbers may be set as needed; for example, the first preset number may be 1 and the second preset number may be 6.
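A minimal sketch of steps 103 and 104 follows. It assumes the entity sequence is a list of (category, text) pairs, with the hypothetical label "name" for the name field, and that "word count" means the number of Chinese characters. It uses the example thresholds 1 and 6 and reports which bound failed, since the two failure cases are handled differently later in this embodiment.

```python
from typing import List, Tuple

FIRST_PRESET = 1   # example value given in the description
SECOND_PRESET = 6  # example value given in the description


def name_entity(sequence: List[Tuple[str, str]]) -> str:
    """Join the fields labelled "name" in the matched entity sequence."""
    return "".join(text for category, text in sequence if category == "name")


def verify(name: str) -> str:
    """Return "ok" if FIRST_PRESET < length < SECOND_PRESET, else which bound failed."""
    length = len(name)  # assumption: word count = number of Chinese characters
    if FIRST_PRESET < length < SECOND_PRESET:
        return "ok"
    return "too_short" if length <= FIRST_PRESET else "too_long"
```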

In this embodiment, the identified name entity is verified and is determined as the abbreviation only when the verification succeeds. This improves the accuracy of enterprise abbreviation extraction.

In one example, referring to FIG. 2, the entity dictionaries used in step 102 may be constructed through the following steps:

201. Construct an enterprise name sample library.

Specifically, the original enterprise name corpus is cleaned by removing non-enterprise-name data, numbers and other non-standard symbols, which yields the enterprise name sample library. Sample data are then drawn from the library for data inspection.

The original enterprise name corpus contains company names in both normalized and non-normalized forms and must be cleaned before processing. Cleaning includes but is not limited to:

1) removing data that are not company names, such as incompletely recorded names or names consisting entirely of digits;

2) building a stop-word dictionary of fields that clearly contain no keyword of a company abbreviation, such as "Changdu region", "Korean family" and special symbols, and deleting these fields from the company names as a preliminary processing step;

After the original corpus is cleaned, the enterprise name sample library is obtained, and part of the sample data is drawn for data exploration.
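The cleaning rules can be sketched in Python as follows. The stop-field list holds only the "Changdu region" example, rendered as hypothetical Chinese text. The sketch assumes that every character other than CJK ideographs and ASCII letters is a non-standard symbol or number to be removed.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical stop-field dictionary ("Changdu region"); the real one is collected by hand.
STOP_FIELDS = ["昌都地区"]
# Assumption: keep only CJK unified ideographs and ASCII letters; drop digits and symbols.
NON_NAME_CHARS = re.compile(r"[^\u4e00-\u9fffA-Za-z]")


def clean_name(raw: str) -> Optional[str]:
    """Return the cleaned company name, or None if the record is not a company name."""
    if not raw or raw.strip().isdigit():
        return None  # empty record or all digits
    text = raw
    for field in STOP_FIELDS:
        text = text.replace(field, "")
    text = NON_NAME_CHARS.sub("", text)
    return text or None


def build_sample_library(corpus: Iterable[str]) -> List[str]:
    """Clean every raw record and keep the ones that survive."""
    return [name for name in (clean_name(r) for r in corpus) if name]
```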

202. Extract region names, industry names and enterprise types from the enterprise name sample library through an N-Gram algorithm.

In a specific application, the region dictionary can be built with the help of external region name data. It is divided by administrative division into provincial, city-level and prefecture-level region name dictionaries for more accurate matching. The industry dictionary is divided by the number of characters of its elements into two-character, three-character and four-character industry word dictionaries. Four-character industry elements that contain a two-character industry element are deduplicated, which improves the efficiency of the subsequent algorithm.

Specifically, region names, industry names and enterprise types are extracted from the enterprise name samples in the sample library through bi-gram, tri-gram and 4-gram models. The region, industry and enterprise type dictionaries are then built correspondingly.

The N-Gram model rests on the assumption that the occurrence of the n-th word depends only on the preceding n-1 words and on no other words. Under this assumption, the probability of a whole sentence equals the product of the (conditional) probabilities of its words. Assuming a sentence T consists of the word sequence w1, w2, w3, …, wn, the probability of each word can be estimated by counting in the corpus, where N(w) is the number of occurrences of w:

P(wi) = N(wi) / (N(w1) + N(w2) + N(w3) + … + N(wn))

The word probabilities are sorted in descending order, and the entities for the three dictionaries are then screened and extracted.
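The counting and ranking step can be sketched as below. The granularity is an assumption: the sketch counts character bi-, tri- and 4-grams over the cleaned names, matching the bi-gram, tri-gram and 4-gram models mentioned above. It estimates P(w) by relative frequency as in the formula and sorts candidates in descending order. Screening the candidates into the three dictionaries is left out.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def ngram_candidates(names: Iterable[str],
                     sizes: Sequence[int] = (2, 3, 4)) -> List[Tuple[str, float]]:
    """Count character n-grams and rank them by relative frequency."""
    counts: Counter = Counter()
    for name in names:
        for n in sizes:
            for i in range(len(name) - n + 1):
                counts[name[i:i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    # P(w) = N(w) / sum of N over all counted n-grams
    ranked = [(gram, count / total) for gram, count in counts.items()]
    # Descending probability; ties broken alphabetically so the order is deterministic.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Frequent grams such as "有限公司" (limited company) or a shared region name rise to the top and become dictionary candidates. Name-specific grams stay rare.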

203. Taking the region names, industry names and enterprise types as dictionary elements, construct the region dictionary, industry dictionary and enterprise type dictionary correspondingly.

204. Perform word segmentation on the enterprise name sample library through a preset word segmentation algorithm, and supplement the region, industry and enterprise type dictionaries according to the segmentation results.

Specifically, the enterprise name sample library may be segmented with an HMM-based Chinese word segmentation technique.
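The description names HMM-based segmentation without giving a model. As an illustrative sketch only, the usual approach tags each character as B (begin), M (middle), E (end) or S (single-character word). It then decodes the most likely tag sequence with the Viterbi algorithm. All probabilities below are toy, hand-set values (hypothetical); a real segmenter estimates them from an annotated corpus.

```python
import math
from typing import Dict, List

STATES = "BMES"  # Begin / Middle / End of a multi-character word, Single-character word
# Toy, hand-set parameters (hypothetical); real ones are estimated from a segmented corpus.
START: Dict[str, float] = {"B": 0.6, "M": 0.0, "E": 0.0, "S": 0.4}
TRANS: Dict[str, Dict[str, float]] = {
    "B": {"M": 0.3, "E": 0.7},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.6, "S": 0.4},
    "S": {"B": 0.6, "S": 0.4},
}
EMIT: Dict[str, Dict[str, float]] = {
    "科": {"B": 0.4}, "技": {"E": 0.4},
    "有": {"B": 0.5}, "限": {"M": 0.5}, "公": {"M": 0.3}, "司": {"E": 0.6},
}
FLOOR = 1e-6  # emission probability for unseen (character, state) pairs


def log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def emission(ch: str, state: str) -> float:
    return log(EMIT.get(ch, {}).get(state, FLOOR))


def viterbi_segment(text: str) -> List[str]:
    """Decode the most likely BMES tag sequence and cut words after E or S tags."""
    if not text:
        return []
    # best[s] = (log-probability of the best path ending in state s, that path)
    best = {s: (log(START[s]) + emission(text[0], s), [s]) for s in STATES}
    for ch in text[1:]:
        step = {}
        for s in STATES:
            score, path = max(
                (best[p][0] + log(TRANS[p].get(s, 0.0)), best[p][1]) for p in STATES
            )
            step[s] = (score + emission(ch, s), path + [s])
        best = step
    _, tags = max(best[s] for s in "ES")  # a word must end with E or S
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    return words
```

With these toy parameters, "科技有限公司" (technology limited company) is segmented into "科技" and "有限公司". Segments that are not yet in any dictionary are the candidates for supplementing it.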

In this embodiment, the three entity dictionaries (region, industry and enterprise type) are built from the enterprise name sample library through the N-Gram model and supplemented with Chinese word segmentation. This makes the entities in each dictionary more comprehensive and improves the accuracy of subsequent abbreviation extraction.

In one example, the method may further comprise:

if the total number of characters of the name entity is judged to be not smaller than the second preset number, matching the name entity against the entity dictionaries and the regular expressions in the matching priority order of the regular expressions;

judging whether an entity sequence of the name entity is successfully matched; if so, identifying the abbreviation of the enterprise name from the entity sequence of the name entity;

if not, screening the abbreviation of the enterprise name out of the name entity, and supplementing the remaining words of the name entity into the corresponding entity dictionary based on a Bootstrapping algorithm.

Suppose the verification in step 104 fails because the identified name entity has too many characters. The name entity probably then contains fields that are not part of the abbreviation, such as an industry field. For example, it may contain an industry field missing from the industry dictionary, or the number of industry-entity fields may exceed the maximum number N that can be matched (for example, N = 3). In this case, the name entity can be matched a second time with the regular expressions combined with the dictionaries. If an entity sequence of the name entity is matched, the final abbreviation of the enterprise name is identified from it. If no entity sequence can be matched, the abbreviation is screened out of the name entity manually. The remaining words of the name entity are then supplemented into the corresponding entity dictionary based on a Bootstrapping algorithm, so that later enterprise name matching uses the updated dictionaries together with the regular expressions. This further improves the accuracy of enterprise abbreviation extraction.

For example, an incomplete pre-built entity dictionary can make the extracted abbreviation insufficiently concise. If the industry dictionary does not contain "nano", the abbreviation of "Future Nano Technology Co., Ltd." is extracted as "Future Nano", which is not concise enough. In this case, "Future" is manually screened out of "Future Nano" as the enterprise abbreviation, and "nano" is added to the industry dictionary based on the Bootstrapping algorithm.
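One round of this fallback can be sketched as below, under several assumptions. The second matching pass is simplified to stripping a known industry word from the end of the name entity. The manual screening result is passed in as the `abbreviation` argument. The leftover words are assumed to belong to the industry dictionary, as in the "nano" example. The names are hypothetical Chinese renderings of that example.

```python
from typing import Optional, Set


def rematch(name_entity: str, industry: Set[str]) -> Optional[str]:
    """Simplified second pass: strip a known industry word from the end of the name entity."""
    for word in sorted(industry, key=len, reverse=True):
        if name_entity.endswith(word) and len(name_entity) > len(word):
            return name_entity[: -len(word)]
    return None  # no entity sequence found for the name entity


def bootstrap_supplement(name_entity: str, abbreviation: str, industry: Set[str]) -> None:
    """Add the words left after removing a manually screened abbreviation to the dictionary."""
    if not name_entity.startswith(abbreviation):
        raise ValueError("the abbreviation must be a prefix of the name entity")
    remainder = name_entity[len(abbreviation):]
    if remainder:
        industry.add(remainder)
```

Once "纳米" (nano) has been added, the second pass resolves "未来纳米" (Future Nano) and also other names ending in the same industry word. Growing the dictionary this way is the purpose of the bootstrapping step.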

In one example, the method may further comprise:

if the total number of characters of the name entity is judged to be not greater than the first preset number, splicing the name entity with the entity immediately before or after it in the entity sequence, and determining the splicing result as the abbreviation of the enterprise name.

Suppose the verification in step 104 fails because the identified name entity has too few characters. The cause may be that, owing to ambiguity, a field belonging to the abbreviation was wrongly matched as an industry entity, leaving the abbreviation incomplete. For example, "culture" belongs to the two-character industry element dictionary, so "culture" in "Beijing Love Culture Co., Ltd." is identified as an industry entity during matching. The extracted abbreviation is then the incorrect "Love". Splicing "Love" with the next entity, "culture", gives "Love Culture", which is the correct abbreviation.
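The splicing rule can be sketched as follows. The sketch assumes the entity sequence is a list of (category, text) pairs and uses the example first preset number 1. The description allows either the previous or the next entity; preferring the next one and never splicing with the company type are assumptions made here. The single character "爱" is a hypothetical stand-in for "Love" in the example.

```python
from typing import List, Tuple


def splice_short_name(sequence: List[Tuple[str, str]], first_preset: int = 1) -> str:
    """Splice a too-short name entity with a neighbouring entity of the entity sequence."""
    idx = next(i for i, (category, _) in enumerate(sequence) if category == "name")
    name = sequence[idx][1]
    if len(name) > first_preset:
        return name  # long enough: no splicing needed
    # Assumption: prefer the next entity (as in the example) unless it is the company type.
    if idx + 1 < len(sequence) and sequence[idx + 1][0] != "type":
        return name + sequence[idx + 1][1]
    if idx > 0:
        return sequence[idx - 1][1] + name
    return name
```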

Example two

The embodiment of the present invention provides an enterprise abbreviation extraction apparatus. It can be configured in any computer equipment so that the equipment can execute the enterprise abbreviation extraction method of the above embodiment. The computer equipment may be any of various terminals or servers, and a server may be implemented as a single server or a server cluster.

Referring to fig. 3, the apparatus may include:

an acquisition module 31, configured to acquire an enterprise name;

a first matching module 32, configured to match the enterprise name against a plurality of preset entity dictionaries and a plurality of regular expressions in the matching priority order of the regular expressions, wherein none of the entity dictionaries contains name entities;

an identification module 33, configured to acquire the entity sequence matched from the enterprise name and identify the name entity of the enterprise name according to the entity category of each entity in the entity sequence;

a checking module 34, configured to verify the name entity;

and a determining module 35, configured to determine the name entity as the abbreviation of the enterprise name when the checking module 34 verifies it successfully.

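In one example, the first matching module 32 is specifically configured to:

traverse the regular expressions in sequence according to their matching priority order;

during the traversal, stop if the currently traversed regular expression, combined with the entity dictionaries, successfully matches the enterprise name to obtain an entity sequence, and otherwise continue the traversal until a match succeeds.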

Further, the checking module 34 is specifically configured to:

acquire the total number of characters of the name entity;

judge whether this number is greater than the first preset number and smaller than the second preset number;

if so, the verification succeeds; otherwise, it fails.

In one example, the apparatus further comprises a second matching module 36, which is specifically configured to:

if the checking module 34 determines that the total number of characters of the name entity is not smaller than the second preset number, match the name entity against the entity dictionaries and the regular expressions in the matching priority order of the regular expressions;

the determining module 35 is further specifically configured to:

judge whether the second matching module 36 successfully matches an entity sequence of the name entity;

if so, identify the abbreviation of the enterprise name from the entity sequence of the name entity; if not, screen the abbreviation of the enterprise name out of the name entity and supplement the remaining words of the name entity into the corresponding entity dictionaries based on a Bootstrapping algorithm.

In one example, the determining module 35 is specifically configured to:

if the checking module 34 determines that the total number of characters of the name entity is not greater than the first preset number, splice the name entity with the entity immediately before or after it in the entity sequence, and determine the splicing result as the abbreviation of the enterprise name.

In one example, the apparatus further includes a saving module 37, which is specifically configured to:

store the enterprise name and its abbreviation correspondingly in a database.

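In one example, the apparatus further comprises a construction module 30, which is specifically configured to:

construct an enterprise name sample library;

extract region names, industry names and enterprise types from the enterprise names in the sample library through an N-Gram algorithm;

take the region names, industry names and enterprise types as entities and construct a region dictionary, an industry dictionary and an enterprise type dictionary correspondingly;

perform word segmentation on each enterprise name in the sample library through a word segmentation algorithm, and supplement the region, industry and enterprise type dictionaries according to the segmentation results.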

It should be noted that the division into functional modules above is only an example for the enterprise abbreviation extraction apparatus of this embodiment. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific implementation process and beneficial effects of the apparatus, see the enterprise abbreviation extraction method of the above embodiment; details are not repeated here.

Fig. 4 is an internal structural diagram of a computer device according to an embodiment of the present invention. The computer device may be a server, and its internal structure may be as shown in Fig. 4. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device communicates with an external terminal over a network connection. When executed by the processor, the computer program implements an enterprise abbreviation extraction method.

Those skilled in the art will appreciate that the structure shown in Fig. 4 is merely a block diagram of the part of the structure related to the solution of the present invention and does not limit the computer devices to which the solution may be applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.

In one embodiment, there is also provided a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

acquiring an enterprise name;

matching the enterprise name against a plurality of preset entity dictionaries and a plurality of regular expressions, following the matching priority order of the regular expressions, wherein none of the entity dictionaries contains an enterprise name entity;

acquiring an entity sequence matched with the enterprise name, and identifying a name entity of the enterprise name according to the entity type of each entity in the entity sequence;
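The matching and identification steps above can be sketched as a left-to-right scan. This is a minimal illustration, not the patented implementation: the dictionary contents in `ENTITY_DICTS`, the bracket regex, the ordering in `MATCHERS`, and the rule "first unmatched run is the name entity" in `identify_name_entity` are all assumptions. It relies on the property stated above that the dictionaries contain no enterprise-name entities, so any text no matcher claims is labelled as a name candidate.

```python
import re
from typing import Optional

# Toy entity dictionaries with assumed contents. None of them contains
# enterprise-name words, so unmatched text is treated as a name candidate.
ENTITY_DICTS: dict[str, set[str]] = {
    "region": {"南京", "江苏", "上海"},
    "industry": {"金融科技", "科技", "物流"},
    "org_type": {"有限公司", "股份有限公司", "集团"},
}


def dict_pattern(words: set[str]) -> re.Pattern[str]:
    # Longest words first, so the alternation prefers the longest entry.
    ordered = sorted(words, key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in ordered))


# Matchers in matching-priority order (highest first). The bracket regex is a
# hypothetical example of a regular expression that outranks the dictionaries.
MATCHERS: list[tuple[str, re.Pattern[str]]] = [
    ("bracket", re.compile(r"[（(][^（）()]*[）)]")),
    ("org_type", dict_pattern(ENTITY_DICTS["org_type"])),
    ("industry", dict_pattern(ENTITY_DICTS["industry"])),
    ("region", dict_pattern(ENTITY_DICTS["region"])),
]


def match_entities(name: str) -> list[tuple[str, str]]:
    """Split an enterprise name into (text, entity_type) pairs, left to right."""
    sequence: list[tuple[str, str]] = []
    unknown = ""  # characters no matcher claimed; they form a "name" entity
    pos = 0
    while pos < len(name):
        hit = None
        for entity_type, pattern in MATCHERS:  # try matchers in priority order
            m = pattern.match(name, pos)
            if m and m.end() > pos:
                hit = (m.group(), entity_type)
                break
        if hit is None:
            unknown += name[pos]
            pos += 1
            continue
        if unknown:
            sequence.append((unknown, "name"))
            unknown = ""
        sequence.append(hit)
        pos += len(hit[0])
    if unknown:
        sequence.append((unknown, "name"))
    return sequence


def identify_name_entity(sequence: list[tuple[str, str]]) -> Optional[str]:
    # Assumed rule: the first entity of type "name" is the name entity.
    for text, entity_type in sequence:
        if entity_type == "name":
            return text
    return None


print(match_entities("南京苏宁金融科技有限公司"))
# [('南京', 'region'), ('苏宁', 'name'), ('金融科技', 'industry'), ('有限公司', 'org_type')]
```

Because a higher-priority matcher is tried first at every position, reordering `MATCHERS` changes which entity type wins when two matchers could both claim the same text.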

In one embodiment, a computer-readable storage medium is further provided, on which a computer program is stored; when executed by a processor, the computer program implements the following steps:

acquiring an enterprise name;

It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any such combination should be considered within the scope of this specification as long as it contains no contradiction.

The above examples show only some embodiments of the present invention. Although they are described in specific detail, they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and all of these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims

1. An enterprise abbreviation extraction method, the method comprising:

acquiring an enterprise name;

2. The method according to claim 1, wherein matching the enterprise name against the preset plurality of entity dictionaries and the plurality of regular expressions, following the matching priority order of the regular expressions, comprises:

3. The method of claim 1, wherein verifying the name entity comprises:

acquiring the total word number of the name entity;

if so, the verification succeeds; otherwise, the verification fails.

4. The method of claim 3, further comprising:

judging whether the entity sequence of the name entity is successfully matched;

5. The method of claim 3, further comprising:

6. The method according to any one of claims 1 to 5, further comprising pre-constructing the plurality of entity dictionaries by:

constructing an enterprise name sample library;

7. An enterprise abbreviation extraction device, the device comprising:

the acquisition module is configured to acquire an enterprise name;

the checking module is configured to check the name entity;

8. The apparatus of claim 7, wherein the matching module is specifically configured to:

9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the enterprise abbreviation extraction method of any one of claims 1 to 6 when executing the computer program.

10. A computer-readable storage medium, which stores a computer program, wherein the computer program, when executed by a processor, implements the enterprise abbreviation extraction method according to any one of claims 1 to 6.