CN112613299A - Method and device for constructing enterprise synonym library and electronic equipment - Google Patents

Publication number
CN112613299A
Authority
CN
China
Prior art keywords
enterprise
name
abbreviation
corresponding relation
full
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011573431.9A
Other languages
Chinese (zh)
Inventor
任亮
傅雨梅
文齐辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhiyin Intelligent Technology Co ltd
Original Assignee
Beijing Zhiyin Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhiyin Intelligent Technology Co ltd
Priority to CN202011573431.9A
Publication of CN112613299A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for constructing an enterprise synonym library, and an electronic device. The method comprises: acquiring a first correspondence between manually entered enterprise full names and enterprise short names, and/or acquiring, according to a first preset period, a second correspondence between enterprise full names and enterprise short names collected by a data acquisition system; acquiring the full set of enterprise full names according to a second preset period, and performing short-name extraction on that set using a TF-IDF (term frequency-inverse document frequency) short-name extraction algorithm to obtain the enterprise short name corresponding to each enterprise full name, thereby obtaining a third correspondence; and constructing the enterprise synonym library based on the first correspondence and/or the second correspondence, together with the third correspondence. The method does not require collecting a large number of knowledge base samples or text samples, which saves time and resources; it is designed specifically for the field of enterprise synonyms, and its algorithm is simple and has good accuracy.

Description

Method and device for constructing enterprise synonym library and electronic equipment
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a device for constructing an enterprise synonym library and electronic equipment.
Background
News and public opinion data contain a large number of real-time enterprise events and related comments, and collecting this information often runs into the problem of entity alignment. Because the enterprise entity names that appear in news and public opinion include many short names and nicknames, the data are difficult to fuse effectively with other systems, and an effective entity alignment method is urgently needed. It is therefore necessary to build a synonym library that converts non-standard enterprise names into standard names by mapping enterprise short names and nicknames (collectively referred to as enterprise short names) one by one to standard enterprise names (referred to as enterprise full names), so that enterprise data can be integrated effectively and enterprise information can be understood comprehensively.
Establishing a mapping between enterprise short names and nicknames and the standard enterprise names also helps to locate the subject of a news or public opinion item. After such an item appears, the enterprise entities obtained when the item is structured (such as short names and nicknames) are matched against standard enterprise names through the synonym library to determine the subject of the item.
However, no general solution exists for the specialized field of enterprise synonym libraries; mainstream solutions target general-purpose entity alignment. The current mainstream entity alignment approach performs similarity matching based on the structural information of a knowledge graph formed by the relationships between entities, or on entity attribute features. This approach usually depends on a large number of entity attributes obtained from knowledge bases (encyclopedias, interactive encyclopedias, and the like) or on text information (news, public opinion, various documents, and the like). It is a general method for entity alignment and is not optimized for enterprise synonyms.
Therefore, although the entity alignment method is relatively general, applying it to enterprise synonyms requires collecting a large number of knowledge base samples or text samples, which consumes resources and time, and the algorithms are complex. In addition, because it is a general method rather than one developed specifically for enterprise synonyms, it struggles to cover the full set of industrial and commercial enterprises when applied to this field, and its accuracy is poor.
In summary, existing methods for constructing an enterprise synonym library are time-consuming, algorithmically complex, and insufficiently accurate.
Disclosure of Invention
In view of this, the present invention provides a method and a device for constructing an enterprise synonym library, and an electronic device, so as to alleviate the technical problems that existing construction methods are time-consuming, algorithmically complex, and insufficiently accurate.
In a first aspect, an embodiment of the present invention provides a method for constructing an enterprise synonym library, including:
acquiring a first correspondence between manually entered enterprise full names and enterprise short names, and/or acquiring, according to a first preset period, a second correspondence between enterprise full names and enterprise short names collected by a data acquisition system;
acquiring the full set of enterprise full names according to a second preset period, and performing short-name extraction on the full set using a TF-IDF (term frequency-inverse document frequency) short-name extraction algorithm to obtain the enterprise short name corresponding to each enterprise full name in the set, thereby obtaining a third correspondence;
and constructing the enterprise synonym library based on the first correspondence and/or the second correspondence, together with the third correspondence.
Further, performing short-name extraction on the full set of enterprise full names using the TF-IDF short-name extraction algorithm comprises:
performing word segmentation on the full set of enterprise full names to obtain a plurality of word segments corresponding to each enterprise full name;
counting the word frequency of each of the word segments across the full set of enterprise full names;
acquiring administrative division names, and performing short-name extraction on each enterprise full name in the full set based on the administrative division names and the word frequencies to obtain an initial third correspondence;
and post-processing the initial third correspondence to obtain the third correspondence.
Further, performing short-name extraction on each enterprise full name in the full set based on the administrative division names and the word frequencies comprises:
acquiring a current enterprise full name from the full set of enterprise full names;
removing any administrative division name from the current enterprise full name, according to the administrative division names, to obtain an enterprise name with the administrative division name removed;
performing word segmentation on the enterprise name with the administrative division name removed to obtain a plurality of first word segments;
determining the generality of each first word segment based on its word frequency, and removing the first word segments whose generality is high-frequency word to obtain the remaining first word segments;
if the generality of every remaining first word segment is low-frequency word, combining the remaining first word segments to obtain the enterprise short name corresponding to the current enterprise full name;
and if not every remaining first word segment is a low-frequency word, extracting the enterprise short name from the remaining first word segments according to the characteristics of enterprise short names, thereby obtaining the enterprise short name corresponding to the current enterprise full name.
Further, extracting the enterprise short name from the remaining first word segments according to the characteristics of enterprise short names comprises:
extracting a preset number of word segments with the lowest word frequency from the remaining first word segments to form an initial enterprise short name;
determining whether the character length of the initial enterprise short name is within a preset length range;
if it is within the preset length range, determining whether word-crossing exists among the preset number of word segments in the initial enterprise short name;
and if no word-crossing exists, taking the initial enterprise short name as the enterprise short name corresponding to the current enterprise full name.
Further, the method further comprises:
if the character length is not within the preset length range, returning to the step of performing word segmentation on the enterprise name with the administrative division name removed;
and if, after returning to that step, the character length of the resulting initial enterprise short name is still not within the preset length range, taking the initial enterprise short name as the enterprise short name corresponding to the current enterprise full name.
Further, the method further comprises:
if word-crossing exists, determining whether the word-crossing in the initial enterprise short name meets a preset condition;
if the preset condition is met, taking the initial enterprise short name as the enterprise short name corresponding to the current enterprise full name;
if the preset condition is not met, returning to the step of performing word segmentation on the enterprise name with the administrative division name removed;
and if, after returning to that step, the word-crossing in the resulting initial enterprise short name still does not meet the preset condition, taking the initial enterprise short name as the enterprise short name corresponding to the current enterprise full name.
Further, post-processing the initial third correspondence comprises:
identifying, in the initial third correspondence, the entries that share the same enterprise short name;
and adding a preset word segment to the shared enterprise short name in each of those entries, thereby obtaining the third correspondence.
Further, constructing the enterprise synonym library based on the first correspondence and/or the second correspondence, together with the third correspondence, comprises:
fusing the first correspondence and/or the second correspondence with the third correspondence according to a preset priority order to obtain an initial enterprise synonym library;
if, in the initial enterprise synonym library, an enterprise full name corresponds to multiple duplicate enterprise short names, selecting a target enterprise short name from among them according to the preset priority order as the enterprise short name corresponding to that enterprise full name, thereby obtaining the enterprise synonym library;
and if no enterprise full name in the initial enterprise synonym library corresponds to multiple duplicate enterprise short names, taking the initial enterprise synonym library as the enterprise synonym library.
In a second aspect, an embodiment of the present invention further provides a device for constructing an enterprise synonym library, comprising:
an acquisition unit, configured to acquire a first correspondence between manually entered enterprise full names and enterprise short names, and/or to acquire, according to a first preset period, a second correspondence between enterprise full names and enterprise short names collected by a data acquisition system;
a short-name extraction unit, configured to acquire the full set of enterprise full names according to a second preset period and to perform short-name extraction on the full set using a TF-IDF short-name extraction algorithm, obtaining the enterprise short name corresponding to each enterprise full name in the set and thereby a third correspondence;
and a construction unit, configured to construct the enterprise synonym library based on the first correspondence and/or the second correspondence, together with the third correspondence.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the method according to any implementation of the first aspect.
In an embodiment of the present invention, a method for constructing an enterprise synonym library is provided, comprising: acquiring a first correspondence between manually entered enterprise full names and enterprise short names, and/or acquiring, according to a first preset period, a second correspondence between enterprise full names and enterprise short names collected by a data acquisition system; acquiring the full set of enterprise full names according to a second preset period, and performing short-name extraction on that set using a TF-IDF (term frequency-inverse document frequency) short-name extraction algorithm to obtain the enterprise short name corresponding to each enterprise full name, thereby obtaining a third correspondence; and finally constructing the enterprise synonym library based on the first correspondence and/or the second correspondence, together with the third correspondence. This construction method does not require collecting a large number of knowledge base samples or text samples, which saves time and resources. It is designed specifically for the field of enterprise synonyms, and its algorithm is simple and has good accuracy, which alleviates the technical problems that existing methods for constructing an enterprise synonym library are time-consuming, algorithmically complex, and insufficiently accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic flowchart of a method for constructing an enterprise synonym library according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for performing short-name extraction on the full set of enterprise full names using the TF-IDF short-name extraction algorithm according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for performing short-name extraction on each enterprise full name in the full set based on the administrative division names and the word frequencies according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for extracting the enterprise short name from the remaining first word segments according to the characteristics of enterprise short names according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for constructing the enterprise synonym library based on the first correspondence and/or the second correspondence, together with the third correspondence, according to an embodiment of the present invention;
FIG. 6 is an overall architecture diagram of a method for constructing an enterprise synonym library according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of part of an enterprise synonym library obtained by the method for constructing an enterprise synonym library according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a device for constructing an enterprise synonym library according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first embodiment is as follows:
To facilitate understanding of this embodiment, the method for constructing an enterprise synonym library disclosed in the embodiment of the present invention is first described in detail. Referring to the schematic flowchart shown in FIG. 1, the method mainly comprises the following steps:
Step S102: acquiring a first correspondence between manually entered enterprise full names and enterprise short names, and/or acquiring, according to a first preset period, a second correspondence between enterprise full names and enterprise short names collected by a data acquisition system;
In the embodiments of the present invention, a synonym is a word that has the same meaning as the target word. An enterprise synonym is an enterprise short name, nickname, or the like that denotes the same enterprise as the enterprise full name. In this application, enterprise short names, nicknames, and the like are collectively referred to as enterprise short names.
The manual entry may be performed by a customer manager or by other authorized personnel. The first preset period may be once a day; the embodiment of the present invention does not specifically limit it. The second correspondence collected by the data acquisition system may be the correspondence between enterprise full names and enterprise short names that enterprises themselves disclose. Enterprises disclose this correspondence through various channels, mainly: professional websites (for example, investment and financing disclosures, industry associations, transaction platforms, and trademark information), encyclopedias, official enterprise websites, and stock names. However, the number of second correspondences that the acquisition system can collect is small and cannot cover all enterprise names, so short names must also be extracted with another algorithm (for example, the TF-IDF short-name extraction algorithm of the present application; see step S104 below) in order to construct a comprehensive enterprise synonym library.
Step S104: acquiring the full set of enterprise full names according to a second preset period, and performing short-name extraction on the full set using a TF-IDF (term frequency-inverse document frequency) short-name extraction algorithm to obtain the enterprise short name corresponding to each enterprise full name in the set, thereby obtaining a third correspondence;
The correspondences between enterprise full names and enterprise short names obtained in step S102 are limited in number, so most of the correspondences have to be computed with the TF-IDF short-name extraction algorithm.
The TF-IDF algorithm, a weighting technique commonly used in information retrieval and data mining, is introduced first. TF stands for term frequency, and IDF stands for inverse document frequency. TF-IDF is a conventional statistical measure for evaluating how important a word is to a particular document within a document set. The importance of a word increases in proportion to its frequency in the current document and decreases as the number of documents in the set that contain the word increases. The TF-IDF algorithm is commonly used to extract keywords or a summary from an article.
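For reference, the conventional textbook form of the TF-IDF weight can be written as below. This is the standard definition only, not the modified short-name extraction algorithm of this application, and the symbols t, d, D and N are introduced here purely for illustration.

```latex
\[
\operatorname{tfidf}(t, d, D) = \operatorname{tf}(t, d) \cdot \operatorname{idf}(t, D),
\qquad
\operatorname{idf}(t, D) = \log \frac{N}{\lvert \{\, d' \in D : t \in d' \,\} \rvert}
\]
```

Here tf(t, d) is the frequency of word t in document d, D is the document set, and N is the number of documents in D. A word that occurs in many documents gets a small idf and therefore a low weight.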
In the embodiment of the present invention, the inventors adapt the TF-IDF algorithm into a TF-IDF short-name extraction algorithm, which is described in detail below.
It should be noted that the second preset period may be the same as the first preset period, or may be different from the first preset period, and the second preset period is not specifically limited in the embodiment of the present invention.
In addition, the full set of enterprise full names may be the standard enterprise names obtained from the industry and commerce bureau and from securities sources.
Step S106: constructing the enterprise synonym library based on the first correspondence and/or the second correspondence, together with the third correspondence.
After the first correspondence and/or the second correspondence and the third correspondence are obtained, the enterprise synonym library can be constructed from them, and the enterprise synonym library obtained in this way is more accurate.
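A minimal sketch of this construction step, assuming each correspondence is held as a dictionary from enterprise full name to enterprise short name. The function name `fuse_correspondences`, the sample company names, and the priority order (manual entry first, then collected data, then extracted short names) are illustrative assumptions; the text only states that a preset priority order is used.

```python
from typing import Dict, List


def fuse_correspondences(sources_by_priority: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge full-name -> short-name maps; earlier maps take priority."""
    library: Dict[str, str] = {}
    for mapping in sources_by_priority:
        for full_name, short_name in mapping.items():
            # setdefault keeps the entry that came from the higher-priority source.
            library.setdefault(full_name, short_name)
    return library


# Hypothetical sample data, not taken from the application.
manual = {"Acme Widget Manufacturing Co., Ltd.": "Acme"}
collected = {
    "Acme Widget Manufacturing Co., Ltd.": "Acme Widget",
    "Globex Trading Co., Ltd.": "Globex",
}
extracted = {
    "Globex Trading Co., Ltd.": "Globex Trading",
    "Initech Software Co., Ltd.": "Initech",
}

# Assumed priority: manual > collected > extracted.
library = fuse_correspondences([manual, collected, extracted])
print(library)
```

When a full name appears in several sources, only the short name from the highest-priority source is kept, which is one simple way to resolve duplicates.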
In an embodiment of the present invention, a method for constructing an enterprise synonym library is provided, comprising: acquiring a first correspondence between manually entered enterprise full names and enterprise short names, and/or acquiring, according to a first preset period, a second correspondence between enterprise full names and enterprise short names collected by a data acquisition system; acquiring the full set of enterprise full names according to a second preset period, and performing short-name extraction on that set using a TF-IDF (term frequency-inverse document frequency) short-name extraction algorithm to obtain the enterprise short name corresponding to each enterprise full name, thereby obtaining a third correspondence; and finally constructing the enterprise synonym library based on the first correspondence and/or the second correspondence, together with the third correspondence. This construction method does not require collecting a large number of knowledge base samples or text samples, which saves time and resources. It is designed specifically for the field of enterprise synonyms, and its algorithm is simple and has good accuracy, which alleviates the technical problems that existing methods for constructing an enterprise synonym library are time-consuming, algorithmically complex, and insufficiently accurate.
The above description briefly introduces the construction method of the enterprise synonym library of the present invention, and the details of the construction method are described in detail below.
In an optional embodiment of the present invention, referring to FIG. 2, performing short-name extraction on the full set of enterprise full names using the TF-IDF short-name extraction algorithm in step S104 specifically comprises the following steps:
Step S201: performing word segmentation on the full set of enterprise full names to obtain a plurality of word segments corresponding to each enterprise full name;
Step S202: counting the word frequency of each of the word segments across the full set of enterprise full names;
Step S203: acquiring administrative division names, and performing short-name extraction on each enterprise full name in the full set based on the administrative division names and the word frequencies to obtain an initial third correspondence;
Referring to FIG. 3, performing short-name extraction on each enterprise full name in the full set based on the administrative division names and the word frequencies in step S203 specifically comprises the following steps:
Step S301: acquiring a current enterprise full name from the full set of enterprise full names;
Step S302: removing any administrative division name from the current enterprise full name, according to the administrative division names, to obtain an enterprise name with the administrative division name removed;
Common enterprise short names generally do not contain administrative division names (for example, Beijing, Tianjin, or Wuhan). Administrative division names are intuitively expected to be high-frequency words, but verification of the algorithm shows that they are classified as low-frequency words. Because the subsequent removal of high-frequency words therefore cannot eliminate them, the administrative division names must be removed first so that the extracted enterprise short names are consistent with common enterprise short names.
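The removal of administrative division names can be sketched as a simple substring deletion. The function name `strip_divisions`, the short division list, and the sample enterprise name are hypothetical; a real list of administrative divisions would be far longer, and the application does not specify how the matching is performed.

```python
from typing import Iterable


def strip_divisions(full_name: str, divisions: Iterable[str]) -> str:
    """Remove every listed administrative division name from full_name."""
    # Try longer names first so that "北京市" is removed before "北京".
    for division in sorted(divisions, key=len, reverse=True):
        full_name = full_name.replace(division, "")
    return full_name


# Hypothetical sample list (Beijing, Beijing City, Tianjin, Wuhan).
DIVISIONS = ["北京", "北京市", "天津", "武汉"]

print(strip_divisions("北京市示例科技有限公司", DIVISIONS))
```

Plain substring deletion is the simplest choice, but it would also strip a division name that happens to be part of a brand name, so a production system might restrict the match to the start of the name or to a bracketed suffix.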
Step S303: performing word segmentation on the enterprise name with the administrative division name removed to obtain a plurality of first word segments;
Step S304: determining the generality of each first word segment based on its word frequency, and removing the first word segments whose generality is high-frequency word to obtain the remaining first word segments;
In the embodiment of the present invention, the generality of each first word segment is determined using word frequency thresholds. Specifically, a first word frequency threshold and a second word frequency threshold may be set, with the first threshold greater than the second. If the word frequency of a first word segment is greater than the first threshold, its generality is set to high-frequency word; if its word frequency is greater than the second threshold but not greater than the first threshold, its generality is set to medium-frequency word; and if its word frequency is not greater than the second threshold, its generality is set to low-frequency word.
It should be noted that the word frequency thresholds are adjustable.
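The word frequency counting and the two-threshold classification can be sketched as follows, assuming the enterprise names have already been segmented by some tokenizer. The function names, the tiny corpus, and the threshold values 2 and 1 are illustrative assumptions only.

```python
from collections import Counter
from typing import Iterable, List


def count_frequencies(segmented_names: Iterable[List[str]]) -> Counter:
    """Count how often each word segment occurs across all enterprise names."""
    freq: Counter = Counter()
    for segments in segmented_names:
        freq.update(segments)
    return freq


def classify(freq: int, first_threshold: int, second_threshold: int) -> str:
    """Map a word frequency to 'high', 'mid' or 'low' (first > second)."""
    if freq > first_threshold:
        return "high"
    if freq > second_threshold:
        return "mid"
    return "low"


# Hypothetical pre-segmented names with division names already removed.
corpus = [
    ["示例", "科技", "有限公司"],
    ["星河", "科技", "有限公司"],
    ["星河", "贸易", "有限公司"],
]
freq = count_frequencies(corpus)
classes = {word: classify(n, first_threshold=2, second_threshold=1)
           for word, n in freq.items()}
print(classes)
```

In a corpus of real size the thresholds would be much larger and, as noted above, tuned to the data.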
Step S305: if the generality of every remaining first word segment is low-frequency word, combining the remaining first word segments to obtain the enterprise short name corresponding to the current enterprise full name;
The remaining first word segments are combined in the order in which they appear in the corresponding enterprise full name.
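A minimal sketch of this branch, assuming the segments are given in their original order and that a precomputed dictionary maps each segment to "high", "mid" or "low". The function name `short_name_if_all_low` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional


def short_name_if_all_low(segments: List[str], classes: Dict[str, str]) -> Optional[str]:
    """Drop high-frequency segments; join the rest if all of them are low-frequency."""
    # The list comprehension preserves the order of the enterprise full name.
    remaining = [s for s in segments if classes[s] != "high"]
    if remaining and all(classes[s] == "low" for s in remaining):
        return "".join(remaining)
    # Otherwise the characteristic-based extraction has to be used instead.
    return None


# Hypothetical generality classes.
classes = {"示例": "low", "云图": "low", "星河": "low", "科技": "mid", "有限公司": "high"}

print(short_name_if_all_low(["示例", "云图", "有限公司"], classes))
print(short_name_if_all_low(["星河", "科技", "有限公司"], classes))
```

Returning `None` signals that at least one remaining segment is a medium-frequency word, so the simple combination does not apply.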
Step S306: if not every remaining first word segment is a low-frequency word, extracting the enterprise short name from the remaining first word segments according to the characteristics of enterprise short names, thereby obtaining the enterprise short name corresponding to the current enterprise full name.
After analyzing enterprise full names and enterprise short names, the inventors found that enterprise short names have the following characteristics: 1) enterprise short names usually do not contain high-frequency words, such as administrative division names or "limited liability company"; 2) an enterprise full name must be unique, so the word segment that is distinctive to an enterprise's name is generally used as its short name; 3) the word segments that form an enterprise short name are usually contained in full in the enterprise full name and may be selected with word-crossing (that is, skipping over intervening words), but word-crossing usually occurs at most once and over a short distance; 4) an enterprise short name is usually 2 to 5 characters long, most commonly 4 characters.
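Characteristics 3) and 4) can be turned into simple checks, sketched below. The sketch assumes that word-crossing means the chosen segments are not adjacent in the segmented full name, which is an interpretation rather than a definition given in the text; the function names and the limit of one crossing are likewise assumptions.

```python
from typing import List


def count_word_crossings(chosen_indices: List[int]) -> int:
    """Count gaps between consecutive chosen segment positions."""
    idx = sorted(chosen_indices)
    return sum(1 for a, b in zip(idx, idx[1:]) if b - a > 1)


def within_length(short_name: str, min_len: int = 2, max_len: int = 5) -> bool:
    """Check the 2-to-5 character length characteristic."""
    return min_len <= len(short_name) <= max_len


def looks_like_short_name(short_name: str, chosen_indices: List[int]) -> bool:
    """Length in range and at most one word-crossing (assumed limit)."""
    return within_length(short_name) and count_word_crossings(chosen_indices) <= 1


# Hypothetical example: segments 0 and 2 of a segmented name were chosen.
print(looks_like_short_name("星河智能", [0, 2]))
```

The sketch ignores the crossing distance; a fuller check could also bound `b - a`.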
Referring to fig. 4, in the step S306, performing abbreviation extraction on the remaining first segmented words according to the characteristics of the abbreviation, specifically including the following steps:
step S401, if the universality of each of the remaining first participles is not a low-frequency word, extracting a preset number of participles with the lowest word frequency from the remaining first participles to form an initial enterprise abbreviation;
the preset number of the word segmentation words can be 2, because the enterprise abbreviation is usually about 4 words, the word segmentation words are usually two words, and the low word frequency can represent the importance of the word segmentation word in the corresponding enterprise full name (the lower the word frequency is, the more representative the word is, the more important the word is) so as to extract 2 word segmentation words with the lowest word frequency from the rest first word segmentation words to form the initial enterprise abbreviation.
If two words with the lowest word frequency in the rest first participles contain intermediate-frequency words, only one word in the two words with the lowest word frequency is taken (the word is the word with the lower word frequency in the two words with the lowest word frequency).
Step S402, determining whether the word number length of the initial enterprise abbreviation is within a preset length range; if the length is within the preset length range, executing step S403; if not, executing step S405;
the preset length range may be 2 to 5 words, and the preset length range is not particularly limited in the embodiment of the present invention.
Step S403, judging whether a word-crossing condition exists among a preset number of word segments in the initial enterprise abbreviation; if no word-crossing condition exists, executing step S404; if the word-crossing condition exists, executing step S407;
step S404, taking the initial enterprise abbreviation as an enterprise abbreviation corresponding to the current enterprise full name;
step S405, returning to the step of executing word segmentation processing on the enterprise name without the administrative division name;
step S406, if the word number length of the initial enterprise abbreviation obtained after the step of performing word segmentation processing on the enterprise name without the administrative division name is returned is not within the preset length range, taking the initial enterprise abbreviation as the enterprise abbreviation corresponding to the current enterprise full name;
step S407, determining whether the word-crossing condition of the initial enterprise abbreviation meets a preset condition; if the preset condition is satisfied, executing step S408; if the preset condition is not satisfied, executing step S409;
the preset condition may be that the word-crossing distance does not exceed a preset distance.
Step S408, taking the initial enterprise abbreviation as an enterprise abbreviation corresponding to the current enterprise full name;
step S409, returning to execute the step of performing word segmentation processing on the enterprise name without the administrative division name;
step S410, if the word-crossing condition of the initial enterprise abbreviation does not satisfy the preset condition after returning to the step of performing word segmentation processing on the enterprise name without the administrative division name, taking the initial enterprise abbreviation as the enterprise abbreviation corresponding to the current enterprise full name.
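Steps S401 to S409 can be sketched roughly as below. This is a sketch under stated assumptions rather than the patented procedure: the function names and status strings are hypothetical, the "word number length" is taken to be the character count of the joined abbreviation, the word-crossing distance is taken to be the number of participles skipped between two selected participles, and the default limits (2 to 5 characters, a gap of at most 1) are illustrative. The retry of steps S405 and S409 is only signalled, not performed.

```python
from typing import Dict, List, Tuple


def initial_abbreviation(remaining: List[str], freq: Dict[str, int],
                         universality: Dict[str, str], preset_number: int = 2) -> List[int]:
    """S401: indices (in full-name order) of the lowest-frequency participles."""
    ranked = sorted(range(len(remaining)), key=lambda i: freq[remaining[i]])
    chosen = ranked[:preset_number]
    # If an intermediate-frequency word was picked, keep only the rarest participle.
    if any(universality[remaining[i]] == "intermediate" for i in chosen):
        chosen = chosen[:1]
    return sorted(chosen)


def check_abbreviation(remaining: List[str], chosen: List[int], min_len: int = 2,
                       max_len: int = 5, max_gap: int = 1) -> Tuple[str, str]:
    """S402 to S409: validate length and word-crossing of the initial abbreviation."""
    abbreviation = "".join(remaining[i] for i in chosen)
    if not (min_len <= len(abbreviation) <= max_len):
        return abbreviation, "retry_length"        # S405
    gaps = [b - a - 1 for a, b in zip(chosen, chosen[1:])]
    if all(g == 0 for g in gaps):
        return abbreviation, "accept"              # S404: no word-crossing
    if max(gaps) <= max_gap:
        return abbreviation, "accept"              # S408: crossing within the limit
    return abbreviation, "retry_cross_word"        # S409
```

In the usage below, each ASCII letter stands in for one character of a Chinese participle, so `"AB"` represents a two-character participle.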
Step S204, post-processing the initial third corresponding relation to obtain the third corresponding relation.
The method specifically comprises the following steps: determining, in the initial third corresponding relation, the third corresponding relations that share the same enterprise abbreviation; and adding a preset participle to the shared enterprise abbreviation in each of those third corresponding relations, thereby obtaining the third corresponding relation.
The inventor found that after abbreviations are extracted from the full set of enterprise full names by the above algorithm, the same enterprise abbreviation may correspond to multiple enterprise full names, which should not happen (one enterprise full name may correspond to multiple enterprise abbreviations, but one enterprise abbreviation cannot correspond to multiple enterprise full names, because one enterprise abbreviation represents one enterprise).
For example, after abbreviations are extracted by the above algorithm for Beijing Wanda Real Estate Limited Liability Company, Tianjin Wanda Real Estate Limited Liability Company and Wuhan Wanda Real Estate Limited Liability Company, all three abbreviations are "Wanda", so the abbreviation "Wanda" corresponds to three enterprise full names. In this case the administrative division names are added back, finally yielding "Beijing Wanda", "Tianjin Wanda" and "Wuhan Wanda".
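The post-processing of step S204 can be sketched as follows. The sketch assumes that the preset participle is the administrative division name that was removed earlier, as in the Wanda example; the function name, the `division_of` lookup and the `sep` parameter (a space, used here only because the example names are in English) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def disambiguate(initial: Dict[str, str], division_of: Dict[str, str],
                 sep: str = " ") -> Dict[str, str]:
    """Prefix the administrative division name to abbreviations shared by several full names."""
    full_names_by_abbr: Dict[str, List[str]] = defaultdict(list)
    for full_name, abbr in initial.items():
        full_names_by_abbr[abbr].append(full_name)

    result: Dict[str, str] = {}
    for abbr, full_names in full_names_by_abbr.items():
        for full_name in full_names:
            if len(full_names) > 1:
                # Shared abbreviation: add the division name back in front.
                result[full_name] = division_of[full_name] + sep + abbr
            else:
                result[full_name] = abbr
    return result
```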
The above describes in detail the process of extracting the enterprise abbreviations corresponding to the full set of enterprise full names. The following describes in detail the process of constructing the enterprise synonym library based on the obtained corresponding relations.
In an optional embodiment of the present invention, referring to fig. 5, in step S106, constructing the enterprise synonym library based on the first corresponding relation and/or the second corresponding relation, and the third corresponding relation specifically includes the following steps:
step S501, performing fusion processing on the first corresponding relation and/or the second corresponding relation and the third corresponding relation according to a preset priority order to obtain an initial enterprise synonym library;
In the embodiment of the invention, the priority order is: manual input has a higher priority than the data acquisition system, and the data acquisition system has a higher priority than the TF-IDF abbreviation extraction algorithm.
The fusion process is illustrated below by way of an example:
If the manually input first corresponding relation maps Dongxi Datong (Beijing) Management Consulting Co., Ltd. to "small yellow vehicle", the second corresponding relation collected by the data acquisition system maps it to "ofo", and the third corresponding relation extracted by the TF-IDF abbreviation extraction algorithm maps it to "Dongxi Datong", then fusing the three gives the following enterprise abbreviations for Dongxi Datong (Beijing) Management Consulting Co., Ltd.: small yellow vehicle, ofo, Dongxi Datong, with "small yellow vehicle" placed first, then "ofo", and finally "Dongxi Datong". In addition, each enterprise abbreviation is marked with its source: for example, the source of "small yellow vehicle" is marked as manual input, the source of "ofo" as the data acquisition system, and the source of "Dongxi Datong" as the TF-IDF abbreviation extraction algorithm. These marks make it possible to distinguish the respective confidence levels.
Step S502, if the initial enterprise synonym library contains an enterprise full name that corresponds to multiple repeated enterprise abbreviations, determining, among the repeated enterprise abbreviations and according to the preset priority order, a target enterprise abbreviation as the enterprise abbreviation corresponding to that enterprise full name, thereby obtaining the enterprise synonym library;
for ease of understanding, the following is exemplified:
For Wangzhou practical-stocks company, the enterprise abbreviations obtained by manual input, by the data acquisition system and by algorithm extraction are all "Wangzhou stocks", so the manually input "Wangzhou stocks" is selected as the enterprise abbreviation of Wangzhou practical-stocks company. Although the abbreviations obtained in the three ways are identical, so that choosing any one of them may seem equivalent, the choice actually affects the confidence of the result. The manually input enterprise abbreviation (that is, the one with the highest confidence) is therefore selected as the target enterprise abbreviation, and its source is marked as manual input.
Step S503, if the condition that the enterprise full names correspond to a plurality of repeated enterprise short names does not exist in the initial enterprise synonym library, the initial enterprise synonym library is used as the enterprise synonym library.
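Steps S501 to S503 can be sketched as a priority-ordered merge. The source labels, the function name and the representation of each corresponding relation as a dictionary from full name to a single abbreviation are assumptions made for illustration; the description does not prescribe a data structure.

```python
from typing import Dict, List, Tuple

# Assumed source labels, listed from highest to lowest priority.
PRIORITY = ["manual", "data_acquisition", "tfidf"]


def fuse(first: Dict[str, str], second: Dict[str, str],
         third: Dict[str, str]) -> Dict[str, List[Tuple[str, str]]]:
    """Merge the three relations; repeated abbreviations keep the highest-priority source."""
    library: Dict[str, List[Tuple[str, str]]] = {}
    for source, relation in zip(PRIORITY, (first, second, third)):
        for full_name, abbr in relation.items():
            entries = library.setdefault(full_name, [])
            # Relations are visited in priority order, so an abbreviation that is
            # already present came from a source with higher priority.
            if all(existing != abbr for existing, _ in entries):
                entries.append((abbr, source))
    return library
```

Each entry keeps its source label next to the abbreviation, which is one way to preserve the confidence distinction mentioned in the examples.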
Fig. 6 is an overall architecture diagram of the construction method of the enterprise synonym library according to the present invention. First, the data acquisition system acquires the second corresponding relation and sends it to the enterprise synonym library in the TD database, updating it once a day; second, the full set of enterprise full names is imported from TeraData into Hive storage once a day; third, the full set of enterprise full names is loaded from Hive onto the Spark platform for TF-IDF abbreviation extraction; fourth, TF-IDF abbreviation extraction is performed to obtain the enterprise abbreviations and thereby the third corresponding relation; fifth, the obtained third corresponding relation is imported into the enterprise synonym library once a month; sixth, the first corresponding relation is manually input; and seventh, the enterprise synonym library is constructed based on the three corresponding relations.
Fig. 7 is a schematic diagram of a part of the enterprise synonym library obtained by the method for constructing the enterprise synonym library according to the present invention.
The construction method of the enterprise synonym library does not need to collect large amounts of enterprise attributes or public opinion information. It starts from the full set of enterprise full names, is simple and fast, works for the full set of industrial and commercial enterprises, and greatly expands the enterprise synonym library.
Example two:
the embodiment of the present invention further provides a device for constructing an enterprise synonym library, which is mainly used for executing the method for constructing an enterprise synonym library provided in the embodiment of the present invention, and the following describes the device for constructing an enterprise synonym library provided in the embodiment of the present invention in detail.
Fig. 8 is a schematic diagram of an apparatus for constructing an enterprise synonym library according to an embodiment of the present invention. As shown in fig. 8, the apparatus for constructing an enterprise synonym library mainly includes: an acquisition unit 10, an abbreviation extraction unit 20 and a construction unit 30, wherein:
the acquisition unit is used for acquiring a first corresponding relation between the enterprise full name and the enterprise short name which are manually input, and/or acquiring a second corresponding relation between the enterprise full name and the enterprise short name which are acquired by the data acquisition system according to a first preset period;
the abbreviation extraction unit is used for acquiring the full set of enterprise full names according to a second preset period, and performing abbreviation extraction on the full set of enterprise full names using a TF-IDF abbreviation extraction algorithm to obtain the enterprise abbreviation corresponding to each enterprise full name in the full set, thereby obtaining a third corresponding relation;
and the construction unit is used for constructing an enterprise synonym library based on the first corresponding relation and/or the second corresponding relation and the third corresponding relation.
In an embodiment of the present invention, a device for constructing an enterprise synonym library is provided. The device acquires a first corresponding relation between manually input enterprise full names and enterprise abbreviations, and/or acquires, according to a first preset period, a second corresponding relation between enterprise full names and enterprise abbreviations collected by a data acquisition system; acquires the full set of enterprise full names according to a second preset period and performs abbreviation extraction on it using a TF-IDF (term frequency-inverse document frequency) abbreviation extraction algorithm to obtain the enterprise abbreviation corresponding to each enterprise full name, thereby obtaining a third corresponding relation; and finally constructs the enterprise synonym library based on the first corresponding relation and/or the second corresponding relation, and the third corresponding relation. The device does not need to collect a large number of knowledge base samples or text samples, which saves time and resources. It is designed specifically for the field of enterprise synonyms, its algorithm is simple and accurate, and it solves the technical problems of existing construction methods for enterprise synonym libraries, namely long time consumption, complex algorithms and poor accuracy.
Optionally, the abbreviation extraction unit is further configured to: perform word segmentation on the full set of enterprise full names to obtain a plurality of participles corresponding to each enterprise full name; count the word frequency of each of the plurality of participles across the full set of enterprise full names; acquire administrative division names, and perform abbreviation extraction on each enterprise full name in the full set based on the administrative division names and the word frequencies to obtain an initial third corresponding relation; and post-process the initial third corresponding relation to obtain the third corresponding relation.
Optionally, the abbreviation extraction unit is further configured to: acquire a current enterprise full name from the full set of enterprise full names; remove the administrative division name from the current enterprise full name according to the administrative division names to obtain an enterprise name without the administrative division name; perform word segmentation on the enterprise name without the administrative division name to obtain a plurality of first participles; determine the universality of each first participle based on its word frequency, and remove the first participles whose universality is high-frequency word to obtain the remaining first participles; if every one of the remaining first participles is a low-frequency word, combine the remaining first participles to obtain the enterprise abbreviation corresponding to the current enterprise full name; and if not every one of the remaining first participles is a low-frequency word, perform enterprise abbreviation extraction on the remaining first participles according to the characteristics of enterprise abbreviations to obtain the enterprise abbreviation corresponding to the current enterprise full name.
Optionally, the abbreviation extracting unit is further configured to: extracting a preset number of word segmentation with the lowest word frequency from the rest first word segmentation to form an initial enterprise abbreviation; determining whether the word number length of the initial enterprise abbreviation is within a preset length range; if the length is within the preset length range, judging whether a word-crossing condition exists among a preset number of word segments in the initial enterprise abbreviation; and if the word-crossing condition does not exist, taking the initial enterprise abbreviation as the enterprise abbreviation corresponding to the current enterprise full name.
Optionally, the abbreviation extraction unit is further configured to: if the word number length of the initial enterprise abbreviation is not within the preset length range, return to execute the step of performing word segmentation processing on the enterprise name without the administrative division name; and if the word number length of the initial enterprise abbreviation obtained after returning to that step is still not within the preset length range, take the initial enterprise abbreviation as the enterprise abbreviation corresponding to the current enterprise full name.
Optionally, the abbreviation extraction unit is further configured to: if the word-crossing condition exists, determine whether the word-crossing condition of the initial enterprise abbreviation meets a preset condition; if the preset condition is met, take the initial enterprise abbreviation as the enterprise abbreviation corresponding to the current enterprise full name; if the preset condition is not met, return to execute the step of performing word segmentation processing on the enterprise name without the administrative division name; and if the word-crossing condition of the initial enterprise abbreviation obtained after returning to that step still does not meet the preset condition, take the initial enterprise abbreviation as the enterprise abbreviation corresponding to the current enterprise full name.
Optionally, the abbreviation extraction unit is further configured to: determine, in the initial third corresponding relation, the third corresponding relations that share the same enterprise abbreviation; and add a preset participle to the shared enterprise abbreviation in each of those third corresponding relations, thereby obtaining the third corresponding relation.
Optionally, the building unit is further configured to: fusing the first corresponding relation and/or the second corresponding relation and the third corresponding relation according to a preset priority order to obtain an initial enterprise synonym library; if the condition that the enterprise full names correspond to the repeated enterprise short names exists in the initial enterprise synonym library, determining the target enterprise short names as the enterprise short names corresponding to the enterprise full names in the repeated enterprise short names according to a preset priority sequence, and further obtaining the enterprise synonym library; and if the condition that the enterprise full names correspond to a plurality of repeated enterprise short names does not exist in the initial enterprise synonym library, taking the initial enterprise synonym library as the enterprise synonym library.
The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the foregoing method embodiment. For brevity, where the device embodiment does not mention something, reference may be made to the corresponding content in the method embodiment.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the foregoing systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. The construction device of the enterprise synonym library provided by the embodiment of the application has the same technical characteristics as the construction method of the enterprise synonym library provided by the embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly, for example as a fixed connection, a removable connection, or an integral connection; as a mechanical or an electrical connection; or as a direct connection, an indirect connection through an intervening medium, or an internal communication between two elements. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific case.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for constructing an enterprise synonym library is characterized by comprising the following steps:
acquiring a first corresponding relation between the enterprise full name and the enterprise short name which are manually input, and/or acquiring a second corresponding relation between the enterprise full name and the enterprise short name which are acquired by a data acquisition system according to a first preset period;
acquiring a full set of enterprise full names according to a second preset period, and performing abbreviation extraction on the full set of enterprise full names using a TF-IDF (term frequency-inverse document frequency) abbreviation extraction algorithm to obtain an enterprise abbreviation corresponding to each enterprise full name in the full set, thereby obtaining a third corresponding relation;
and constructing the enterprise synonym library based on the first corresponding relation and/or the second corresponding relation, and the third corresponding relation.
2. The method of claim 1, wherein performing abbreviation extraction on the full set of enterprise full names using the TF-IDF abbreviation extraction algorithm comprises:
performing word segmentation on the full set of enterprise full names to obtain a plurality of participles corresponding to each enterprise full name;
counting the word frequency of each of the plurality of participles across the full set of enterprise full names;
acquiring administrative division names, and performing abbreviation extraction on each enterprise full name in the full set of enterprise full names based on the administrative division names and the word frequencies to obtain an initial third corresponding relation;
and carrying out post-processing on the initial third corresponding relation to obtain the third corresponding relation.
3. The method of claim 2, wherein performing abbreviation extraction on each enterprise full name in the full set of enterprise full names based on the administrative division names and the word frequencies comprises:
acquiring a current enterprise full name from the full set of enterprise full names;
removing the administrative division names in the current enterprise full name according to the administrative division names to obtain enterprise names with the administrative division names removed;
performing word segmentation on the enterprise name without the administrative division name to obtain a plurality of first words corresponding to the enterprise name without the administrative division name;
determining the universality of each first participle based on the word frequency of each first participle in the plurality of first participles, and further removing the first participle with the universality of high-frequency words in the plurality of first participles to obtain the rest first participles;
if the universality of each of the remaining first participles is a low-frequency word, combining the remaining first participles to obtain an enterprise abbreviation corresponding to the current enterprise full name;
and if the universality of each of the remaining first participles is not the low-frequency word, extracting the enterprise abbreviation from the remaining first participles according to the characteristics of the enterprise abbreviation, and further obtaining the enterprise abbreviation corresponding to the current enterprise full name.
4. The method of claim 3, wherein performing enterprise abbreviation extraction on the remaining first participles according to the characteristics of enterprise abbreviations comprises:
extracting a preset number of word segmentation with the lowest word frequency from the rest first word segmentation to form an initial enterprise abbreviation;
determining whether the word number length of the initial enterprise abbreviation is within a preset length range;
if the length is within the preset length range, judging whether a word-crossing condition exists among a preset number of word segments in the initial enterprise abbreviation;
and if the word-crossing condition does not exist, taking the initial enterprise abbreviation as the enterprise abbreviation corresponding to the current enterprise full name.
5. The method of claim 4, further comprising:
if the word number length of the initial enterprise abbreviation is not within the preset length range, returning to execute the step of performing word segmentation processing on the enterprise name without the administrative division name;
and if the word number length of the initial enterprise abbreviation obtained after the step of performing word segmentation processing on the enterprise name without the administrative division name is returned is still not within the preset length range, taking the initial enterprise abbreviation as the enterprise abbreviation corresponding to the current enterprise full name.
6. The method of claim 4, further comprising:
if the word-crossing condition exists, determining whether the word-crossing condition of the initial enterprise abbreviation meets a preset condition;
if the preset conditions are met, taking the initial enterprise abbreviation as an enterprise abbreviation corresponding to the current enterprise full name;
if the preset condition is not met, returning to execute the step of performing word segmentation processing on the enterprise name without the administrative division name;
and if the word-crossing condition of the initial enterprise abbreviation does not meet the preset condition after the step of performing word segmentation processing on the enterprise name without the administrative division name is returned, taking the initial enterprise abbreviation as the enterprise abbreviation corresponding to the current enterprise full name.
7. The method of claim 2, wherein post-processing the initial third correspondence comprises:
determining, in the initial third corresponding relation, the third corresponding relations that share the same enterprise abbreviation;
and adding a preset participle to the shared enterprise abbreviation in each of those third corresponding relations, thereby obtaining the third corresponding relation.
8. The method of claim 1, wherein constructing the enterprise synonym library based on the first corresponding relation and/or the second corresponding relation, and the third corresponding relation comprises:
fusing the first corresponding relation and/or the second corresponding relation and the third corresponding relation according to a preset priority order to obtain an initial enterprise synonym library;
if the condition that the enterprise full names correspond to the multiple repeated enterprise short names exists in the initial enterprise synonym library, determining a target enterprise short name as the enterprise short name corresponding to the enterprise full name in the multiple repeated enterprise short names according to the preset priority sequence, and further obtaining the enterprise synonym library;
and if the condition that the enterprise full names correspond to a plurality of repeated enterprise short names does not exist in the initial enterprise synonym library, taking the initial enterprise synonym library as the enterprise synonym library.
9. An apparatus for constructing an enterprise synonym library, comprising:
the acquisition unit is used for acquiring a first corresponding relation between the enterprise full name and the enterprise short name which are manually input, and/or acquiring a second corresponding relation between the enterprise full name and the enterprise short name which are acquired by the data acquisition system according to a first preset period;
the abbreviation extraction unit is used for acquiring a full set of enterprise full names according to a second preset period, and performing abbreviation extraction on the full set of enterprise full names using a TF-IDF abbreviation extraction algorithm to obtain an enterprise abbreviation corresponding to each enterprise full name in the full set, thereby obtaining a third corresponding relation;
and the construction unit is used for constructing the enterprise synonym library based on the first corresponding relation and/or the second corresponding relation and the third corresponding relation.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of the preceding claims 1 to 8 are implemented when the computer program is executed by the processor.
CN202011573431.9A 2020-12-25 2020-12-25 Method and device for constructing enterprise synonym library and electronic equipment Pending CN112613299A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011573431.9A CN112613299A (en) 2020-12-25 2020-12-25 Method and device for constructing enterprise synonym library and electronic equipment

Publications (1)

Publication Number Publication Date
CN112613299A (en) 2021-04-06

Family

ID=75248107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011573431.9A Pending CN112613299A (en) 2020-12-25 2020-12-25 Method and device for constructing enterprise synonym library and electronic equipment

Country Status (1)

Country Link
CN (1) CN112613299A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468315A (en) * 2021-09-02 2021-10-01 北京华云安信息技术有限公司 Vulnerability vendor name matching method
CN115329039A (en) * 2022-08-08 2022-11-11 前锦网络信息技术(上海)有限公司 Recruitment enterprise searching method, system, electronic equipment and storage medium
CN116227472A (en) * 2023-03-06 2023-06-06 成都工业学院 Method for constructing accessory synonym library for BERT-FLAT entity recognition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975491A (en) * 2016-04-26 2016-09-28 重庆誉存企业信用管理有限公司 Enterprise news analysis method and system
CN106991085A (en) * 2017-04-01 2017-07-28 中国工商银行股份有限公司 The abbreviation generation method and device of a kind of entity
CN108460014A (en) * 2018-02-07 2018-08-28 百度在线网络技术(北京)有限公司 Recognition methods, device, computer equipment and the storage medium of business entity
CN109635285A (en) * 2018-11-26 2019-04-16 平安科技(深圳)有限公司 Enterprise's full name and abbreviation matching method, apparatus, computer equipment and storage medium
CN111814479A (en) * 2020-07-09 2020-10-23 上海明略人工智能(集团)有限公司 Enterprise short form generation and model training method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
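JIE ZHOU; BI-CHENG LI; GANG CHEN: "Automatic generation method for large-scale named entity recognition corpora based on Chinese Wikipedia (in English)", FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, no. 11, 3 November 2015 (2015-11-03), pages 940 - 957 *
孙丽萍; 过弋; 唐文武; 徐永斌: "Enterprise short name prediction based on composition patterns and conditional random fields", 计算机应用, no. 02, 10 February 2016 (2016-02-10), pages 449 - 454 *
张晖: "Theoretical considerations on building an application-oriented database of variant names for standardized terms", 中国科技术语, no. 04, 25 August 2013 (2013-08-25), pages 12 - 16 *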

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468315A (en) * 2021-09-02 2021-10-01 北京华云安信息技术有限公司 Vulnerability vendor name matching method
CN113468315B (en) * 2021-09-02 2021-12-10 北京华云安信息技术有限公司 Vulnerability vendor name matching method
CN115329039A (en) * 2022-08-08 2022-11-11 前锦网络信息技术(上海)有限公司 Recruitment enterprise searching method, system, electronic equipment and storage medium
CN115329039B (en) * 2022-08-08 2023-08-04 前锦网络信息技术(上海)有限公司 Recruitment enterprise searching method and system, electronic equipment and storage medium
CN116227472A (en) * 2023-03-06 2023-06-06 成都工业学院 Method for constructing accessory synonym library for BERT-FLAT entity recognition
CN116227472B (en) * 2023-03-06 2024-05-07 成都工业学院 Method for constructing accessory synonym library for BERT-FLAT entity recognition

Similar Documents

Publication Publication Date Title
CN112613299A (en) Method and device for constructing enterprise synonym library and electronic equipment
US9767127B2 (en) Method for record linkage from multiple sources
Martínez‐Zarzoso et al. Exports and governance: Is the Middle East and North Africa region different?
CN107870927B (en) File evaluation method and device
CN110377558B (en) Document query method, device, computer equipment and storage medium
CN108512883B (en) Information pushing method and device and readable medium
CN106874335B (en) Behavior data processing method and device and server
CN108763961B (en) Big data based privacy data grading method and device
CN105512300B (en) information filtering method and system
KR102076657B1 (en) a method for analyzing a company risk for monitoring a crisis of a company and a device therefor
CN108073678B (en) Document analysis processing method, system and device applied to big data analysis
CN109902129B (en) Insurance agent classifying method and related equipment based on big data analysis
CN105893397A (en) Video recommendation method and apparatus
JP5669904B1 (en) Document search system, document search method, and document search program for providing prior information
CN111881170B (en) Method, device, equipment and storage medium for mining timeliness query content field
CN112100670A (en) Big data based privacy data grading protection method
CN115048483A (en) Information management system
CN110909112B (en) Data extraction method, device, terminal equipment and medium
CN113868373A (en) Word cloud generation method and device, electronic equipment and storage medium
KR100943625B1 (en) Method and System for Generating Integrated Database for Integradedly Managing Local Information and Website Information and Method for Providing Search Result Using Integrated Database
CN113962302A (en) Sensitive data intelligent identification method based on label distribution learning
US6968339B1 (en) System and method for selecting data to be corrected
CN113204696A (en) Retrieval method of intelligent search engine based on text atlas
CN111191049A (en) Information pushing method and device, computer equipment and storage medium
CN111191126A (en) Keyword-based scientific and technological achievement accurate pushing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination