CN114020903A - Industry classification method, device, equipment and medium for industrial enterprises - Google Patents
- Publication number
- CN114020903A (application CN202111175853.5A)
- Authority
- CN
- China
- Prior art keywords
- classification
- word set
- industry
- matching
- brand
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides an industry classification method, device, equipment and medium for industrial enterprises, relating to the technical field of data processing. The method comprises: extracting a first keyword from the name of an industrial enterprise to be classified; matching the first keyword in a brand library to obtain a brand classification word set; matching the first keyword in a science and technology term library to obtain a technology classification word set; and matching the technology classification word set and the brand classification word set, in that order, against an industry classification word set, obtaining a matching result according to a preset matching rule, and taking the matching result as the industry classification result of the industrial enterprise to be classified. The method and device can classify each enterprise by industry under different industry classification standards, with high accuracy and speed.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a medium for industry classification of an industrial enterprise.
Background
Government agencies that interface with enterprises, as well as similar enterprise service centers and research institutions, often need to divide local enterprises into industries in order to analyze local industrial conditions. Because policy documents and research requirements differ, the applicable industry classifications also differ: some refer to the national economic industry classification, some to government industrial policy, and some to local industry divisions. As a result, the same enterprise in a given region may be subject to several classification standards, depending on the policy document and the purpose.
Without a suitable industry classification method, analyzing the enterprises of a region under a different reference standard each time requires staff to re-learn the target classification standard and then manually match thousands of enterprises against it one by one to label them, frequently looking up each company's main business during labeling. This consumes a great deal of labor and material cost.
Disclosure of Invention
In view of this, the present application provides an industry classification method, apparatus, device and medium for industrial enterprises, to solve the technical problem that no rapid industry classification method exists for different industry classification standards.
In one aspect, an embodiment of the present application provides an industry classification method for an industrial enterprise, including:
extracting a first keyword from the name of the industrial enterprise to be classified;
matching the first keyword in a brand library to obtain a brand classification word set;
matching the first keyword in a science and technology term library to obtain a technology classification word set;
and matching the technology classification word set and the brand classification word set against an industry classification word set in sequence, obtaining a matching result according to a preset matching rule, and taking the matching result as the industry classification result of the industrial enterprise to be classified.
Further, extracting the first keyword from the name of the industrial enterprise to be classified includes:
matching the name of the industrial enterprise against a place name table to obtain the place names contained in the name;
matching the name of the industrial enterprise against a company property table to obtain the company property names (company type designations) contained in the name;
obtaining the punctuation marks contained in the name;
and deleting the place names, company property names and punctuation marks from the name, and taking the remaining text as the first keyword.
Further, each data item of the brand library includes at least four associated fields: enterprise name, brand name, application date and international classification. Matching the first keyword in the brand library to obtain the brand classification word set includes:
performing brand matching on the first keyword in the brand library. If brand matching succeeds, all matched data items are acquired and the name of the industrial enterprise to be classified is matched in the enterprise name column of those items; if this succeeds, the international classification corresponding to the enterprise name is entered into the brand classification word set, and otherwise the international classification of the earliest-filed brand among the data items is entered into the brand classification word set. If brand matching fails, the name of the industrial enterprise to be classified is matched in the brand library, and if this succeeds, the international classification contained in the matched data item is entered into the brand classification word set.
Further, matching the first keyword in the science and technology term library to obtain the technology classification word set includes:
extracting scientific and technological terms from the first keyword through the science and technology term library as second keywords, segmenting the second keywords into words, and forming the technology classification word set from all the segmented words.
Further, the industries contained in the selected industry classification standard are entered into the industry classification word set, and matching the technology classification word set and the brand classification word set against the industry classification word set in sequence, obtaining a matching result according to a preset matching rule, and taking the matching result as the industry classification result of the industrial enterprise to be classified includes:
matching the technology classification word set in the industry classification word set, and if the matching succeeds, taking the industry matched with the technology classification word set as the industry classification result;
otherwise, matching the brand classification word set in the industry classification word set, and if the matching succeeds, taking the industry matched with the brand classification word set as the industry classification result; otherwise, acquiring the subclasses of the international classifications in the brand classification word set, entering them into a second brand classification word set, matching the second brand classification word set in the industry classification word set, and if the matching succeeds, taking the industry matched with the second brand classification word set as the industry classification result.
Further, a successful match means that the industry classification word set contains any word of the technology classification word set, of the brand classification word set, or of the second brand classification word set, respectively.
Further, when the industry classification result of the industrial enterprise to be classified cannot be obtained, the method further comprises:
segmenting the name of each industry in the industry classification word set, extracting scientific and technological term keywords through dictionary explanation, and entering all the term keywords into a second industry classification word set;
matching the technology classification word set in the second industry classification word set, and if the matching succeeds, taking the industry corresponding to the term keywords matched with the technology classification word set as the industry classification result;
otherwise, matching the brand classification word set in the second industry classification word set, and if the matching succeeds, taking the industry corresponding to the term keywords matched with the brand classification word set as the industry classification result; otherwise, matching the second brand classification word set in the second industry classification word set, and if the matching succeeds, taking the industry corresponding to the term keywords matched with the second brand classification word set as the industry classification result.
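A minimal Python sketch of this fallback cascade, assuming the second industry classification word set is held as a hypothetical dictionary that maps each term keyword to the industry it was derived from. Matching reuses a case-insensitive inclusion test, and the three word sets are tried in the stated order; `classify_fallback` is an illustrative name, not from the source.

```python
def classify_fallback(ctechs: list[str], cbrands: list[str],
                      second_brands: list[str],
                      second_catalog: dict[str, str]) -> list[str]:
    """Match word sets against term keywords and report the source industries."""
    for word_set in (ctechs, cbrands, second_brands):
        hits: list[str] = []
        for keyword, industry in second_catalog.items():
            # Inclusion test: the term keyword contains one of the words.
            if industry not in hits and any(
                    w and w.lower() in keyword.lower() for w in word_set):
                hits.append(industry)
        if hits:
            return hits  # the first word set that matches decides the result
    return []  # still unclassified
```

Returning on the first word set that produces any hit mirrors the "otherwise" chain above; collecting all industries for that word set allows an enterprise to land in several categories.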
In another aspect, an embodiment of the present application provides an industry classification device for an industrial enterprise, including:
the extraction unit, used for extracting a first keyword from the name of the industrial enterprise to be classified;
the brand classification word set generating unit, used for matching the first keyword in a brand library to obtain a brand classification word set;
the technology classification word set generating unit, used for matching the first keyword in a science and technology term library to obtain a technology classification word set;
and the industry classification unit, used for matching the technology classification word set and the brand classification word set against the industry classification word set in sequence, obtaining a matching result according to a preset matching rule, and taking the matching result as the industry classification result of the industrial enterprise to be classified.
In another aspect, an embodiment of the present application provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the industry classification method for an industrial enterprise according to the embodiments of the present application.
In another aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the industry classification method for an industrial enterprise according to the embodiment of the present application.
Compared with the prior art, the application has the following technical advantages:
1. Each enterprise can be classified by industry under different industry classification standards, with high classification accuracy and speed;
2. The classification method is simple and efficient and can be flexibly applied to classification under various industry standards, saving a large amount of labor cost;
3. The classification device is convenient to operate: fast and accurate classification requires only importing the batch of enterprise names to be classified and the specific industry classification standard required.
Drawings
To illustrate the specific embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an industry classification method for an industrial enterprise according to an embodiment of the present application;
FIG. 2 is a flowchart of generating the brand classification word set Cbrands according to an embodiment of the present application;
fig. 3 is a flowchart of sequentially matching the technology classification word set and the brand classification word set against an industry classification word set according to an embodiment of the present application;
fig. 4 is a functional structure schematic diagram of an industry classification device of an industrial enterprise according to an embodiment of the present application;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some but not all embodiments of the present application. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given in the present application without any inventive step, shall fall within the scope of protection of the present application.
First, the design idea of the embodiment of the present application is briefly introduced.
When local enterprises are classified by industry, the applicable industry classification standard differs with the policy document or research requirement. For example, some tasks refer to the national economic industry classification, and some refer to government industrial policy, such as the 10 key fields: 1. new-generation information technology industry (integrated circuits and dedicated equipment, information and communication equipment, operating systems and industrial software); 2. high-end numerically controlled machine tools and robots; 3. aeronautical equipment (large aircraft, drones, helicopters, etc.) and aerospace equipment (new-generation launch vehicles, heavy-lift launch vehicles); 4. marine engineering equipment and high-tech ships; 5. advanced rail transit equipment; 6. energy-saving and new energy vehicles (electric vehicles, fuel cell vehicles); 7. electric power equipment; 8. agricultural machinery and equipment; 9. new materials; 10. biomedicine and high-performance medical devices. Still others refer to local industry divisions. For example, Jiangsu Province has proposed 13 advanced manufacturing cluster categories: new-generation displays, biomedicine and novel medical devices, high-end textiles, new energy and smart grid, automobiles and parts, nanotechnology applications, optical communications, software and digital economy, robots and key parts, high-end equipment, integrated circuits, internet of things, and advanced materials. Suzhou has six major industrial fields: new-generation electronic information, high-end equipment manufacturing, new materials, software and integrated circuits, new energy and energy saving/environmental protection, and medical devices and biomedicine.
Therefore, the same enterprise in a given region may fall under various industry classification standards, depending on the policy document and purpose. Without a suitable industry classification method, analyzing regional enterprises under a different reference standard each time requires staff to re-learn the target standard and then manually match thousands of enterprises against it one by one to label them, frequently looking up each company's main business during labeling; this consumes a great deal of labor and material cost.
To solve this problem, an embodiment of the present application provides an industry classification method for an industrial enterprise: extracting a first keyword from the name of the industrial enterprise to be classified; matching the first keyword in a brand library to obtain a brand classification word set; matching the first keyword in a science and technology term library to obtain a technology classification word set; and matching the technology classification word set and the brand classification word set against an industry classification word set in sequence, obtaining a matching result according to a preset matching rule, and taking the matching result as the industry classification result of the industrial enterprise to be classified.
The industry classification word set is determined by the specific industry classification standard selected by the user and changes when that standard changes, so each enterprise can be classified under different industry classification standards with high accuracy and speed. The databases queried in the embodiments of the application are all existing, readily available databases, so no new database needs to be developed; the method therefore involves little workload and is easy to implement.
After introducing the application scenario and the design concept of the embodiment of the present application, the following describes a technical solution provided by the embodiment of the present application.
The first embodiment is as follows:
as shown in fig. 1, an embodiment of the present application provides an industry classification method for an industrial enterprise, including:
step 101: extracting a first keyword of the name of the industrial enterprise to be classified;
in step 101, the place names, company property names and punctuation marks in the enterprise name are identified using the place name table and the company property table and are deleted; the remaining text is the first keyword;
the place name table includes Beijing, Shanghai, Suzhou, Chengdu, etc.; the company property table includes terms such as limited company, partnership, company, group and technology; the punctuation marks include brackets "()".
For example, for the enterprise name Wistron InfoComm (Kunshan) Co., Ltd., removing "Kunshan", "Co., Ltd.", "(" and ")" yields the first keyword "Wistron InfoComm".
For another example, for the enterprise name Changshu Faleiao Automobile Wiper System Co., Ltd., removing "Changshu" and "Co., Ltd." yields the first keyword "Faleiao automobile wiper system".
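A minimal Python sketch of step 101, assuming English renderings of the names and small hypothetical place-name and company-property tables (the full tables are not given in the text). Longer entries are removed first so that a suffix such as "Co., Ltd." is not broken up by shorter entries, and the whitespace left behind by deletions is collapsed.

```python
# Hypothetical sample tables; the source names only a few entries.
PLACE_NAMES = ["Beijing", "Shanghai", "Suzhou", "Chengdu", "Kunshan", "Changshu"]
COMPANY_TYPES = ["Co., Ltd.", "Limited", "Partnership", "Group", "Company"]
PUNCTUATION = "()（）,，."


def extract_first_keyword(name: str) -> str:
    """Strip place names, company-type designations and punctuation from a name."""
    result = name
    # Remove longer entries first so multi-word suffixes are deleted intact.
    for term in sorted(PLACE_NAMES + COMPANY_TYPES, key=len, reverse=True):
        result = result.replace(term, " ")
    # Delete any remaining punctuation marks such as brackets.
    result = result.translate(str.maketrans("", "", PUNCTUATION))
    # Collapse the whitespace left behind by the deletions.
    return " ".join(result.split())
```

Plain substring deletion can also cut a place name out of a longer word, so a production version would likely need tokenization; for Chinese names, which contain no spaces, the final whitespace step simply has nothing to do.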
Step 102: matching the first keyword in a brand library to obtain a brand classification word set Cbrands;
the brand library may refer to an existing brand database, such as the global brand database of the World Intellectual Property Organization, a national brand database such as that of China, or a general commercial brand database. Each data item includes at least four associated fields: company name, brand name, application date and international classification, and subclasses are defined under each international class. Because a brand usually belongs to a particular industry, brand data provides strong support for industry classification.
As shown in fig. 2, the specific implementation process of this step includes:
the first keyword is matched in the brand library. If brand matching succeeds, the full enterprise name is matched in the applicant column of the matched items (the rightmost column in Table 1); if it matches, the international classes of the matching items form the brand classification word set Cbrands; if it does not, the international classes of the earliest-filed applications for the brand are recorded as Cbrands;
otherwise (no brand match), the full enterprise name is matched directly in the brand library; if this succeeds, the corresponding international classes are recorded as Cbrands, and if it fails, Cbrands is an empty set.
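The two-branch rule above can be sketched in Python as follows. This illustrates the rule under stated assumptions and is not the patent's implementation: `BrandRecord` and `build_cbrands` are hypothetical names, "brand matching" is taken to mean that a brand name occurs inside the first keyword, and Cbrands is returned as a set of international class numbers.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BrandRecord:
    """One hypothetical brand-library row: applicant, brand, filing date, class."""
    company: str
    brand: str
    filed: date
    intl_class: int


def build_cbrands(first_keyword: str, full_name: str,
                  library: list[BrandRecord]) -> set[int]:
    """Return the international classes forming the brand word set Cbrands."""
    # Brand matching (assumed rule): the brand name occurs in the first keyword.
    hits = [r for r in library if r.brand and r.brand in first_keyword]
    if hits:
        # Verify the enterprise: keep items filed under this exact enterprise name.
        own = [r for r in hits if r.company == full_name]
        if own:
            return {r.intl_class for r in own}
        # No item belongs to this enterprise: use the earliest-filed applications.
        earliest = min(r.filed for r in hits)
        return {r.intl_class for r in hits if r.filed == earliest}
    # No brand matched: look up the full enterprise name directly.
    return {r.intl_class for r in library if r.company == full_name}
```

Fed the five 1994 "Faleiao" rows of Table 2 plus later filings by other applicants, the earliest-filing rule keeps classes 6, 7, 9, 11 and 12.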
In brand matching, one brand usually corresponds to several enterprises and classes, and fuzzy database search adds noise, so multiple search results are often returned and the enterprise name must be verified. For example, searching for "Atlas", the brand of Ates Sunshine Power Group Co., Ltd., returns many unrelated enterprises, which must be screened out by enterprise name, as shown in Table 1:
TABLE 1
No. | Registration no. | Int. class | Application date | Brand | Applicant |
18 | 31495116 | 21 | 2018-06-08 | Atlas base | Wang Yi |
19 | 31400595 | 2 | 2018-06-05 | Art s, silver atlas | Liaoyuan Jinshi Advertising Co., Ltd. |
20 | 31397512 | 2 | 2018-06-05 | Art s, atts, gold king | Liaoyuan Jinshi Advertising Co., Ltd. |
21 | 30703817 | 6 | 2018-04-05 | Atlas | Ates Sunshine Power Group Co., Ltd. |
22 | 30697184 | 19 | 2018-04-05 | Atlas | Ates Sunshine Power Group Co., Ltd. |
23 | 30693607 | 9 | 2018-04-05 | Atlas | Ates Sunshine Power Group Co., Ltd. |
24 | 28613747 | 7 | 2018-01-10 | Atlas of Atlanten Arisan | Jiangyin Atlas Electromechanical Equipment Co., Ltd. |
25 | 28611461 | 9 | 2018-01-10 | Atlas of Atlanten Arisan | Jiangyin Atlas Electromechanical Equipment Co., Ltd. |
26 | 27146259 | 33 | 2017 (month illegible), day 27 | Atlas | Shenzhen Zitai Industrial Co., Ltd. |
Some brands cannot be traced to the corresponding enterprise in the brand database. For example, searching by "Faleiao" does not find Changshu Faleiao Automobile Wiper System Co., Ltd.; in this case the classes of the brand's earliest application are used. The earliest application for "Faleiao" was filed on September 13, 1994, as shown in Table 2:
TABLE 2
No. | Registration no. | Int. class | Application date | Brand | Applicant |
67 | 923131 | 11 | 1994-09-13 | Faleiao | Faleiao |
68 | 906387 | 9 | 1994-09-13 | Faleiao | Faleiao |
69 | 902723 | 12 | 1994-09-13 | Faleiao | Faleiao |
70 | 874279 | 7 | 1994-09-13 | Faleiao | Faleiao |
71 | 854297 | 6 | 1994-09-13 | Faleiao | Faleiao |
For full enterprise name matching, some enterprises have a name but no corresponding brand. For example, Qingdao Haier Robot Co., Ltd. has not applied for the "Haier" brand, so the enterprise cannot be found by searching for "Haier" (typically because the enterprise is a subsidiary of the brand-owning company). In this case the enterprise name can be used instead; the search result is shown in Table 3:
TABLE 3
The international classes corresponding to the enterprise name can be used directly as the brand classification (strictly speaking, the brand classification of the subsidiary of the brand-owning company).
Through the matching and screening above, several international classes may serve as the brand industry classification; they are denoted Cbrands, which is therefore a word set.
Step 103: matching the first keyword in a science and technology term library to obtain a technology classification word set CTechs;
scientific and technological terms are extracted from the first keyword through the science and technology term library as second keywords, the second keywords are segmented, and all the segmented words form the technology classification word set CTechs.
For example, for the first keyword "Faleiao automobile wiper system" above, step 102 extracts the brand keyword keyBrand "Faleiao"; matching "automobile wiper system" against the science and technology term library matches "automobile" first and then the remaining terms, and all the terms form the technology classification word set CTechs = [automobile, wiper, system]. A general technical terminology database can be used for this.
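Step 103 can be sketched with forward maximum matching, a common dictionary-based segmentation technique. The patent does not say which extraction algorithm it uses, so this choice, the function name `extract_ctechs`, the `max_len` parameter and the tiny term library in the test are assumptions. Tokens that are not in the term library (such as the brand "Faleiao") are skipped, and each extracted term is then split into words.

```python
def extract_ctechs(first_keyword: str, term_library: set[str],
                   max_len: int = 4) -> list[str]:
    """Extract technical terms by forward maximum matching, then split into words."""
    tokens = first_keyword.lower().split()
    terms: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest token span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in term_library:
                terms.append(candidate)
                i += n
                break
        else:
            i += 1  # not a technical term (e.g. a brand name); skip this token
    # Segment every extracted term into words, keeping first-occurrence order.
    words: list[str] = []
    for term in terms:
        for word in term.split():
            if word not in words:
                words.append(word)
    return words
```

Preferring the longest span keeps multi-word terms such as "wiper system" together during extraction; the final split then yields the individual words that step 104 matches.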
Step 104: matching the technology classification word set and the brand classification word set against an industry classification word set in sequence; if a matching result is obtained, taking it as the industry classification result of the industrial enterprise to be classified; otherwise, proceeding to step 105;
the industry classification word set IndCatalog is determined by the specific industry classification standard selected by the user and changes when that standard changes. For example, the industry classification word set may be: new-generation displays, biomedicine and novel medical devices, high-end textiles, new energy and smart grid, automobiles and parts, nanotechnology applications, optical communications, software and digital economy, robots and key parts, high-end equipment, integrated circuits, internet of things, and leading-edge new materials.
As shown in fig. 3, the specific implementation process of this step includes:
the technology classification word set CTechs is matched against the industry classification word set, and if the matching succeeds, the industry matched with CTechs is taken as the industry classification result;
otherwise, the brand classification word set Cbrands is matched against the industry classification word set, and if the matching succeeds, the industry matched with Cbrands is taken as the industry classification result; otherwise, the subclasses of the international classes in the brand classification word set are acquired and entered into a second brand classification word set, which is matched against the industry classification word set, and if the matching succeeds, the industry matched with the second brand classification word set is taken as the industry classification result; otherwise, no industry classification result is obtained and the method proceeds to step 105.
The matching in this step is not exact matching but an inclusion relationship: it checks whether the industry classification word set IndCatalog contains any word of the technology classification word set, of the brand classification word set, or of the second brand classification word set. Moreover, one enterprise may belong to several categories.
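The inclusion-based cascade of step 104 might look like the following Python sketch. It assumes Cbrands and the second brand word set have already been turned into descriptive words (for example class or subclass names), interprets "contains" as a case-insensitive substring test against each industry name, and returns every matched industry, since an enterprise may belong to several categories. The function names are hypothetical.

```python
def match_words(words: list[str], catalog: list[str]) -> list[str]:
    """Industries whose name contains any of the words (inclusion, not equality)."""
    return [industry for industry in catalog
            if any(w and w.lower() in industry.lower() for w in words)]


def classify(ctechs: list[str], cbrands: list[str], second_brands: list[str],
             catalog: list[str]) -> list[str]:
    """Try CTechs, then Cbrands, then the second brand set; the first hit wins."""
    for word_set in (ctechs, cbrands, second_brands):
        hits = match_words(word_set, catalog)
        if hits:
            return hits  # an enterprise may belong to several industries
    return []  # no result: continue with step 105
```

With a catalog like the Jiangsu list above, the word "automobile" hits "automobiles and parts", while words such as "solar energy" match nothing here, which is the case step 105 is meant to handle.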
For example, for "normal fareo car wiper system limited", Ctech contains "cars", "wipers", and "systems", and Cbrands contains metal-containing materials, mechanical equipment, scientific equipment, lamps and air conditioners, and transportation tools. Matching associates "cars" in Ctech with the industry "cars and parts" in IndCatalog, while Cbrands matches no industry, so the company is classified as "cars and parts".
Similarly, from the international classification subclasses (such as 0913 and 0933) recorded for the brand "astter" in the brand library, related words such as "solar energy", "charger", and "distribution box" can be extracted.
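The cascade above can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: it assumes each word set is a set of English strings and that "inclusion" means a word occurs as a substring of an industry name, which is how "cars" matches "cars and parts" in the example. `IND_CATALOG` is a subset of the example list, and `SUBCLASS_WORDS` (subclass code to words) is a hypothetical table echoing the "astter" example, not data from the patent.

```python
from typing import Iterable, Optional

# Illustrative subset of the IndCatalog example list (assumption: plain strings).
IND_CATALOG = [
    "new energy and smart grids",
    "cars and parts",
    "integrated circuits",
    "internet of things",
    "high-end equipment",
]

# Hypothetical lookup from brand-library subclass codes to descriptive words.
SUBCLASS_WORDS = {
    "0913": ["charger", "distribution box"],
    "0933": ["solar energy"],
}


def match_industries(words: Iterable[str], catalog: list[str]) -> list[str]:
    """Inclusion match: an industry matches if its name contains any of the words."""
    present = [w for w in words if w]  # skip empty strings, which would match everything
    return [industry for industry in catalog if any(w in industry for w in present)]


def second_brand_set(subclass_codes: Iterable[str]) -> set[str]:
    """Build the second brand classification word set from subclass codes."""
    result: set[str] = set()
    for code in subclass_codes:
        result.update(SUBCLASS_WORDS.get(code, []))
    return result


def classify_first_pass(
    ctech: set[str],
    cbrands: set[str],
    subclass_codes: Iterable[str],
    catalog: list[str] = IND_CATALOG,
) -> Optional[list[str]]:
    """Try Ctech, then Cbrands, then the second brand set; None means go to step 105."""
    for word_set in (ctech, cbrands, second_brand_set(subclass_codes)):
        hits = match_industries(word_set, catalog)
        if hits:
            return hits  # an enterprise may fall into several industries
    return None


ctech = {"cars", "wipers", "systems"}
cbrands = {"metal-containing materials", "mechanical equipment", "scientific equipment",
           "lamps and air conditioners", "transportation tools"}
print(classify_first_pass(ctech, cbrands, []))  # ['cars and parts']
```

With only the hypothetical subclass words for "astter", none of "solar energy", "charger", or "distribution box" occurs in an industry name, so this pass returns `None` and classification falls through to step 105.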
Step 105: segmenting each industry name in the industry classification word set into words, extracting science and technology term keywords from the dictionary definitions of those words, and entering all the extracted keywords into a second industry classification word set;
for example, for the industry "new energy and smart grid" in the industry classification standard, looking up "new energy" in the dictionary and filtering the definition through the science and technology term library yields "solar energy", "geothermal energy", "nuclear energy", and so on, while "grid" yields "power transformation", "power transmission", "power distribution", and so on. The science and technology term keywords corresponding to "new energy and smart grid" are therefore: solar energy, geothermal energy, nuclear energy, power transformation, power transmission, and power distribution.
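Step 105 might be sketched as below. The dictionary definitions (`DICTIONARY`), the term library (`TERM_LIBRARY`), and the split on " and " used in place of real word segmentation are all illustrative assumptions; the sketch only shows the data flow from industry name to segments to definitions to term keywords, with each keyword remembering which industries it came from so that step 106 can map a match back to an industry.

```python
# Hypothetical dictionary definitions; the patent does not name a specific dictionary.
DICTIONARY = {
    "new energy": "energy from sources such as solar energy, geothermal energy and nuclear energy",
    "smart grids": "power networks covering power transformation, power transmission and power distribution",
}

# Hypothetical science and technology term library.
TERM_LIBRARY = {
    "solar energy", "geothermal energy", "nuclear energy",
    "power transformation", "power transmission", "power distribution",
}


def segment_industry(name: str) -> list[str]:
    """Crude stand-in for word segmentation: split an industry name on ' and '."""
    return [part.strip() for part in name.split(" and ") if part.strip()]


def build_second_industry_set(catalog: list[str]) -> dict[str, set[str]]:
    """Map each term-library keyword found in a segment's definition to its industries."""
    keyword_to_industries: dict[str, set[str]] = {}
    for industry in catalog:
        for segment in segment_industry(industry):
            definition = DICTIONARY.get(segment, "")  # segments without a definition add nothing
            for term in TERM_LIBRARY:
                if term in definition:
                    keyword_to_industries.setdefault(term, set()).add(industry)
    return keyword_to_industries


print(sorted(build_second_industry_set(["new energy and smart grids"])))
# ['geothermal energy', 'nuclear energy', 'power distribution', 'power transformation', 'power transmission', 'solar energy']
```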
Step 106: matching the technology classification word set and the brand classification word set against the second industry classification word set in sequence, obtaining a matching result according to a preset matching rule, and taking the matching result as the industry classification result of the industrial enterprise to be classified;
specifically, this step includes:
matching the technology classification word set against the second industry classification word set, and, if the matching succeeds, taking the industry corresponding to the science and technology term keywords matched by the technology classification word set as the industry classification result;
otherwise, matching the brand classification word set against the second industry classification word set, and, if the matching succeeds, taking the industry corresponding to the science and technology term keywords matched by the brand classification word set as the industry classification result; otherwise, obtaining the international classification subclasses in the brand classification word set, entering them into the second brand classification word set, and matching the second brand classification word set against the second industry classification word set, and, if the matching succeeds, taking the industry corresponding to the science and technology term keywords matched by the second brand classification word set as the industry classification result.
If none of a company's word sets matches a suitable industry, the company's industry classification is marked as pending and is handled manually by an operator.
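Step 106 and the pending fallback can be sketched the same way, again as an illustration under assumptions rather than the patented code. `KEYWORD_TO_INDUSTRIES` stands in for the second industry classification word set produced by step 105 (hand-written here for the "new energy and smart grid" example), inclusion again means substring containment, and the `"pending"` marker for operator review is a hypothetical convention.

```python
from typing import Iterable

PENDING = "pending"  # hypothetical marker for manual review by an operator

# Stand-in for the output of step 105 for the "new energy and smart grid" example.
KEYWORD_TO_INDUSTRIES: dict[str, set[str]] = {
    keyword: {"new energy and smart grids"}
    for keyword in (
        "solar energy", "geothermal energy", "nuclear energy",
        "power transformation", "power transmission", "power distribution",
    )
}


def match_keywords(words: Iterable[str], keyword_map: dict[str, set[str]]) -> set[str]:
    """Collect the industries of every term keyword that contains one of the words."""
    present = [w for w in words if w]
    industries: set[str] = set()
    for keyword, owners in keyword_map.items():
        if any(w in keyword for w in present):
            industries |= owners
    return industries


def classify_second_pass(
    ctech: set[str],
    cbrands: set[str],
    second_brands: set[str],
    keyword_map: dict[str, set[str]] = KEYWORD_TO_INDUSTRIES,
) -> list[str]:
    """Try Ctech, then Cbrands, then the second brand set; fall back to pending."""
    for word_set in (ctech, cbrands, second_brands):
        industries = match_keywords(word_set, keyword_map)
        if industries:
            return sorted(industries)
    return [PENDING]


# Hypothetical Ctech {"astter"}; only the subclass-derived word "solar energy" hits a keyword.
print(classify_second_pass({"astter"}, set(), {"solar energy", "charger", "distribution box"}))
# ['new energy and smart grids']
```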
In the above steps, word segmentation may use forward maximum matching, reverse maximum matching, or bidirectional maximum matching. The matching algorithm may be a naive string matching algorithm, the KMP (Knuth-Morris-Pratt) pattern matching algorithm, or another algorithm, which is not limited herein.
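Two of the options named above, forward maximum matching for segmentation and KMP for matching, can be sketched as follows. The vocabulary is hypothetical; the Chinese string 汽车刮水器系统 ("car wiper system") segments into 汽车 (car), 刮水器 (wiper), and 系统 (system), echoing the Ctech of the earlier example. For the inclusion test, `kmp_find(industry, word) != -1` gives the same answer as Python's built-in `word in industry`.

```python
def forward_max_match(text: str, vocab: set[str]) -> list[str]:
    """Greedy left-to-right segmentation that always takes the longest vocabulary word."""
    max_len = max([len(word) for word in vocab] + [1])  # at least 1 so the loop always advances
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is the fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def kmp_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1 (Knuth-Morris-Pratt)."""
    if not pattern:
        return 0
    # failure[i] = length of the longest proper prefix of pattern[:i + 1] that is also its suffix.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # Scan the text once, reusing the failure table instead of backing up.
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


vocab = {"汽车", "刮水", "刮水器", "系统"}  # hypothetical vocabulary
print(forward_max_match("汽车刮水器系统", vocab))  # ['汽车', '刮水器', '系统']
print(kmp_find("cars and parts", "cars") != -1)  # True
```

Forward maximum matching picks 刮水器 over the shorter 刮水 because it tries the longest candidate first; reverse and bidirectional variants differ only in scan direction and in how the two results are reconciled.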
Example two:
based on the above embodiments, an embodiment of the present application provides an industry classification device for an industrial enterprise. Referring to fig. 4, the industry classification device 200 provided in this embodiment at least includes:
an extracting unit 201, configured to extract a first keyword of a name of an industrial enterprise to be classified;
a brand classification word set generating unit 202, configured to match the first keyword in a brand library to obtain a brand classification word set;
a technology classification word set generating unit 203, configured to match the first keyword in a technology term library to obtain a technology classification word set;
a first industry classification unit 204, configured to match the technology classification word set and the brand classification word set with a first industry classification word set in sequence, obtain a matching result according to a preset matching rule, and use the matching result as an industry classification result of the industrial enterprise to be classified;
in a possible implementation, the extraction unit 201 is specifically configured to:
matching the names of the industrial enterprises in a place name table to obtain place names in the names of the industrial enterprises;
matching the names of the industrial enterprises in a company property table to obtain company property names in the names of the industrial enterprises;
obtaining punctuation marks of names of industrial enterprises;
and deleting the place name, the company property name and the punctuation mark from the name of the industrial enterprise, and taking the rest fields as first keywords.
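The extraction unit's three deletions can be sketched as below. The place-name and company-property tables are small hypothetical stand-ins, and plain substring deletion is a simplification (for instance, it would also delete "Group" inside a longer word).

```python
import string

# Hypothetical place-name and company-property tables; real tables would be far larger.
PLACE_NAMES = {"Changzhou", "Jiangsu", "Shanghai"}
COMPANY_PROPERTIES = {"Co., Ltd.", "Co., Limited", "Limited", "Company", "Group"}

# Map ASCII punctuation and some full-width marks common in Chinese names to spaces.
PUNCTUATION = str.maketrans({ch: " " for ch in string.punctuation + "（）【】，、。"})


def extract_first_keyword(name: str) -> str:
    """Delete place names, company-property names and punctuation; keep the remaining fields."""
    remaining = name
    # Delete longer entries first so "Co., Limited" is removed before the shorter "Limited".
    for entry in sorted(PLACE_NAMES | COMPANY_PROPERTIES, key=len, reverse=True):
        remaining = remaining.replace(entry, " ")
    remaining = remaining.translate(PUNCTUATION)
    return " ".join(remaining.split())  # collapse the gaps left by deletions


print(extract_first_keyword("Changzhou Example Wiper Systems Co., Ltd."))  # Example Wiper Systems
```

Table entries are removed before punctuation so that entries containing punctuation, such as "Co., Ltd.", still match.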
In one possible embodiment, each data item of the brand library comprises at least four associated fields: enterprise name, brand name, application time, and international classification, and the brand classification word set generating unit 202 is specifically configured to:
perform brand matching on the first keyword in the brand library; if the brand matching succeeds, acquire all matched data items and match the name of the industrial enterprise to be classified against the enterprise name column of these data items; if that matching succeeds, enter the international classification corresponding to the enterprise name into the brand classification word set, and otherwise enter the international classification of the data item whose brand was applied for earliest into the brand classification word set; if the brand matching fails, match the name of the industrial enterprise to be classified in the brand library and, if the name matching succeeds, enter the international classifications contained in the matched data items into the brand classification word set.
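A sketch of this brand library lookup follows. It assumes that "brand matching" means the brand name occurs in the first keyword and that "name matching" means exact equality of enterprise names; `BrandItem` mirrors the four fields named above, and the sample rows are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BrandItem:
    """One brand-library data item with the four fields named above."""
    enterprise_name: str
    brand_name: str
    application_date: date
    international_class: str


def brand_word_set(first_keyword: str, enterprise_name: str, library: list[BrandItem]) -> set[str]:
    """Build the brand classification word set for one enterprise."""
    # Assumption: a brand matches when its name occurs in the first keyword.
    brand_hits = [item for item in library if item.brand_name and item.brand_name in first_keyword]
    if brand_hits:
        own = [item for item in brand_hits if item.enterprise_name == enterprise_name]
        if own:
            return {item.international_class for item in own}
        # No matched brand belongs to this enterprise: use the earliest-applied brand.
        earliest = min(brand_hits, key=lambda item: item.application_date)
        return {earliest.international_class}
    # Brand matching failed: fall back to matching the enterprise name column.
    return {item.international_class for item in library if item.enterprise_name == enterprise_name}


# Invented sample rows, for illustration only.
LIBRARY = [
    BrandItem("Changzhou Example Wiper Systems Co., Ltd.", "Example", date(2015, 3, 1), "12"),
    BrandItem("Another Example Trading Co., Ltd.", "Example", date(2010, 1, 1), "9"),
    BrandItem("Changzhou Example Wiper Systems Co., Ltd.", "Wiperpro", date(2016, 1, 1), "7"),
]
print(brand_word_set("Example Wiper Systems", "Changzhou Example Wiper Systems Co., Ltd.", LIBRARY))  # {'12'}
```

A different company whose first keyword contains "Example" would receive class "9", the class of the earliest application among the matched items.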
In a possible implementation manner, the technology classification word set generating unit 203 is specifically configured to:
extract science and technology terms from the first keyword through the science and technology term library as second keywords, segment the second keywords, and form the technology classification word set from all the segmented words.
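A sketch of building the technology classification word set, assuming a hypothetical term library of multi-word English terms and using whitespace splitting as the segmentation step (Chinese text would instead use a method such as forward maximum matching).

```python
# Hypothetical science and technology term library of multi-word terms.
TERM_LIBRARY = {"car wiper", "wiper systems", "industrial robot"}


def technology_word_set(first_keyword: str, term_library: set[str] = TERM_LIBRARY) -> set[str]:
    """Extract library terms from the first keyword, then segment them into words."""
    text = first_keyword.lower()
    second_keywords = [term for term in term_library if term in text]  # terms found in the keyword
    # Whitespace splitting stands in for word segmentation of the second keywords.
    return {word for term in second_keywords for word in term.split()}


print(sorted(technology_word_set("Example Car Wiper Systems")))  # ['car', 'systems', 'wiper']
```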
In a possible implementation manner, each industry included in the industry classification standard is entered into a first industry classification word set, and the first industry classification unit 204 is specifically configured to:
matching the technical classification word set in a first industry classification word set, and if the matching is successful, taking the industry matched with the technical classification word set as an industry classification result;
otherwise, matching the brand classification word set in the first industry classification word set, and, if the matching succeeds, taking the industry matched with the brand classification word set as the industry classification result; otherwise, acquiring the international classification subclasses in the brand classification word set, entering them into a second brand classification word set, and matching the second brand classification word set in the first industry classification word set, and, if the matching succeeds, taking the industry matched with the second brand classification word set as the industry classification result.
Here, a successful match means that the first industry classification word set contains any word in the technology classification word set, any word in the brand classification word set, or any word in the second brand classification word set, respectively.
The device further comprises a second industry classification word set generating unit 205 and a second industry classification unit 206, which are used when the industry classification result of the industrial enterprise to be classified cannot be obtained;
the second industry classification word set generating unit 205 is configured to segment each industry name in the industry classification word set into words, extract science and technology term keywords through dictionary interpretation, and enter all the keywords into the second industry classification word set;
and the second industry classification unit 206 is configured to match the technology classification word set and the brand classification word set against the second industry classification word set in sequence, obtain a matching result according to a preset matching rule, and take the matching result as the industry classification result of the industrial enterprise to be classified.
In a possible implementation, the second industry classification unit is specifically configured to:
matching the technical classification word set in a second industry classification word set, and if the matching is successful, taking the industry corresponding to the scientific and technological term keywords matched with the technical classification word set as an industry classification result;
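otherwise, matching the brand classification word set in the second industry classification word set, and, if the matching succeeds, taking the industry corresponding to the science and technology term keywords matched with the brand classification word set as the industry classification result; otherwise, acquiring the international classification subclasses in the brand classification word set, entering them into the second brand classification word set, and matching the second brand classification word set in the second industry classification word set, and, if the matching succeeds, taking the industry corresponding to the science and technology term keywords matched with the second brand classification word set as the industry classification result.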
It should be noted that, because the industry classification device 200 for an industrial enterprise provided in this embodiment solves the technical problem on a principle similar to that of the industry classification method for an industrial enterprise provided in the embodiments of the present application, the implementation of the device 200 can refer to the implementation of the method, and repeated details are omitted.
Example three:
based on the foregoing embodiments, an embodiment of the present application further provides an electronic device. As shown in fig. 5, the electronic device 300 provided in this embodiment at least includes a processor 301, a memory 302, and a computer program stored on the memory 302 and executable on the processor 301; when the processor 301 executes the computer program, it implements the industry classification method for an industrial enterprise provided by the embodiments of the present application.
The electronic device 300 provided by the embodiment of the present application may further include a bus 303 connecting different components (including the processor 301 and the memory 302). Bus 303 represents one or more of any of several types of bus structures, including a memory bus, a peripheral bus, a local bus, and so forth.
The memory 302 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 3021 and/or cache memory 3022, and may further include Read-Only Memory (ROM) 3023.
The memory 302 may also include a program tool 3024 having a set of (at least one) program modules 3025, the program modules 3025 including, but not limited to, an operating system, one or more application programs, other program modules, and program data; each of these, or some combination of them, may include an implementation of a network environment.
Electronic device 300 may also communicate with one or more external devices 304 (e.g., a keyboard or remote control), with one or more devices that enable a user to interact with electronic device 300 (e.g., a mobile phone or computer), and/or with any device that enables electronic device 300 to communicate with one or more other electronic devices (e.g., a router or modem). Such communication may take place through an Input/Output (I/O) interface 305. The electronic device 300 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 306. As shown in FIG. 5, the network adapter 306 communicates with the other modules of the electronic device 300 via the bus 303. It should be understood that, although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with electronic device 300, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, Redundant Array of Independent Disks (RAID) subsystems, tape drives, and data backup storage subsystems.
It should be noted that the electronic device 300 shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments.
Example four:
the embodiment of the application also provides a computer-readable storage medium, which stores computer instructions, and the computer instructions, when executed by a processor, implement the industry classification method for the industrial enterprise provided by the embodiment of the application.
Example five:
the industry classification method for an industrial enterprise provided by the embodiments of the present application can also be implemented as a program product that includes program code; when the program product runs on the electronic device 300, the program code causes the electronic device 300 to execute the industry classification method for an industrial enterprise provided by the embodiments of the present application.
The program product provided by the embodiments of the present application may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium; a readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an Erasable Programmable Read-Only Memory (EPROM), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product provided by the embodiment of the application can adopt a CD-ROM and comprises program codes, and can run on a computing device. However, the program product provided by the embodiments of the present application is not limited thereto, and in the embodiments of the present application, the readable storage medium may be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus, or device.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided among a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
Claims (10)
1. An industry classification method for an industrial enterprise, comprising:
extracting a first keyword of the name of the industrial enterprise to be classified;
matching the first keywords in a brand library to obtain a brand classification word set;
matching the first keywords in a science and technology term library to obtain a technology classification word set;
and matching the technical classification word set and the brand classification word set with an industry classification word set in sequence, obtaining a matching result according to a preset matching rule, and taking the matching result as an industry classification result of the industrial enterprise to be classified.
2. The industry classification method of industrial enterprises according to claim 1, wherein the extracting the first keyword of the name of the industrial enterprise to be classified comprises:
matching the names of the industrial enterprises in a place name table to obtain place names in the names of the industrial enterprises;
matching the names of the industrial enterprises in a company property table to obtain company property names in the names of the industrial enterprises;
obtaining punctuation marks of names of industrial enterprises;
and deleting the place name, the company property name and the punctuation mark from the name of the industrial enterprise, and taking the rest fields as first keywords.
3. The industry classification method for industrial enterprises according to claim 1, wherein one data item of the brand library comprises at least four associated fields: enterprise names, brand names, application time and international classification, wherein the first keywords are matched in a brand library to obtain a brand classification word set; the method comprises the following steps:
performing brand matching on the first keyword in a brand library, if the brand matching is successful, acquiring all matched data items, matching the names of industrial enterprises to be classified in an enterprise name column of the data items, if the matching is successful, inputting the international classification corresponding to the names of the industrial enterprises into a brand classification word set, and otherwise, inputting the international classification which applies for the brand at the earliest in the data items into the brand classification word set; and if the brand matching is unsuccessful, matching the names of the industrial enterprises to be classified in the brand library, and if the name matching is successful, inputting the international classification contained in the matched data item into a brand classification word set.
4. The industry classification method of industrial enterprises according to claim 1, wherein the first keyword is matched in a science and technology term base to obtain a technology classification word set; the method comprises the following steps:
and extracting scientific and technological terms from the first keywords through a scientific and technological term library to serve as second keywords, segmenting the second keywords, and forming a technical classification word set by all the segmented words.
5. The industry classification method of industrial enterprises according to any one of claims 1 to 4, wherein each industry included in industry classification standards is entered into an industry classification word set, the technology classification word set and the brand classification word set are sequentially matched with the industry classification word set, a matching result is obtained according to a preset matching rule, and the matching result is used as the industry classification result of the industrial enterprise to be classified; the method comprises the following steps:
matching the technical classification word set in an industry classification word set, and if the matching is successful, taking the industry matched with the technical classification word set as an industry classification result;
otherwise, matching the brand classified word set in an industry classified word set, and if the matching is successful, taking the industry matched with the brand classified word set as an industry classification result; otherwise, acquiring the subclass of the international classification in the brand classification word set, inputting the subclass of the international classification into a second brand classification word set, matching the second brand classification word set in the industry classification word set, and taking the industry matched with the second brand classification word set as an industry classification result if the matching is successful.
6. The industry classification method of industrial enterprises according to claim 5, wherein the successful matching means that the industry classified word set contains any word in the technical classified word set, or the industry classified word set contains any word in the brand classified word set, or the industry classified word set contains any word in the second brand classified word set.
7. The industry classification method of industrial enterprises according to claim 5, wherein when the industry classification result of the industrial enterprise to be classified cannot be obtained, the method further comprises:
segmenting words of each industry of the industry classified word set, extracting scientific and technological term keywords through dictionary interpretation, and inputting all the scientific and technological term keywords into a second industry classified word set;
matching the technical classification word set in a second industry classification word set, and if the matching is successful, taking the industry corresponding to the scientific and technological term keywords matched with the technical classification word set as an industry classification result;
otherwise, matching the brand classified word set in a second industry classified word set, and if the matching is successful, taking the industry corresponding to the scientific and technological term keywords matched with the brand classified word set as an industry classification result; otherwise, acquiring internationally classified subclasses in the brand classified word set, inputting the internationally classified subclasses into a second brand classified word set, matching the second brand classified word set in a second industry classified word set, and if the matching is successful, taking industries corresponding to the scientific and technological term keywords matched with the second brand classified word set as industry classification results.
8. An industry classification apparatus of an industrial enterprise, comprising:
the extraction unit is used for extracting a first keyword of the name of the industrial enterprise to be classified;
the brand classification word set generating unit is used for matching the first key words in a brand library to obtain a brand classification word set;
the technical classification word set generating unit is used for matching the first keyword in a technical term library to obtain a technical classification word set;
and the industry classification unit is used for matching the technical classification word set and the brand classification word set with an industry classification word set in sequence, obtaining a matching result according to a preset matching rule, and taking the matching result as an industry classification result of the industrial enterprise to be classified.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the industry classification method of an industrial enterprise as claimed in any one of claims 1-7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the industry classification method for an industrial enterprise according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111175853.5A CN114020903A (en) | 2021-10-09 | 2021-10-09 | Industry classification method, device, equipment and medium for industrial enterprises |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111175853.5A CN114020903A (en) | 2021-10-09 | 2021-10-09 | Industry classification method, device, equipment and medium for industrial enterprises |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114020903A true CN114020903A (en) | 2022-02-08 |
Family
ID=80055639
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111175853.5A Pending CN114020903A (en) | 2021-10-09 | 2021-10-09 | Industry classification method, device, equipment and medium for industrial enterprises |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114020903A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115018258A (en) * | 2022-05-11 | 2022-09-06 | 中国城市规划设计研究院深圳分院 | Method for identifying enterprise type and industrial chain space in target area |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110569421A (en) * | 2019-08-22 | 2019-12-13 | 上海摩库数据技术有限公司 | search method based on chemical industry |
CN112115348A (en) * | 2020-08-05 | 2020-12-22 | 互联网域名系统北京市工程研究中心有限公司 | Method and system for recommending brand domain name registration |
CN113342984A (en) * | 2021-07-05 | 2021-09-03 | 深圳云谷星辰信息技术有限公司 | Garden enterprise classification method and system, intelligent terminal and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115018258A (en) * | 2022-05-11 | 2022-09-06 | 中国城市规划设计研究院深圳分院 | Method for identifying enterprise type and industrial chain space in target area |
CN115018258B (en) * | 2022-05-11 | 2023-08-18 | 中国城市规划设计研究院深圳分院 | Method for identifying enterprise type and industry chain space in target area |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113033198B (en) | Similar text pushing method and device, electronic equipment and computer storage medium | |
CN110188345B (en) | Intelligent identification method and device for electric operation ticket | |
CN112445775B (en) | Fault analysis method, device, equipment and storage medium of photoetching machine | |
CN115146865A (en) | Task optimization method based on artificial intelligence and related equipment | |
CN112380348B (en) | Metadata processing method, apparatus, electronic device and computer readable storage medium | |
CN113408301A (en) | Sample processing method, device, equipment and medium | |
CN114444465A (en) | Information extraction method, device, equipment and storage medium | |
CN114020903A (en) | Industry classification method, device, equipment and medium for industrial enterprises | |
CN111931499A (en) | Model training method and system, and junk mail identification method, system and equipment | |
CN117332761B (en) | PDF document intelligent identification marking system | |
CN112699237B (en) | Label determination method, device and storage medium | |
CN117892820A (en) | Multistage data modeling method and system based on large language model | |
CN113722600A (en) | Data query method, device, equipment and product applied to big data | |
CN111898612A (en) | OCR recognition method and device combining RPA and AI, equipment and medium | |
CN111104422A (en) | Training method, device, equipment and storage medium of data recommendation model | |
CN113761739B (en) | Standardized expense decomposition structure construction method based on equipment characteristics | |
CN114817572A (en) | Knowledge classification method, system, device and medium based on knowledge graph | |
CN111144113B (en) | Method and system for matching capability model with work order based on machine learning | |
CN112395856B (en) | Text matching method, text matching device, computer system and readable storage medium | |
CN114860898A (en) | Software development knowledge base construction and application method | |
CN111506780A (en) | Scientific and technological project evaluation method and system | |
CN117745274B (en) | Maintenance event element integration method and system based on semantic annotation role annotation | |
Liu et al. | Automotive prospective technology mining method based on big data content analysis | |
CN117273139B (en) | Knowledge graph dynamic risk identification method and device based on open data | |
Matsubara et al. | Data Management System that Facilitates the Value Creation Cycle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||