CN110929124A - Enterprise information recommendation method and system based on natural language - Google Patents
Enterprise information recommendation method and system based on natural language
- Publication number
- CN110929124A (application CN201911081813.7A)
- Authority
- CN
- China
- Prior art keywords
- information
- enterprise
- industry
- label
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Abstract
The invention provides a natural-language-based enterprise information recommendation method and system, comprising the following steps. Information acquisition step: acquiring enterprise information from websites using crawler technology. Classification label association step: associating the enterprise information with an industry classification label. Training data preparation step: preparing training data according to the associated industry classification labels. Neural network model generation step: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label. Intelligent labeling step: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with a plurality of industry labels, and storing the results in a database. The invention screens information with high accuracy, requires no manual intervention, has low cost, and can accurately obtain information about related companies or related industries.
Description
Technical Field
The invention relates to the technical field of computer data processing, in particular to an enterprise information recommendation method and system based on natural language.
Background
State of similar products: at present, enterprise news is generally obtained by capturing internet content with crawler technology and then selecting and classifying it by enterprise keywords.
Deficiencies of similar products: traditional keyword-based screening has low accuracy, often requires manual intervention, is costly, and makes it difficult to accurately obtain information about related companies or related industries.
The invention provides a general method that identifies and classifies internet enterprise information with labels according to GICS (Global Industry Classification Standard) industries, so that information related to an enterprise can be automatically recommended to users who follow that enterprise. The recommended information is not limited to the enterprise itself; it can also include content related to the enterprise's GICS industry.
Patent document CN109657040A (application number: 201811365334.3) discloses a label recommendation method that fuses multi-source heterogeneous information, mainly combining the content information of resources with the network structure information among resources to recommend labels for those resources. Its technical effect is that constructing a topic model that uses both the text content of resources and the network structure among resources improves the comprehensiveness and accuracy of mining the semantic information of resources, and that the text content and network structure information are expanded through the word-pair idea. Finally, a label filtering algorithm calculates a score for each candidate label, so that the most relevant labels can be accurately recommended for a resource.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide an enterprise information recommendation method and system based on natural language.
The enterprise information recommendation method based on the natural language provided by the invention comprises the following steps:
information acquisition step: acquiring enterprise information from websites using crawler technology;
classification label association step: associating the enterprise information with an industry classification label;
training data preparation step: preparing training data according to the associated industry classification labels;
neural network model generation step: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label;
intelligent labeling step: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with a plurality of industry labels, and storing the results in a database.
Preferably, acquiring the enterprise information from the website comprises:
crawling the full information on the first crawl, and crawling only incremental information on each subsequent crawl;
the full information refers to all relevant information about the target enterprise on the website;
the incremental information refers to information about the target enterprise that is newly added to the relevant websites after a preset period of time.
Preferably, the classification label association step comprises:
calling a related API according to the enterprise information to obtain the main business of the enterprise, and associating the industry classification label of the enterprise according to that main business;
the enterprise information includes: an enterprise name and a uniform identification code.
Preferably, the training data preparation step comprises:
manually labeling a preset amount of information for each associated industry classification label, wherein the information comprises: information belonging to the industry label and information not belonging to the industry label.
Preferably, the neural network model generation step comprises:
for each industry classification label, training and testing a deep convolutional neural network on the information labeled for that industry label to derive a neural network model of the industry label.
The invention provides an enterprise information recommendation system based on natural language, which comprises:
an information acquisition module: acquiring enterprise information from websites using crawler technology;
a classification label association module: associating the enterprise information with an industry classification label;
a training data preparation module: preparing training data according to the associated industry classification labels;
a neural network model generation module: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label;
an intelligent labeling module: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with a plurality of industry labels, and storing the results in a database.
Preferably, acquiring the enterprise information from the website comprises:
crawling the full information on the first crawl, and crawling only incremental information on each subsequent crawl;
the full information refers to all relevant information about the target enterprise on the website;
the incremental information refers to information about the target enterprise that is newly added to the relevant websites after a preset period of time.
Preferably, the classification label association module is configured for:
calling a related API according to the enterprise information to obtain the main business of the enterprise, and associating the industry classification label of the enterprise according to that main business;
the enterprise information includes: an enterprise name and a uniform identification code.
Preferably, the training data preparation module is configured for:
manually labeling a preset amount of information for each associated industry classification label, wherein the information comprises: information belonging to the industry label and information not belonging to the industry label.
Preferably, the neural network model generation module is configured for:
for each industry classification label, training and testing a deep convolutional neural network on the information labeled for that industry label to derive a neural network model of the industry label.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention screens information with high accuracy;
2. the invention does not need manual intervention and has low cost;
3. the invention can accurately obtain the information of related companies or related industries.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a flowchart of a software recommendation method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a software recommendation system in an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The enterprise information recommendation method based on the natural language provided by the invention comprises the following steps:
information acquisition step: acquiring enterprise information from websites using crawler technology;
classification label association step: associating the enterprise information with an industry classification label;
training data preparation step: preparing training data according to the associated industry classification labels;
neural network model generation step: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label;
intelligent labeling step: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with a plurality of industry labels, and storing the results in a database.
Specifically, acquiring the enterprise information from the website comprises:
crawling the full information on the first crawl, and crawling only incremental information on each subsequent crawl;
the full information refers to all relevant information about the target enterprise on the website;
the incremental information refers to information about the target enterprise that is newly added to the relevant websites after a preset period of time.
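The full-then-incremental crawl described above can be sketched as follows. This is only an illustration under stated assumptions: `fetch_listing` is a hypothetical callable that returns the items currently visible for an enterprise on one website, each item is assumed to carry a `url` field, and the in-memory set stands in for whatever persistent store a real crawler would use. None of these names come from the patent.

```python
from typing import Callable, Dict, Iterable, List, Set

Item = Dict[str, str]  # hypothetical record, e.g. {"url": ..., "title": ...}


class IncrementalCrawler:
    """Return everything on the first crawl, then only items not seen before."""

    def __init__(self, fetch_listing: Callable[[str], Iterable[Item]]):
        # fetch_listing is an assumed stand-in for the real site crawler:
        # enterprise name -> items currently published about that enterprise.
        self.fetch_listing = fetch_listing
        self.seen_urls: Dict[str, Set[str]] = {}

    def crawl(self, enterprise_name: str) -> List[Item]:
        # On the first call the seen-set is empty, so the result is the full
        # information; on later calls only newly added items are returned.
        seen = self.seen_urls.setdefault(enterprise_name, set())
        new_items: List[Item] = []
        for item in self.fetch_listing(enterprise_name):
            if item["url"] not in seen:
                seen.add(item["url"])
                new_items.append(item)
        return new_items
```

Deduplicating by URL is one possible choice; comparing publication timestamps against the time of the previous crawl would serve the same purpose.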
Specifically, the classification label association step comprises:
calling a related API according to the enterprise information to obtain the main business of the enterprise, and associating the industry classification label of the enterprise according to that main business;
the enterprise information includes: an enterprise name and a uniform identification code.
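A minimal sketch of this association step is given below. It assumes a hypothetical `lookup_main_business` callable in place of the unnamed API, and an illustrative association table whose business names are invented for the example; the two-digit codes are GICS sector codes, used here instead of finer-grained codes to keep the table short.

```python
from typing import Callable, Dict, Optional

# Illustrative association table: main business -> GICS sector code.
# The business names are made up; a real table would be far larger.
BUSINESS_TO_GICS: Dict[str, str] = {
    "software development": "45",  # Information Technology
    "commercial banking": "40",    # Financials
    "pharmaceuticals": "35",       # Health Care
}


def associate_label(
    enterprise: Dict[str, str],
    lookup_main_business: Callable[[str], Optional[str]],
    table: Dict[str, str] = BUSINESS_TO_GICS,
) -> Optional[str]:
    """Map an enterprise to an industry label via its main business."""
    # Prefer the uniform identification code, fall back to the enterprise name.
    key = enterprise.get("uniform_code") or enterprise["name"]
    main_business = lookup_main_business(key)  # assumed API stand-in
    if main_business is None:
        return None
    return table.get(main_business.strip().lower())
```

Returning `None` for an unknown business leaves the enterprise unlabeled rather than guessing a code.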
Specifically, the training data preparation step comprises:
manually labeling a preset amount of information for each associated industry classification label, wherein the information comprises: information belonging to the industry label and information not belonging to the industry label.
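As an illustration, the manually labeled positive and negative items for one industry label could be assembled into a balanced binary data set along the following lines. The function name, the 20% test fraction, and the fixed random seed are assumptions made for the sketch, not details from the patent.

```python
import random
from typing import List, Tuple

Example = Tuple[str, int]  # (text, 1 if it belongs to the label, else 0)


def build_label_dataset(
    positives: List[str],
    negatives: List[str],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Example], List[Example]]:
    """Build a balanced, shuffled train/test split for one industry label."""
    n = min(len(positives), len(negatives))  # keep both classes the same size
    if n == 0:
        raise ValueError("need at least one positive and one negative item")
    rng = random.Random(seed)
    examples = [(t, 1) for t in rng.sample(positives, n)]
    examples += [(t, 0) for t in rng.sample(negatives, n)]
    rng.shuffle(examples)
    n_test = max(1, int(len(examples) * test_fraction))
    return examples[n_test:], examples[:n_test]
```

One such data set would be built per industry label, since each label gets its own binary classifier.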
Specifically, the neural network model generation step comprises:
for each industry classification label, training and testing a deep convolutional neural network on the information labeled for that industry label to derive a neural network model of the industry label.
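The patent does not specify the network architecture. To show the kind of data flow a convolutional text classifier involves, the sketch below implements only the forward pass of a single-layer text CNN (embedding lookup, 1-D convolution with ReLU, max-over-time pooling, sigmoid output) in plain Python with random, untrained weights. The class name, layer sizes, and initialization are assumptions; a practical system would use a deep learning framework, stack more layers, and learn the weights from the labeled data.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def conv_relu_maxpool(embedded: Sequence[Vector], filt: Sequence[Vector], bias: float) -> float:
    """Slide one filter over the token sequence, apply ReLU, keep the maximum."""
    width = len(filt)
    best = 0.0  # a ReLU output is never below zero
    for start in range(len(embedded) - width + 1):
        activation = bias
        for offset in range(width):
            activation += sum(w * x for w, x in zip(filt[offset], embedded[start + offset]))
        best = max(best, activation)
    return best


class TinyTextCNN:
    """Untrained single-layer text CNN for one industry label (forward pass only)."""

    def __init__(self, vocab: Sequence[str], embed_dim: int = 8,
                 n_filters: int = 4, width: int = 3, seed: int = 0):
        rng = random.Random(seed)

        def rand() -> float:
            return rng.uniform(-0.5, 0.5)

        self.vocab: Dict[str, int] = {tok: i for i, tok in enumerate(vocab)}
        self.embed = [[rand() for _ in range(embed_dim)] for _ in self.vocab]
        self.unknown = [0.0] * embed_dim  # embedding for out-of-vocabulary tokens
        self.filters = [[[rand() for _ in range(embed_dim)] for _ in range(width)]
                        for _ in range(n_filters)]
        self.filter_bias = [0.0] * n_filters
        self.out_weights = [rand() for _ in range(n_filters)]
        self.out_bias = 0.0

    def predict_proba(self, tokens: Sequence[str]) -> float:
        """Probability that the tokenized text belongs to the industry label."""
        embedded = [self.embed[self.vocab[t]] if t in self.vocab else self.unknown
                    for t in tokens]
        pooled = [conv_relu_maxpool(embedded, f, b)
                  for f, b in zip(self.filters, self.filter_bias)]
        logit = self.out_bias + sum(w * p for w, p in zip(self.out_weights, pooled))
        return 1.0 / (1.0 + math.exp(-logit))
```

Because one model is derived per industry label, each model only has to answer a yes/no question, which is why a single sigmoid output suffices here.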
The invention provides an enterprise information recommendation system based on natural language, which comprises:
an information acquisition module: acquiring enterprise information from websites using crawler technology;
a classification label association module: associating the enterprise information with an industry classification label;
a training data preparation module: preparing training data according to the associated industry classification labels;
a neural network model generation module: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label;
an intelligent labeling module: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with a plurality of industry labels, and storing the results in a database.
Specifically, acquiring the enterprise information from the website comprises:
crawling the full information on the first crawl, and crawling only incremental information on each subsequent crawl;
the full information refers to all relevant information about the target enterprise on the website;
the incremental information refers to information about the target enterprise that is newly added to the relevant websites after a preset period of time.
Specifically, the classification label association module is configured for:
calling a related API according to the enterprise information to obtain the main business of the enterprise, and associating the industry classification label of the enterprise according to that main business;
the enterprise information includes: an enterprise name and a uniform identification code.
Specifically, the training data preparation module is configured for:
manually labeling a preset amount of information for each associated industry classification label, wherein the information comprises: information belonging to the industry label and information not belonging to the industry label.
Specifically, the neural network model generation module is configured for:
for each industry classification label, training and testing a deep convolutional neural network on the information labeled for that industry label to derive a neural network model of the industry label.
The present invention will be described more specifically below with reference to preferred examples.
Preferred example 1:
Fig. 1 shows an association recommendation engine for enterprise information. It is a schematic flow chart of a software recommendation method in an embodiment of the present invention, which includes the following steps:
Step 1: enterprise information is obtained from related websites through crawler technology, by crawling the results of keyword searches on enterprise names. The full information is crawled the first time a given website is crawled, and only incremental information is crawled each time afterwards. The full information refers to all relevant information about the target enterprise on the website; the incremental information refers to information about the target enterprise that is newly added to the relevant websites after a period of time. Crawling only the incremental information increases the efficiency of the crawler and reduces the system load caused by repeated crawling;
Step 2: according to the enterprise name or the uniform identification code, a related API is called to obtain the main business of the enterprise. Using an association table between main businesses and industry classifications, the industry classification label (GICS standard) of the enterprise is associated; that is, the enterprise is mapped to a GICS industry classification code, in preparation for the data set division in step 3;
Step 3: training the model requires a prepared data set. Based on steps 1 and 2, about 2000 pieces of information are manually selected for each industry classification label (that is, each GICS code), of which 1000 belong to the industry label and 1000 do not;
Step 4: for each industry classification label, a deep convolutional neural network is trained and tested on the information labeled for that industry label in step 3 to derive a neural network model of the industry label;
Step 5: all the neural network models from step 4 are applied to each piece of enterprise information to be labeled (not limited to the information crawled in step 1; any similar information can be used);
Step 6: each piece of enterprise information may be labeled with a plurality of industry labels, that is, each piece of information may be associated with a plurality of enterprises, and the results are persisted in a database.
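Steps 5 and 6 can be sketched as follows, assuming each per-label model is exposed as a callable that returns a probability for a piece of text. The 0.5 decision threshold, the `item_label` table layout, and the use of SQLite are assumptions made for illustration; the patent only states that the labels are stored persistently in a database.

```python
import sqlite3
from typing import Callable, Dict, List

Classifier = Callable[[str], float]  # text -> probability of belonging to the label


def label_item(text: str, models: Dict[str, Classifier], threshold: float = 0.5) -> List[str]:
    """Apply every per-label model and keep the labels whose score passes the threshold."""
    return sorted(code for code, model in models.items() if model(text) >= threshold)


def store_labels(conn: sqlite3.Connection, item_id: str, labels: List[str]) -> None:
    """Persist (item, label) pairs; re-labeling the same item is idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS item_label ("
        "item_id TEXT, gics_code TEXT, PRIMARY KEY (item_id, gics_code))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO item_label VALUES (?, ?)",
        [(item_id, code) for code in labels],
    )
    conn.commit()
```

Because every model is applied independently, one piece of information can end up with zero, one, or several industry labels.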
Preferred example 2:
Fig. 2 is a schematic structural diagram of a software recommendation system in an embodiment of the present invention, which includes:
(1) A website crawler module, used for acquiring large volumes of enterprise-related information.
(2) An enterprise information module, used for acquiring the main business of an enterprise according to the enterprise name or the unified social identification code and associating the related industry label (GICS standard).
(3) A deep neural network model generation module, used for training and testing on the manually labeled enterprise information to obtain a label classification mathematical model that meets the threshold standard.
(4) An intelligent labeling module, which labels enterprise information with label classifications using the trained label classification model and persists the results in the database.
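The text does not say which metric or value the "threshold standard" in module (3) refers to. One plausible reading, sketched below, is that a trained model is accepted only if its accuracy on the held-out test set reaches a minimum value. The choice of accuracy as the metric, the 0.9 default, and the function names are all assumptions.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, 1 if it belongs to the label, else 0)


def accuracy(model: Callable[[str], float], test_set: List[Example],
             decision_threshold: float = 0.5) -> float:
    """Fraction of test examples whose thresholded prediction matches the label."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(
        1 for text, label in test_set
        if int(model(text) >= decision_threshold) == label
    )
    return correct / len(test_set)


def meets_threshold(model: Callable[[str], float], test_set: List[Example],
                    min_accuracy: float = 0.9) -> bool:
    """Accept the model for labeling only if it is accurate enough on the test set."""
    return accuracy(model, test_set) >= min_accuracy
```

A model that fails the check would be retrained, for example with more labeled data, before being used by the intelligent labeling module.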
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (10)
1. A method for recommending enterprise information based on natural language is characterized by comprising the following steps:
information acquisition step: acquiring enterprise information from websites using crawler technology;
classification label association step: associating the enterprise information with an industry classification label;
training data preparation step: preparing training data according to the associated industry classification labels;
neural network model generation step: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label;
intelligent labeling step: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with a plurality of industry labels, and storing the results in a database.
2. The natural language based enterprise information recommendation method of claim 1, wherein acquiring the enterprise information from the website comprises:
crawling the full information on the first crawl, and crawling only incremental information on each subsequent crawl;
the full information refers to all relevant information about the target enterprise on the website;
the incremental information refers to information about the target enterprise that is newly added to the relevant websites after a preset period of time.
3. The natural language based enterprise information recommendation method of claim 1, wherein the classification label association step comprises:
calling a related API according to the enterprise information to obtain the main business of the enterprise, and associating the industry classification label of the enterprise according to that main business;
the enterprise information includes: an enterprise name and a uniform identification code.
4. The natural language based enterprise information recommendation method of claim 1, wherein the training data preparation step comprises:
manually labeling a preset amount of information for each associated industry classification label, wherein the information comprises: information belonging to the industry label and information not belonging to the industry label.
5. The natural language based enterprise information recommendation method of claim 1, wherein the neural network model generation step comprises:
for each industry classification label, training and testing a deep convolutional neural network on the information labeled for that industry label to derive a neural network model of the industry label.
6. An enterprise information recommendation system based on natural language is characterized by comprising:
an information acquisition module: acquiring enterprise information from websites using crawler technology;
a classification label association module: associating the enterprise information with an industry classification label;
a training data preparation module: preparing training data according to the associated industry classification labels;
a neural network model generation module: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label;
an intelligent labeling module: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with a plurality of industry labels, and storing the results in a database.
7. The natural language based enterprise information recommendation system of claim 6, wherein acquiring the enterprise information from the website comprises:
crawling the full information on the first crawl, and crawling only incremental information on each subsequent crawl;
the full information refers to all relevant information about the target enterprise on the website;
the incremental information refers to information about the target enterprise that is newly added to the relevant websites after a preset period of time.
8. The natural language based enterprise information recommendation system of claim 6, wherein the classification label association module is configured for:
calling a related API according to the enterprise information to obtain the main business of the enterprise, and associating the industry classification label of the enterprise according to that main business;
the enterprise information includes: an enterprise name and a uniform identification code.
9. The natural language based enterprise information recommendation system of claim 6, wherein the training data preparation module is configured for:
manually labeling a preset amount of information for each associated industry classification label, wherein the information comprises: information belonging to the industry label and information not belonging to the industry label.
10. The natural language based enterprise information recommendation system of claim 6, wherein the neural network model generation module is configured for:
for each industry classification label, training and testing a deep convolutional neural network on the information labeled for that industry label to derive a neural network model of the industry label.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911081813.7A CN110929124A (en) | 2019-11-07 | 2019-11-07 | Enterprise information recommendation method and system based on natural language |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911081813.7A CN110929124A (en) | 2019-11-07 | 2019-11-07 | Enterprise information recommendation method and system based on natural language |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110929124A (en) | 2020-03-27 |
Family
ID=69852538
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911081813.7A Pending CN110929124A (en) | 2019-11-07 | 2019-11-07 | Enterprise information recommendation method and system based on natural language |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110929124A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190122096A1 (en) * | 2017-10-25 | 2019-04-25 | SparkCognition, Inc. | Automated evaluation of neural networks using trained classifier |
CN109783818A (en) * | 2019-01-17 | 2019-05-21 | 上海三零卫士信息安全有限公司 | A kind of enterprises ' industry multi-tag classification method |
CN110245226A (en) * | 2018-10-23 | 2019-09-17 | 爱信诺征信有限公司 | Enterprises ' industry classification method and its device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200327 |