CN110929124A - Enterprise information recommendation method and system based on natural language - Google Patents


Info

Publication number
CN110929124A
Authority
CN
China
Prior art keywords
information
enterprise
industry
label
neural network
Legal status
Pending
Application number
CN201911081813.7A
Other languages
Chinese (zh)
Inventor
潘翔
王菲
骆玮璐
杨牧
Current Assignee
Shanghai Rongdaitong Financial Information Service Co Ltd
Original Assignee
Shanghai Rongdaitong Financial Information Service Co Ltd
Priority date
2019-11-07
Filing date
2019-11-07
Publication date
2020-03-27
Application filed by Shanghai Rongdaitong Financial Information Service Co Ltd
Priority to CN201911081813.7A
Publication of CN110929124A

Classifications

    • G06F16/951 Indexing; Web crawling techniques (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F16/00 Information retrieval; Database structures therefor; File system structures therefor > G06F16/90 Details of database functions independent of the retrieved data types > G06F16/95 Retrieval from the web)
    • G06F16/9535 Search customisation based on user profiles and personalisation (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F16/00 Information retrieval; Database structures therefor; File system structures therefor > G06F16/90 Details of database functions independent of the retrieved data types > G06F16/95 Retrieval from the web > G06F16/953 Querying, e.g. by the use of web search engines)
    • G06N3/045 Combinations of networks (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08 Learning methods (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a natural-language-based enterprise information recommendation method and system, comprising the following steps. Information acquisition step: acquiring enterprise information from websites by crawler technology. Classification label association step: associating the enterprise information with industry classification labels. Training data preparation step: preparing training data according to the associated industry classification labels. Neural network model generation step: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label. Intelligent labeling step: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with multiple industry labels, and storing the results in a database. The invention filters information with high accuracy, requires no manual intervention, has low cost, and can accurately obtain information about related companies or related industries.

Description

Enterprise information recommendation method and system based on natural language
Technical Field
The invention relates to the technical field of computer data processing, in particular to an enterprise information recommendation method and system based on natural language.
Background
Current state of similar products: at present, enterprise news is generally obtained by capturing internet information with crawler technology and then screening and classifying the captured information by enterprise keywords.
Shortcomings of similar products: traditional keyword-based information filtering has low accuracy, often requires manual intervention, is costly, and makes it difficult to accurately obtain information about related companies or related industries.
The invention provides a general method that identifies and classifies the labels of internet enterprise information according to the GICS (Global Industry Classification Standard) industries, so that information related to an enterprise can be automatically recommended to users who follow that enterprise. The recommended information is not limited to the enterprise itself and can also include content related to the enterprise's GICS industry.
Patent document CN109657040A (application number: 201811365334.3) discloses a label recommendation method that fuses multi-source heterogeneous information, mainly by combining the content information of a resource with the network structure information between resources to recommend labels for the resource. Its technical effect is that a topic model built on both the text content of the resources and the network structure among them improves the comprehensiveness and accuracy of mining the resources' semantic information, and the text content and network structure information are expanded using the word-pair idea. Finally, a label filtering algorithm computes a score for each candidate label, so that the most relevant labels can be accurately recommended for the resource.
Disclosure of Invention
In view of the defects in the prior art, the object of the invention is to provide a natural-language-based enterprise information recommendation method and system.
The natural-language-based enterprise information recommendation method provided by the invention comprises the following steps:
information acquisition step: acquiring enterprise information from websites by crawler technology;
classification label association step: associating the enterprise information with industry classification labels;
training data preparation step: preparing training data according to the associated industry classification labels;
neural network model generation step: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label;
intelligent labeling step: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with multiple industry labels, and storing the results in a database.
Preferably, acquiring the enterprise information from the websites includes:
crawling the full information the first time, and then crawling only incremental information on each subsequent crawl;
the full information refers to all relevant information about the target enterprise on the website;
the incremental information refers to information about the target enterprise newly added to the relevant websites after a preset period of time.
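The full-then-incremental crawl can be sketched as a filter over crawled items. This is a minimal sketch under stated assumptions: the `Article` record, the `select_items_to_store` function and the use of a seen-URL set plus a last-crawl timestamp to detect new items are all illustrative choices, not taken from the patent, and HTTP fetching is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Article:
    """Hypothetical record for one crawled information item (not defined in the patent)."""
    url: str
    published: datetime
    title: str


def select_items_to_store(
    candidates: Iterable[Article],
    last_crawl: Optional[datetime],
    seen_urls: Set[str],
) -> List[Article]:
    """Return the items a crawl run should store.

    last_crawl is None -> first crawl of the site: keep everything (full crawl).
    otherwise          -> keep only items published after the last crawl
                          (incremental crawl).
    Items whose URL was already stored are always skipped; seen_urls is updated
    in place so the next run can reuse it.
    """
    selected: List[Article] = []
    for item in candidates:
        if item.url in seen_urls:
            continue  # already stored by an earlier run (or earlier in this batch)
        if last_crawl is not None and item.published <= last_crawl:
            continue  # not newer than the previous crawl, so not incremental
        selected.append(item)
        seen_urls.add(item.url)
    return selected
```

Persisting the seen-URL set between runs (for example in the same database as the articles) is what lets later crawls skip items that are already stored.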
Preferably, in the classification label association step:
a relevant API is called with the enterprise information to obtain the enterprise's main business, and the main business is associated with the enterprise's industry classification label;
the enterprise information includes the enterprise name and the unified social credit code.
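A minimal sketch of the association step, assuming the external API has already returned each enterprise's main business lines. `MAIN_BUSINESS_STUB` stands in for that unnamed API and `BUSINESS_TO_GICS` for the association table between main business and industry classification; both, and the identifiers in them, are hypothetical. The two GICS sub-industry codes are illustrative only and should be checked against the official GICS structure.

```python
from typing import Dict, List

# Hypothetical stand-in for the unnamed enterprise-information API: maps an
# enterprise identifier (e.g. a unified social credit code) to its main business lines.
MAIN_BUSINESS_STUB: Dict[str, List[str]] = {
    "example-credit-code-001": ["application software"],
    "example-credit-code-002": ["diversified banking", "application software"],
}

# Hypothetical association table: main business -> GICS sub-industry code.
# Codes are illustrative; verify them against the official GICS structure.
BUSINESS_TO_GICS: Dict[str, str] = {
    "application software": "45103010",
    "diversified banking": "40101010",
}


def industry_labels(credit_code: str) -> List[str]:
    """Map an enterprise to its GICS industry labels via its main business lines."""
    labels: List[str] = []
    for business in MAIN_BUSINESS_STUB.get(credit_code, []):
        code = BUSINESS_TO_GICS.get(business)
        if code is not None and code not in labels:  # skip unmapped or duplicate codes
            labels.append(code)
    return labels
```

An enterprise with several main business lines ends up with several labels, which matches the multi-label association described in the detailed steps.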
Preferably, in the training data preparation step:
a preset amount of information is manually labeled for each associated industry classification label, including both information that belongs to the industry label and information that does not.
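The preferred example later in the description uses 1000 positive and 1000 negative items per label. A sketch of assembling such a balanced per-label data set and splitting it into training and test parts; the function name, the default 20% test share and the fixed random seed are assumptions, since the patent gives no split ratio.

```python
import random
from typing import List, Sequence, Tuple

# (text, 1 if the item belongs to the industry label else 0)
Example = Tuple[str, int]


def build_label_dataset(
    positives: Sequence[str],
    negatives: Sequence[str],
    per_class: int = 1000,
    test_ratio: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Example], List[Example]]:
    """Build a balanced binary data set for ONE industry label and split it.

    Draws the same number of positive and negative items (at most per_class,
    fewer if either pool is smaller), shuffles them and returns (train, test).
    The split is a plain random split, not a stratified one.
    """
    rng = random.Random(seed)
    n = min(per_class, len(positives), len(negatives))
    examples = [(t, 1) for t in rng.sample(list(positives), n)]
    examples += [(t, 0) for t in rng.sample(list(negatives), n)]
    rng.shuffle(examples)
    n_test = int(len(examples) * test_ratio)
    return examples[n_test:], examples[:n_test]
```

Calling this once per GICS code yields one independent binary data set per label, which is what a per-label model requires.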
Preferably, in the neural network model generation step:
for each industry classification label, a deep convolutional neural network is trained and tested on the information labeled for that industry label to derive a neural network model for that label.
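The patent does not describe the network architecture. As one illustration of a convolutional text classifier for a single label, here is the forward pass of a small TextCNN-style model (token embeddings, 1-D convolution filters, ReLU, max-over-time pooling, sigmoid output) in plain Python. This is a swapped-in technique, not the patent's network: the class name and sizes are hypothetical, tokenization and training by backpropagation are omitted, and in practice a deep-learning framework would be used.

```python
import math
import random
from typing import List


class TinyTextCNN:
    """Forward pass of a small TextCNN-style binary classifier for ONE industry label.

    token ids -> embeddings -> 1-D convolution filters of a fixed width -> ReLU
    -> max-over-time pooling -> linear layer -> sigmoid probability.
    Weights are random here; training (backpropagation) is not shown.
    """

    def __init__(self, vocab_size: int, embed_dim: int = 8, num_filters: int = 4,
                 width: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.width = width
        self.embed = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                      for _ in range(vocab_size)]
        # Each filter holds `width` weight vectors, one per position in the window.
        self.filters = [[[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                         for _ in range(width)] for _ in range(num_filters)]
        self.bias = [0.0] * num_filters
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in range(num_filters)]
        self.out_b = 0.0

    def predict_proba(self, token_ids: List[int]) -> float:
        """Probability that the text belongs to this model's industry label."""
        # Pad with id 0 (assumed to be the padding token) so one window always fits.
        ids = list(token_ids) + [0] * max(0, self.width - len(token_ids))
        x = [self.embed[i] for i in ids]
        pooled = []
        for filt, b in zip(self.filters, self.bias):
            best = 0.0  # ReLU outputs are >= 0, so 0 is a safe starting maximum
            for start in range(len(x) - self.width + 1):
                s = b + sum(w * v
                            for k in range(self.width)
                            for w, v in zip(filt[k], x[start + k]))
                best = max(best, s)  # equivalent to ReLU, then max over time
            pooled.append(best)
        z = self.out_b + sum(w * p for w, p in zip(self.out_w, pooled))
        return 1.0 / (1.0 + math.exp(-z))
```

One such model per GICS label gives the set of "neural network models" that the intelligent labeling step applies to every information item.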
The natural-language-based enterprise information recommendation system provided by the invention comprises:
an information acquisition module: acquiring enterprise information from websites by crawler technology;
a classification label association module: associating the enterprise information with industry classification labels;
a training data preparation module: preparing training data according to the associated industry classification labels;
a neural network model generation module: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label;
an intelligent labeling module: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with multiple industry labels, and storing the results in a database.
Preferably, acquiring the enterprise information from the websites includes:
crawling the full information the first time, and then crawling only incremental information on each subsequent crawl;
the full information refers to all relevant information about the target enterprise on the website;
the incremental information refers to information about the target enterprise newly added to the relevant websites after a preset period of time.
Preferably, the classification label association module:
calls a relevant API with the enterprise information to obtain the enterprise's main business, and associates the main business with the enterprise's industry classification label;
the enterprise information includes the enterprise name and the unified social credit code.
Preferably, the training data preparation module:
manually labels a preset amount of information for each associated industry classification label, including both information that belongs to the industry label and information that does not.
Preferably, the neural network model generation module:
for each industry classification label, trains and tests a deep convolutional neural network on the information labeled for that industry label to derive a neural network model for that label.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention filters information with high accuracy;
2. the invention requires no manual intervention and has low cost;
3. the invention can accurately obtain information about related companies or related industries.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a flowchart of a software recommendation method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a software recommendation system according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art further understand the invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the invention, and all of these fall within the scope of protection of the present invention.
The natural-language-based enterprise information recommendation method provided by the invention comprises the following steps:
information acquisition step: acquiring enterprise information from websites by crawler technology;
classification label association step: associating the enterprise information with industry classification labels;
training data preparation step: preparing training data according to the associated industry classification labels;
neural network model generation step: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label;
intelligent labeling step: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with multiple industry labels, and storing the results in a database.
Specifically, acquiring the enterprise information from the websites includes:
crawling the full information the first time, and then crawling only incremental information on each subsequent crawl;
the full information refers to all relevant information about the target enterprise on the website;
the incremental information refers to information about the target enterprise newly added to the relevant websites after a preset period of time.
Specifically, in the classification label association step:
a relevant API is called with the enterprise information to obtain the enterprise's main business, and the main business is associated with the enterprise's industry classification label;
the enterprise information includes the enterprise name and the unified social credit code.
Specifically, in the training data preparation step:
a preset amount of information is manually labeled for each associated industry classification label, including both information that belongs to the industry label and information that does not.
Specifically, in the neural network model generation step:
for each industry classification label, a deep convolutional neural network is trained and tested on the information labeled for that industry label to derive a neural network model for that label.
The natural-language-based enterprise information recommendation system provided by the invention comprises:
an information acquisition module: acquiring enterprise information from websites by crawler technology;
a classification label association module: associating the enterprise information with industry classification labels;
a training data preparation module: preparing training data according to the associated industry classification labels;
a neural network model generation module: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label;
an intelligent labeling module: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with multiple industry labels, and storing the results in a database.
Specifically, acquiring the enterprise information from the websites includes:
crawling the full information the first time, and then crawling only incremental information on each subsequent crawl;
the full information refers to all relevant information about the target enterprise on the website;
the incremental information refers to information about the target enterprise newly added to the relevant websites after a preset period of time.
Specifically, the classification label association module:
calls a relevant API with the enterprise information to obtain the enterprise's main business, and associates the main business with the enterprise's industry classification label;
the enterprise information includes the enterprise name and the unified social credit code.
Specifically, the training data preparation module:
manually labels a preset amount of information for each associated industry classification label, including both information that belongs to the industry label and information that does not.
Specifically, the neural network model generation module:
for each industry classification label, trains and tests a deep convolutional neural network on the information labeled for that industry label to derive a neural network model for that label.
The present invention will be described more specifically below with reference to preferred examples.
Preferred example 1:
An association recommendation engine for enterprise information, shown in Fig. 1 (a schematic flowchart of a software recommendation method in an embodiment of the present invention), comprises the following steps:
Step 1: enterprise information on relevant websites is obtained by crawler technology, searching with enterprise names as keywords. The first time a website is crawled, the full information is crawled; each subsequent crawl retrieves only incremental information. The full information refers to all relevant information about the target enterprise on the website; the incremental information refers to information about the target enterprise newly added to the relevant websites after a period of time. Crawling only incremental information improves crawler efficiency and reduces the system load caused by repeated crawling.
Step 2: using the enterprise name or the unified social credit code, a relevant API can be called to obtain the enterprise's main business. The enterprise is then associated with its industry classification label (GICS standard) through an association table between main businesses and industry classifications, i.e. the enterprise is mapped to a GICS industry classification code, in preparation for dividing the data set in step 3.
Step 3: training the models requires a data set. Based on steps 1 and 2, about 2000 information items can be manually selected for each industry classification label (i.e. each GICS code), of which 1000 belong to the industry label and 1000 do not.
Step 4: for each industry classification label, a deep convolutional neural network is trained and tested on the information labeled in step 3 to derive a neural network model for that industry label.
Step 5: all the neural network models from step 4 are applied to the enterprise information to be labeled with industry labels (not limited to the information crawled in step 1; any similar information can be used).
Step 6: each piece of enterprise information may be labeled with multiple industry labels, i.e. each piece of information may be associated with multiple enterprises, and the results are persisted in a database.
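Steps 5 and 6 can be sketched as running every per-label model over each information item, keeping each label whose score clears a cut-off, and persisting the (item, label) pairs so that one item can carry several labels. The 0.5 cut-off, the `item_label` table and the function names are assumptions; each model is represented by any callable returning a probability, and an in-memory SQLite database stands in for the unspecified database.

```python
import sqlite3
from typing import Callable, Dict, List, Sequence

# Any per-label model exposed as: text -> probability of belonging to the label.
Scorer = Callable[[str], float]


def label_and_store(conn: sqlite3.Connection, items: Dict[str, str],
                    models: Dict[str, Scorer], threshold: float = 0.5) -> None:
    """Run every label's model on every item; persist each label that clears the threshold."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS item_label ("
        "item_id TEXT, label TEXT, score REAL, PRIMARY KEY (item_id, label))"
    )
    for item_id, text in items.items():
        for label, model in models.items():
            score = model(text)
            if score >= threshold:  # one item may receive several labels
                conn.execute(
                    "INSERT OR REPLACE INTO item_label VALUES (?, ?, ?)",
                    (item_id, label, score),
                )
    conn.commit()


def labels_for(conn: sqlite3.Connection, item_id: str) -> List[str]:
    """All labels stored for one information item, in sorted order."""
    rows = conn.execute(
        "SELECT label FROM item_label WHERE item_id = ? ORDER BY label", (item_id,)
    )
    return [r[0] for r in rows]


def items_for_labels(conn: sqlite3.Connection, labels: Sequence[str]) -> List[str]:
    """Items tagged with any of the given labels, e.g. a followed enterprise's GICS codes."""
    if not labels:
        return []
    placeholders = ",".join("?" for _ in labels)
    rows = conn.execute(
        f"SELECT DISTINCT item_id FROM item_label WHERE label IN ({placeholders}) "
        "ORDER BY item_id",
        list(labels),
    )
    return [r[0] for r in rows]
```

Looking up items by the GICS labels of an enterprise a user follows (`items_for_labels`) is one possible way to turn the stored labels into recommendations; the patent does not specify the query.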
Preferred example 2:
As shown in Fig. 2, a schematic structural diagram of a software recommendation system in an embodiment of the present invention, the system includes:
(1) A website crawler module, used to acquire information related to a large number of enterprises.
(2) An enterprise information module, used to obtain the enterprise's main business from the enterprise name or the unified social credit code and associate the related industry label (GICS standard).
(3) A deep neural network model generation module, used to train and test on the manually labeled enterprise information to obtain a label classification model that meets the threshold standard.
(4) An intelligent labeling module, which uses the trained label classification models to assign labels to enterprise information and persists the results in the database.
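Module (3) keeps only models that meet a "threshold standard", which the patent does not define. A sketch assuming the standard is a minimum test-set accuracy; the 0.9 default, the 0.5 decision cut-off and the function names are hypothetical, and precision and recall are reported alongside accuracy for inspection.

```python
from typing import Callable, Dict, List, Tuple


def evaluate(model: Callable[[str], float], test_set: List[Tuple[str, int]],
             cutoff: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision and recall of one binary label model on its test set."""
    tp = fp = tn = fn = 0
    for text, y in test_set:
        predicted = model(text) >= cutoff
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n if n else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def meets_threshold(metrics: Dict[str, float], min_accuracy: float = 0.9) -> bool:
    """Hypothetical acceptance rule: keep the model only if test accuracy is high enough."""
    return metrics["accuracy"] >= min_accuracy
```

A label whose model fails the check would need more or better labeled data before the intelligent labeling module uses it.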
Those skilled in the art will appreciate that, besides implementing the system, apparatus and modules provided by the present invention purely as computer-readable program code, the same functions can be achieved entirely by logically programming the method steps, so that the system, apparatus and modules take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, apparatus and modules provided by the present invention can be regarded as hardware components, and the modules they contain for implementing various programs can also be regarded as structures within those hardware components; modules for performing various functions can be regarded both as software programs implementing the method and as structures within hardware components.
Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to these specific embodiments, and that those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the invention. Where no conflict arises, the embodiments of the present application and the features in them may be combined with one another arbitrarily.

Claims (10)

1. A natural-language-based enterprise information recommendation method, characterized by comprising the following steps:
information acquisition step: acquiring enterprise information from websites by crawler technology;
classification label association step: associating the enterprise information with industry classification labels;
training data preparation step: preparing training data according to the associated industry classification labels;
neural network model generation step: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label;
intelligent labeling step: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with multiple industry labels, and storing the results in a database.
2. The natural-language-based enterprise information recommendation method of claim 1, wherein acquiring the enterprise information from the websites comprises:
crawling the full information the first time, and then crawling only incremental information on each subsequent crawl;
the full information refers to all relevant information about the target enterprise on the website;
the incremental information refers to information about the target enterprise newly added to the relevant websites after a preset period of time.
3. The natural-language-based enterprise information recommendation method of claim 1, wherein in the classification label association step:
a relevant API is called with the enterprise information to obtain the enterprise's main business, and the main business is associated with the enterprise's industry classification label;
the enterprise information includes the enterprise name and the unified social credit code.
4. The natural-language-based enterprise information recommendation method of claim 1, wherein in the training data preparation step:
a preset amount of information is manually labeled for each associated industry classification label, including both information that belongs to the industry label and information that does not.
5. The natural-language-based enterprise information recommendation method of claim 1, wherein in the neural network model generation step:
for each industry classification label, a deep convolutional neural network is trained and tested on the information labeled for that industry label to derive a neural network model for that label.
6. A natural-language-based enterprise information recommendation system, characterized by comprising:
an information acquisition module: acquiring enterprise information from websites by crawler technology;
a classification label association module: associating the enterprise information with industry classification labels;
a training data preparation module: preparing training data according to the associated industry classification labels;
a neural network model generation module: training and testing a deep convolutional neural network on the prepared training data to derive a neural network model for each industry label;
an intelligent labeling module: applying all the neural network models to each piece of enterprise information to be labeled, labeling each piece of enterprise information with multiple industry labels, and storing the results in a database.
7. The natural-language-based enterprise information recommendation system of claim 6, wherein acquiring the enterprise information from the websites comprises:
crawling the full information the first time, and then crawling only incremental information on each subsequent crawl;
the full information refers to all relevant information about the target enterprise on the website;
the incremental information refers to information about the target enterprise newly added to the relevant websites after a preset period of time.
8. The natural-language-based enterprise information recommendation system of claim 6, wherein the classification label association module:
calls a relevant API with the enterprise information to obtain the enterprise's main business, and associates the main business with the enterprise's industry classification label;
the enterprise information includes the enterprise name and the unified social credit code.
9. The natural-language-based enterprise information recommendation system of claim 6, wherein the training data preparation module:
manually labels a preset amount of information for each associated industry classification label, including both information that belongs to the industry label and information that does not.
10. The natural-language-based enterprise information recommendation system of claim 6, wherein the neural network model generation module:
for each industry classification label, trains and tests a deep convolutional neural network on the information labeled for that industry label to derive a neural network model for that label.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911081813.7A CN110929124A (en) 2019-11-07 2019-11-07 Enterprise information recommendation method and system based on natural language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911081813.7A CN110929124A (en) 2019-11-07 2019-11-07 Enterprise information recommendation method and system based on natural language

Publications (1)

Publication Number Publication Date
CN110929124A true CN110929124A (en) 2020-03-27

Family

ID=69852538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911081813.7A Pending CN110929124A (en) 2019-11-07 2019-11-07 Enterprise information recommendation method and system based on natural language

Country Status (1)

Country Link
CN (1) CN110929124A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190122096A1 (en) * 2017-10-25 2019-04-25 SparkCognition, Inc. Automated evaluation of neural networks using trained classifier
CN109783818A (en) * 2019-01-17 2019-05-21 上海三零卫士信息安全有限公司 A kind of enterprises ' industry multi-tag classification method
CN110245226A (en) * 2018-10-23 2019-09-17 爱信诺征信有限公司 Enterprises ' industry classification method and its device

Similar Documents

Publication Publication Date Title
US20190129942A1 (en) Methods and systems for automatically generating reports from search results
US8527451B2 (en) Business semantic network build
US10860658B2 (en) Providing a search service including updating aspects of a document using a configurable schema
CN112749284B (en) Knowledge graph construction method, device, equipment and storage medium
CN105630768B (en) A kind of product name recognition method and device based on stacking condition random field
CN102681994B (en) Webpage information extracting method and system
CN102566945B (en) Method and system for realizing automatic acquisition and on-demand printing of book
CN103514299A (en) Information searching method and device
CN105095320A (en) System for identifying, correlating, searching and displaying documents based on relationship superposition and combination
CN105095319A (en) Time serialization based document identifying, associating, searching and showing system
CN104750754A (en) Website industry classification method and server
CN108959580A (en) A kind of optimization method and system of label data
WO2014000130A1 (en) Method or system for automated extraction of hyper-local events from one or more web pages
CN105117434A (en) Webpage classification method and webpage classification system
CN106503266A (en) Document Classification Method and device
CN103914487A (en) Document collection, identification and association system
KR20170115109A (en) Text-Mining Application Technique for Productive Construction Document Management
US20170235835A1 (en) Information identification and extraction
CN103914486A (en) Document search and display system
CN114462556A (en) Enterprise association industry chain classification method, training method, device, equipment and medium
CN105183843A (en) List page recognition system and method
CN110929124A (en) Enterprise information recommendation method and system based on natural language
CN110633319A (en) Big data analysis system for industrial design
CN110110050B (en) Method for generating news event generating type question-answer data set
CN103324640B (en) A kind of method, device and equipment determining search result document

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200327