CN110245226A - Enterprise industry classification method and device - Google Patents
- Publication number
- CN110245226A (application CN201811237531.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- industry
- enterprise
- classification
- enterprises
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present application provide an enterprise industry classification method and device. The method includes: obtaining multiple kinds of basic data of an enterprise, and selecting data from them to establish a data model corresponding to the selected data; obtaining an industry prediction classification of the enterprise according to the data model; and comprehensively analyzing multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise. Enterprises can thus be classified by industry automatically, which greatly reduces the workload of manual industry labeling, and the intelligent classification method improves the accuracy of industry classification.
Description
Technical field
The present application relates to the field of Internet technology, and in particular to a method and device for classifying enterprises by industry.
Background
In recent years, with the growth of China's market economy, enterprise credit reporting has become an important component of the social credit system and plays an important role in improving the market system and maintaining market order, among other things. Credit reporting services therefore require the industry of an enterprise to be classified accurately.
The industry classification in tax registration information is set by the company at the time of tax registration. It is prone to human error and often differs considerably from the enterprise's true industry, so the industry of the enterprise needs to be classified again, accurately. In addition, the various kinds of basic data of an enterprise, including tax big data and data on the enterprise's business operations, have great mining value, so the industry of an enterprise can be classified accurately on the basis of this basic data.
In the prior art, when the industry of an enterprise is classified from its basic data, such as business scope data and tax big data, the main approach is still the traditional one: manual statistical analysis. This consumes a great deal of manpower and material resources, the workload is heavy, and efficiency is low. Moreover, prior-art classification generally relies on a single kind of data, for example classifying the industry by enterprise name data alone, so the results have low accuracy and low credibility. A method is therefore needed that uses big data analysis and artificial intelligence to classify the industry of an enterprise efficiently, accurately, and intelligently.
Summary of the invention
In view of this, the main purpose of the present application is to provide an enterprise industry classification method and device that solve the technical problems existing in the prior art.
In a first aspect, an embodiment of the present application provides an enterprise industry classification method, comprising:
obtaining multiple kinds of basic data of an enterprise, and selecting data from them to establish a data model corresponding to the selected data;
obtaining an industry prediction classification of the enterprise according to the data model;
comprehensively analyzing multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
Optionally, in one embodiment of the present application, after the data is selected and before the data model corresponding to the selected data is established, the method further includes preprocessing the selected data.
Optionally, in one embodiment of the present application, preprocessing the selected data includes at least one of the following: segmenting the data into words using a word segmentation tool; deleting duplicate data from the selected data; and smoothing noisy data.
Optionally, in one embodiment of the present application, the multiple kinds of basic data include at least one of enterprise name data, business scope data, main products data, upstream enterprise code data, and downstream enterprise code data.
Optionally, in one embodiment of the present application, selecting data to establish a data model corresponding to the selected data includes: selecting one kind of data or several kinds of data from the multiple kinds of basic data, vectorizing the selected data, and establishing the corresponding data model according to a set first algorithm.
Optionally, in one embodiment of the present application, the set first algorithm includes at least one of a convolutional neural network method, an MLPC method, and a logistic regression method.
Optionally, in one embodiment of the present application, comprehensively analyzing the multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise includes: using a set second algorithm to comprehensively analyze the multiple industry prediction classifications of the enterprise and obtain the final industry classification of the enterprise.
Optionally, in one embodiment of the present application, before the industry prediction classification of the enterprise is obtained according to the data model, the method further includes: randomly extracting a set proportion of basic data from a database, the type of the extracted basic data corresponding to the type of data used to establish the data model; analyzing the extracted data to obtain an accurate industry classification of the enterprises; and having the data model learn from the accurate industry classification.
In a second aspect, the present application provides an enterprise industry classification device, comprising:
a modeling unit, configured to obtain multiple kinds of basic data of an enterprise and to select data from them to establish a data model corresponding to the selected data;
a prediction classification unit, configured to obtain an industry prediction classification of the enterprise according to the data model;
an analysis unit, configured to comprehensively analyze multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
In a third aspect, the present application provides an enterprise industry classification system, comprising a data server and a front-end device. The data server is used to store multiple kinds of basic data of an enterprise. The front-end device is configured with one or more processors, and the processor is configured to: obtain multiple kinds of basic data of an enterprise, and select data from them to establish a data model corresponding to the selected data; obtain an industry prediction classification of the enterprise according to the data model; and comprehensively analyze multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
In summary, in the technical solutions provided by the embodiments of the present application, multiple kinds of basic data of an enterprise are obtained, and data is selected from them to establish a data model corresponding to the selected data; an industry prediction classification of the enterprise is obtained according to the data model; and multiple industry prediction classifications of the enterprise are comprehensively analyzed to obtain the final industry classification of the enterprise. Enterprises can thus be classified by industry automatically, which greatly reduces the workload of manual industry labeling, and the intelligent classification method improves the accuracy of industry classification.
Brief description of the drawings
To illustrate the technical solutions of the present application or of the prior art more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some of the embodiments described in the present application, and those of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a flow diagram of the enterprise industry classification method in embodiment one of the present application.
Fig. 2 is a flow chart of a specific implementation of step S101 in embodiment one of the present application.
Fig. 3 is a structural schematic diagram of the enterprise industry classification device in embodiment two of the present application.
Fig. 4 is a structural schematic diagram of the enterprise industry classification system in embodiment three of the present application.
Detailed description
A technical solution implementing the embodiments of the present application does not necessarily need to achieve all of the above advantages at the same time.
To help those skilled in the art better understand the technical solutions in the embodiments of the present application, these solutions are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present application shall fall within the scope of protection of the embodiments of the present application.
Specific implementations of the embodiments of the present application are further described below with reference to the drawings.
Fig. 1 is a flow diagram of the enterprise industry classification method in embodiment one of the present application. As shown in Fig. 1, the method includes the following steps:
S101: obtain multiple kinds of basic data of an enterprise, and select data from them to establish a data model corresponding to the selected data.
In this embodiment, one kind of data may be selected to establish one corresponding data model, or several kinds of data may be selected together to establish another corresponding data model.
In this embodiment, the multiple kinds of basic data include at least one of enterprise name data, business scope data, main products data, upstream enterprise code data, and downstream enterprise code data, and may also include data on the invoices exchanged between the enterprise and its counterparties, and so on. Enterprise name data, business scope data, and main products data are text data, whereas upstream enterprise code data and downstream enterprise code data are non-text data that represent the enterprise types of the upstream and downstream enterprises.
Fig. 2 is a flow chart of a specific implementation of step S101 in embodiment one of the present application. As shown in Fig. 2, it includes the following steps:
S111: obtain multiple kinds of basic data of an enterprise, and select data from them.
In this embodiment, the basic data of an enterprise may be obtained directly from the network through a plug-in, for example by using a crawler tool to grab data from web pages; other methods may of course also be used. Alternatively, the basic data may be stored in a back-end data server or another storage device after it is collected, and then sorted by type. When data is needed, data of the required type can be fetched from the back-end data server or other storage device and cached in memory. Data may of course also be obtained by other methods.
S112: preprocess the selected data.
In this embodiment, preprocessing the data includes at least one of the following:
Segmenting the data into words using a word segmentation tool. Research shows that features at word granularity are far more representative than features at character granularity, so text data is segmented into words; the Jieba segmentation tool is preferred. If the selected data is main products data, it already consists of words, so it does not need to be segmented.
Deleting duplicate data from the selected data. The selected basic data may contain duplicates, so duplicate data is deleted to prevent it from affecting the industry prediction classification of the enterprise.
Smoothing noisy data. Smoothing removes missing data, abnormal data, and the like from the selected data, which improves data quality and the accuracy of the industry prediction classification of the enterprise.
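The three preprocessing operations above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: `simple_tokenize` is a placeholder whitespace tokenizer introduced here, noise smoothing is reduced to discarding missing or blank records, and the sample records are invented. For Chinese text, a real segmenter such as `jieba.lcut` could be passed as the `tokenize` argument.

```python
from typing import Callable, Iterable, List, Optional


def simple_tokenize(text: str) -> List[str]:
    """Placeholder tokenizer (an assumption): split on whitespace."""
    return text.split()


def preprocess(
    records: Iterable[Optional[str]],
    tokenize: Callable[[str], List[str]] = simple_tokenize,
) -> List[List[str]]:
    """Drop missing records, delete duplicates, then segment each record."""
    seen = set()
    cleaned: List[List[str]] = []
    for record in records:
        # Noise handling is simplified here to skipping missing or blank values.
        if record is None or not record.strip():
            continue
        normalized = record.strip()
        # Delete duplicate data, keeping only the first occurrence.
        if normalized in seen:
            continue
        seen.add(normalized)
        cleaned.append(tokenize(normalized))
    return cleaned


# Invented sample records for illustration only.
raw = ["Acme Steel Trading Co", None, "  ", "Acme Steel Trading Co", "Blue River Software Ltd"]
tokens = preprocess(raw)
```

Passing the tokenizer as a parameter keeps the cleaning logic independent of any particular segmentation library.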
S113: select one kind of data or several kinds of data from the multiple kinds of basic data, vectorize the selected data, and establish a data model corresponding to the selected data according to a set first algorithm.
In this embodiment, vectorizing the selected data includes converting the data into feature vectors, or encoding the data to obtain feature vectors; other vectorization methods may of course also be used.
Specifically, if the selected data is text data such as enterprise name data or business scope data, then after preprocessing and before vectorization, stop words are also removed. Stop words are high-frequency words in the text data that carry little meaning, such as pronouns, conjunctions, and prepositions. Preferably, a stop word list is established, the data is compared against the list, and the stop words in the data are deleted.
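Stop word removal against a list can be sketched in a few lines. The stop word list below is a tiny hypothetical English example introduced for illustration; a production list for Chinese enterprise data would be much larger and is not specified by the source.

```python
from typing import Iterable, List, Set

# Hypothetical stop word list; a real list would be far larger.
STOP_WORDS: Set[str] = {"and", "of", "the", "for", "in"}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Return the tokens that do not appear in the stop word list."""
    return [token for token in tokens if token.lower() not in stop_words]


filtered = remove_stop_words(["Sale", "of", "steel", "and", "hardware"])
```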
Specifically, feature vectors are extracted from the text data. For example, feature words are extracted from an enterprise name, and the text of the enterprise name is represented as a vector in a vector space in which each distinct feature word is one dimension and the value of each dimension is the weight of the corresponding feature item in the text. A feature weight score is computed for each feature word, all feature words are ranked by score, the highest-scoring words are kept as the most important feature words, and the remaining feature words are filtered out, yielding the final feature vector.
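One way to sketch this weighting and selection step is with TF-IDF scores. The source does not name the weighting scheme, so TF-IDF, the cut-off `k`, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_weights(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute a TF-IDF weight for every term of every tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_features(weights: List[Dict[str, float]], k: int) -> List[str]:
    """Rank feature words by total weight and keep the k highest-scoring ones."""
    totals: Dict[str, float] = {}
    for doc_weights in weights:
        for term, weight in doc_weights.items():
            totals[term] = totals.get(term, 0.0) + weight
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(totals, key=lambda term: (-totals[term], term))
    return ranked[:k]


def vectorize(doc_weights: Dict[str, float], features: List[str]) -> List[float]:
    """One dimension per selected feature word; the value is its weight in the text."""
    return [doc_weights.get(term, 0.0) for term in features]


# Invented tokenized enterprise names for illustration only.
docs = [
    ["steel", "trading", "company"],
    ["software", "development", "company"],
    ["steel", "processing", "company"],
]
weights = tfidf_weights(docs)
selected = top_features(weights, k=4)
vectors = [vectorize(w, selected) for w in weights]
```

In this toy corpus the word "company" appears in every name, so its weight is zero and it is filtered out, which is the behavior the ranking step is meant to produce.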
Specifically, if the selected data is main products data, it already consists of words, so it is preferably encoded to obtain feature vectors. One-hot encoding is preferred, which guarantees that for each feature in each sample exactly one position is in state 1 and all other positions are 0. The code data of upstream and downstream enterprises is also preferably encoded to obtain feature vectors.
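A minimal sketch of one-hot encoding for categorical values such as main products follows. The product names and the alphabetical index order are assumptions chosen for illustration.

```python
from typing import Dict, List


def build_index(values: List[str]) -> Dict[str, int]:
    """Assign each distinct value a position, in sorted order for reproducibility."""
    return {value: position for position, value in enumerate(sorted(set(values)))}


def one_hot(value: str, index: Dict[str, int]) -> List[int]:
    """Return a vector with a single 1 at the position of the given value."""
    vector = [0] * len(index)
    vector[index[value]] = 1
    return vector


# Invented product names for illustration only.
products = ["steel pipe", "software", "cement", "software"]
index = build_index(products)
encoded = [one_hot(product, index) for product in products]
```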
Further, in one embodiment, the first algorithm includes at least one of a convolutional neural network method, an MLPC method, and a logistic regression method.
Specifically, after the enterprise name data and the business scope data are vectorized, they are fed into an artificial neural network model for modeling, yielding the first data sub-model. A convolutional neural network is preferably used to establish this data model. Because convolution kernels of different sizes produce feature maps of different sizes, a pooling function is used to make their dimensions identical. Max pooling is preferred: the maximum value produced by each convolution kernel is extracted, and the pooled values are concatenated to obtain the final text feature vector.
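The effect of max pooling over kernels of different widths can be sketched in plain Python. This is only an illustration of the pooling idea, not the patent's network: the embeddings and kernel weights are invented, bias terms and activation functions are omitted, and the token sequence is assumed to be at least as long as the widest kernel.

```python
from typing import List

Matrix = List[List[float]]


def conv1d_max(embeddings: Matrix, kernel: Matrix) -> float:
    """Slide one kernel (width x embedding_dim) over the tokens, then max-pool."""
    width = len(kernel)
    responses = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        responses.append(sum(
            weight * value
            for kernel_row, token_row in zip(kernel, window)
            for weight, value in zip(kernel_row, token_row)
        ))
    # Max pooling reduces a feature map of any length to a single value.
    return max(responses)


def text_features(embeddings: Matrix, kernels: List[Matrix]) -> List[float]:
    """Concatenate the pooled outputs of all kernels into one feature vector."""
    return [conv1d_max(embeddings, kernel) for kernel in kernels]


# Four tokens with 2-dimensional embeddings (invented values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernels = [
    [[1.0, 1.0], [1.0, 1.0]],              # width 2: feature map of length 3
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],  # width 3: feature map of length 2
]
features = text_features(embeddings, kernels)
```

The two kernels produce feature maps of different lengths, yet each contributes exactly one value after pooling, so the final vector length depends only on the number of kernels.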
Specifically, after the main products data is encoded, it is combined directly with the text feature vector that the convolutional neural network produces from the enterprise name and business scope, yielding the second data sub-model. Training and prediction are then performed through a fully connected neural network, so that a more accurate industry prediction classification of the enterprise can be obtained.
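The combination step can be sketched as a forward pass through a single fully connected layer. The source does not say exactly how the two vectors are combined, so concatenation is an assumption here, and the weights, biases, and input values are hypothetical numbers rather than trained parameters.

```python
import math
from typing import List


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


text_vec = [3.0, 1.0]    # e.g. pooled text features (invented values)
product_vec = [0, 1, 0]  # e.g. one-hot main products code (invented values)
combined = text_vec + product_vec  # assumption: combination by concatenation

# Hypothetical parameters for two industry classes.
weights = [
    [0.5, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 2.0, 0.0],
]
bias = [0.0, 0.0]

probabilities = softmax(dense(combined, weights, bias))
predicted = max(range(len(probabilities)), key=probabilities.__getitem__)
```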
Specifically, this embodiment further includes encoding the industry classification, so that the third data sub-model, established from upstream enterprise code data, and the fourth data sub-model, established from downstream enterprise code data, can be used to obtain industry prediction classifications of the enterprise.
Specifically, the data model for upstream enterprise code data is preferably established using logistic regression, yielding the third data sub-model, and the data model for downstream enterprise code data is preferably established using the MLPC method, yielding the fourth data sub-model. Other methods may of course also be chosen to establish the data models.
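A logistic regression model over one-hot encoded enterprise codes can be sketched with plain stochastic gradient descent. This is a simplified illustration: it is binary rather than multi-class, and the training data, learning rate, and epoch count are invented for the example.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            error = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * error * xj for wj, xj in zip(w, xi)]
            b -= lr * error
    return w, b


def predict(x: List[float], w: List[float], b: float) -> int:
    """Return class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# One-hot upstream enterprise codes (invented) and binary industry labels.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
y = [1, 0, 0, 1]
w, b = train_logistic(X, y)
```

A multi-class version would replace the sigmoid with a softmax over one weight vector per industry, but the training loop has the same shape.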
Further, in one embodiment, a set proportion of basic data is randomly extracted from the database, the type of the extracted basic data corresponding to the type of data used to establish the data model. The extracted data is analyzed to obtain an accurate industry classification of the enterprises, and the data model learns from this accurate industry classification.
Specifically, the database holds the basic data of all enterprises and may reside in a data server or in a storage device. The proportion can be set according to the amount of data in the database, so that the resulting accurate industry classification of the enterprises is representative.
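Random extraction of a set proportion of records can be sketched with the standard library. The function name, the rounding rule, and the seed parameter are assumptions introduced for illustration; the in-memory list stands in for a database query.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def sample_by_ratio(records: List[T], ratio: float, seed: Optional[int] = None) -> List[T]:
    """Randomly draw round(len(records) * ratio) records without replacement."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in the interval (0, 1]")
    if not records:
        return []
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    count = max(1, round(len(records) * ratio))
    return rng.sample(records, count)


enterprise_ids = list(range(100))  # stand-in for records read from the database
labeled_subset = sample_by_ratio(enterprise_ids, ratio=0.2, seed=42)
```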
Specifically, when the data model learns from the accurate industry classification, it relies on Spark's powerful parallel in-memory processing to train the model quickly and dynamically. This addresses the problem of model updates lagging far behind data updates and significantly improves the validity of the model training results.
S102: obtain the industry prediction classification of the enterprise according to the data model.
In this embodiment, the inputs to the second data sub-model are the enterprise name data, the business scope data, the encoded main products data, and the encoded industry classification, which are fed into the model for training. After training is complete, the data of an enterprise whose industry classification is unknown is input into the model, and the output is the second industry prediction classification of the enterprise. For the third data sub-model, the encoded upstream enterprise data and the encoded industry classification numbers are input for training; after training is complete, the encoded upstream enterprise data of an enterprise whose industry classification is unknown is input into the model, and the output is the third industry prediction classification of the enterprise. Likewise, for the fourth data sub-model, the encoded downstream enterprise code data and the encoded industry classification numbers are input for training; after training is complete, the encoded downstream enterprise data of an enterprise whose industry classification is unknown is input into the model, and the output is the fourth industry prediction classification of the enterprise.
In this embodiment, the established data models may also be stored in a data server, a storage device, or the like, so that they can be called directly in subsequent data analysis, which improves efficiency.
S103: comprehensively analyze multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
In this embodiment, a set second algorithm is used to comprehensively analyze the multiple industry prediction classifications of the enterprise and obtain its final industry classification. Making separate predictions from different kinds of data and then combining the resulting industry prediction classifications brings the result closer to the enterprise's true industry, so classification accuracy is higher.
Specifically, the second algorithm must be trained by machine learning. By learning from sample data for which the accurate industry classification of the enterprises has been obtained, it becomes able to comprehensively analyze multiple industry prediction classifications and produce the final industry classification. This embodiment preferably uses a voting algorithm, which selects the classification result with the highest probability from the multiple industry prediction classifications as the final prediction.
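The voting step can be sketched as soft voting, in which the class probabilities from the sub-models are averaged and the class with the highest average wins. This is one plausible reading of the description, since the source does not spell out the voting rule; a hard majority vote over predicted labels is another. The industry labels and probabilities below are invented.

```python
from typing import Dict, List


def soft_vote(predictions: List[Dict[str, float]]) -> str:
    """Average per-class probabilities across models and return the top class."""
    totals: Dict[str, float] = {}
    for model_output in predictions:
        for label, probability in model_output.items():
            totals[label] = totals.get(label, 0.0) + probability
    averages = {label: total / len(predictions) for label, total in totals.items()}
    # Break ties alphabetically so the result is deterministic.
    return min(averages, key=lambda label: (-averages[label], label))


# Hypothetical outputs of three sub-models for one enterprise.
sub_model_outputs = [
    {"manufacturing": 0.6, "retail": 0.4},
    {"manufacturing": 0.3, "retail": 0.7},
    {"manufacturing": 0.8, "retail": 0.2},
]
final_industry = soft_vote(sub_model_outputs)
```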
This embodiment provides an enterprise industry classification method in which multiple kinds of basic data of an enterprise are obtained, and data is selected from them to establish a data model corresponding to the selected data; an industry prediction classification of the enterprise is obtained according to the data model; and multiple industry prediction classifications of the enterprise are comprehensively analyzed to obtain the final industry classification of the enterprise. Enterprises can thus be classified by industry automatically, which greatly reduces the workload of manual industry labeling, and the intelligent classification method improves the accuracy of industry classification.
Fig. 3 is a structural schematic diagram of the enterprise industry classification device in embodiment two of the present application. As shown in Fig. 3, the device includes:
A modeling unit 301, configured to obtain multiple kinds of basic data of an enterprise and to select data from them to establish a data model corresponding to the selected data.
A prediction classification unit 302, configured to obtain an industry prediction classification of the enterprise according to the data model.
An analysis unit 303, configured to comprehensively analyze multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
In one embodiment, after the data is selected and before the data model corresponding to the selected data is established, the modeling unit 301 is further configured to preprocess the selected data.
Further, in one embodiment, preprocessing the selected data includes at least one of the following: segmenting the data into words using a word segmentation tool; deleting duplicate data from the selected data; and smoothing noisy data.
In one embodiment, the modeling unit 301 is further configured to select one kind of data or several kinds of data from the multiple kinds of basic data, vectorize the selected data, and establish the corresponding data model according to the set first algorithm.
In one embodiment, the analysis unit 303 is further configured to use the set second algorithm to comprehensively analyze the multiple industry prediction classifications of the enterprise and obtain the final industry classification of the enterprise.
Further, in one embodiment, the device also includes a unit 304 (not shown), configured to randomly extract a set proportion of basic data from the database, the type of the extracted basic data corresponding to the type of data used to establish the data model. The extracted data is analyzed to obtain an accurate industry classification of the enterprises, and the data model learns from the accurate industry classification.
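The division of work among the three units can be sketched as three small classes. This shows structure only: the class and method names, the keyword rule standing in for a trained model, and the majority vote in the analysis unit are all assumptions introduced for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

Enterprise = Dict[str, str]
Model = Callable[[Enterprise], str]


class ModelingUnit:
    """Builds one model per selected kind of data."""

    def build(self, fields: List[str]) -> List[Model]:
        def make_model(field: str) -> Model:
            # Stand-in for a trained model: a hypothetical keyword rule.
            return lambda e: "software" if "software" in e.get(field, "").lower() else "other"
        return [make_model(field) for field in fields]


class PredictionUnit:
    """Obtains one industry prediction per model."""

    def predict(self, models: List[Model], enterprise: Enterprise) -> List[str]:
        return [model(enterprise) for model in models]


class AnalysisUnit:
    """Combines the predictions into a final classification by majority vote."""

    def finalize(self, predictions: List[str]) -> str:
        return Counter(predictions).most_common(1)[0][0]


# Invented enterprise record for illustration only.
enterprise = {
    "name": "Blue River Software Ltd",
    "business_scope": "Software development and consulting",
    "main_products": "ERP systems",
}
models = ModelingUnit().build(["name", "business_scope", "main_products"])
predictions = PredictionUnit().predict(models, enterprise)
final_industry = AnalysisUnit().finalize(predictions)
```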
Fig. 4 is a structural schematic diagram of the enterprise industry classification system in embodiment three of the present application. As shown in Fig. 4, the system includes a data server 401 and a front-end device 402.
The data server 401 is used to store multiple kinds of basic data of an enterprise.
The front-end device is configured with one or more processors 403 (not shown), and the processor 403 is configured to: obtain multiple kinds of basic data of an enterprise, and select data from them to establish a data model corresponding to the selected data; obtain an industry prediction classification of the enterprise according to the data model; and comprehensively analyze multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
The processor 403 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In particular, according to embodiments of the present application, the processes described above with reference to the flow charts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code configured to execute the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network through a communication component, and/or installed from a removable medium. When the computer program is executed by a central processing unit (CPU), it performs the functions defined in the method of the present application.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flow charts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each box in a flow chart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions configured to implement the specified logical function. The specific embodiments above describe steps in a particular order, but that order is only exemplary; in a specific implementation there may be fewer or more steps, or their order of execution may be adjusted. That is, in some alternative implementations, the functions noted in the boxes may occur in an order different from that shown in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flow charts, and combinations of such boxes, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, which may, for example, be described as follows: a processor includes a modeling unit, configured to obtain multiple kinds of basic data of an enterprise and to select data from them to establish a data model corresponding to the selected data; a prediction classification unit, configured to obtain an industry prediction classification of the enterprise according to the data model; and an analysis unit, configured to comprehensively analyze multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
The names of these units do not, in some cases, limit the units themselves. For example, the modeling unit may also be described as "a unit for obtaining multiple kinds of basic data of an enterprise and selecting data from them to establish a data model corresponding to the selected data".
The front-end device according to the various embodiments disclosed in the present application may be any device that includes at least one processor, and may include a camera, a portable device, a mobile terminal, a communication terminal, a portable mobile terminal, and the like. For example, the front-end device may include at least one of the following: a smartphone, a tablet personal computer (PC), a mobile phone, a video phone, an e-book reader, a desktop PC, a laptop PC, a netbook computer, a personal digital assistant (PDA), a portable multimedia player (PMP), an MP3 player, a mobile medical device, a camera, and a wearable device (for example, a head-mounted device (HMD) such as electronic glasses, electronic clothing, an electronic bracelet, an electronic necklace, an electronic accessory, an electronic tattoo, or a smartwatch). The front-end device according to the various embodiments disclosed in the present application may be a combination of one or more of the above devices. The front-end device according to some embodiments of the present disclosure may be a flexible device. In addition, the front-end device according to the embodiments disclosed in the present application is not limited to the above devices and may include new front-end devices that emerge as technology develops.
In addition, in the above embodiments, the modeling unit, the prediction classification unit, and the analysis unit may also be referred to as the first program unit, the second program unit, and the third program unit, respectively.
The expressions "first", "second", "the first", or "the second" used in the various embodiments of the present application may modify various components regardless of order and/or importance, but these expressions do not limit the corresponding components. They are used only to distinguish one element from another. For example, a first user device and a second user device denote different user devices, although both are user devices. Likewise, without departing from the scope of the present disclosure, a first element may be termed a second element, and similarly a second element may be termed a first element.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the present application. Obviously, those skilled in the art can make various changes and variations to the present application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.
Claims (10)
1. An enterprise industry classification method, characterized by comprising:
obtaining multiple kinds of basic data of an enterprise, and selecting data therefrom to establish a data model corresponding to the selected data;
obtaining an industry prediction classification of the enterprise according to the data model;
comprehensively analyzing the multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
2. The method according to claim 1, characterized in that, after the data is selected and before the data model corresponding to the selected data is established, the method further comprises preprocessing the selected data.
3. The method according to claim 2, characterized in that preprocessing the selected data comprises at least one of the following: performing word segmentation on the data using a word segmentation tool; deleting duplicate data from the selected data; and smoothing noisy data.
4. The method according to claim 1, characterized in that the multiple kinds of basic data include at least one of enterprise name data, business scope data, main product data, upstream enterprise code data, and downstream enterprise code data.
5. The method according to claim 1, characterized in that selecting data therefrom to establish the data model corresponding to the selected data comprises: selecting one or more kinds of data from the multiple kinds of basic data, vectorizing the selected data, and establishing the corresponding data model according to a set first algorithm.
6. The method according to claim 5, characterized in that the set first algorithm includes at least one of a convolutional neural network method, an MLPC method, and a logistic regression method.
7. The method according to claim 1, characterized in that comprehensively analyzing the multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise comprises: comprehensively analyzing the multiple industry prediction classifications of the enterprise using a set second algorithm to obtain the final industry classification of the enterprise.
8. The method according to claim 1, characterized in that, before the industry prediction classification of the enterprise is obtained according to the data model, the method further comprises: randomly extracting a set proportion of basic data from a database, the type of the extracted basic data corresponding to the type of data used to establish the data model; and analyzing the extracted data to obtain an accurate industry classification of the enterprise, which the data model then learns.
9. An enterprise industry classification device, characterized by comprising:
a modeling unit, configured to obtain multiple kinds of basic data of an enterprise and to select data therefrom to establish a data model corresponding to the selected data;
a prediction classification unit, configured to obtain an industry prediction classification of the enterprise according to the data model;
an analysis unit, configured to comprehensively analyze the multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
10. An enterprise industry classification system, characterized by comprising a data server and a front-end device, wherein the data server is configured to store multiple kinds of basic data of an enterprise, and the front-end device is configured with one or more processors, the processors being configured to: obtain the multiple kinds of basic data of the enterprise, and select data therefrom to establish a data model corresponding to the selected data; obtain an industry prediction classification of the enterprise according to the data model; and comprehensively analyze the multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
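The data-handling steps named in claims 2, 3, 5, 6 and 8 can be illustrated with a short Python sketch. It is a simplified illustration under several assumptions and is not taken from the application: all function names are hypothetical, `str.split()` stands in for a real word segmentation tool, vectorization is a plain bag-of-words count, the logistic regression is binary (one industry against the rest) and trained with plain stochastic gradient descent, and noise smoothing is omitted.

```python
import math
import random


def sample_fraction(records, ratio, seed=0):
    """Randomly draw a set proportion of records (hypothetical helper)."""
    k = round(len(records) * ratio)
    return random.Random(seed).sample(records, k)


def preprocess(samples):
    """Tokenize each (text, label) sample and drop exact duplicates."""
    seen, cleaned = set(), []
    for text, label in samples:
        # Whitespace splitting is only a stand-in for word segmentation.
        tokens = tuple(text.lower().split())
        if tokens and (tokens, label) not in seen:
            seen.add((tokens, label))
            cleaned.append((tokens, label))
    return cleaned


def build_vocabulary(cleaned):
    """Assign each distinct token a column index."""
    vocab = {}
    for tokens, _ in cleaned:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(tokens, vocab):
    """Bag-of-words count vector; unknown tokens are ignored."""
    vector = [0.0] * len(vocab)
    for token in tokens:
        if token in vocab:
            vector[vocab[token]] += 1.0
    return vector


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(vectors, targets, epochs=200, learning_rate=0.5):
    """Binary logistic regression fitted by stochastic gradient descent."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, targets):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(z) - y
            weights = [w - learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict_probability(tokens, vocab, weights, bias):
    """Probability that the tokens belong to the positive industry."""
    x = vectorize(tokens, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

A typical call sequence would be `preprocess`, then `build_vocabulary`, then `vectorize` on each cleaned sample, then `train_logistic`. A real multi-industry setting would need one such model per industry or a multinomial variant, and business-scope text in Chinese would need a dedicated segmenter rather than whitespace splitting.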
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811237531.7A CN110245226A (en) | 2018-10-23 | 2018-10-23 | Enterprises ' industry classification method and its device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110245226A (en) | 2019-09-17 |
Family
ID=67882386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811237531.7A Pending CN110245226A (en) | 2018-10-23 | 2018-10-23 | Enterprises ' industry classification method and its device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110245226A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020091557A1 (en) * | 2001-01-08 | 2002-07-11 | Srinivas Akkaraju | Method for facilitating transactions of life sciences opportunities |
CN103336796A (en) * | 2013-06-09 | 2013-10-02 | 北京百度网讯科技有限公司 | Method and system for displaying door buster directly |
CN106779467A (en) * | 2016-12-31 | 2017-05-31 | 成都数联铭品科技有限公司 | Enterprises ' industry categorizing system based on automatic information screening |
CN106777335A (en) * | 2017-01-13 | 2017-05-31 | 深圳爱拼信息科技有限公司 | It is a kind of to be remembered based on shot and long term(LSTM)The multi-tag trade classification method and device of model |
CN107169036A (en) * | 2017-04-19 | 2017-09-15 | 畅捷通信息技术股份有限公司 | Determine the method and system of the affiliated category of employment of enterprise |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929124A (en) * | 2019-11-07 | 2020-03-27 | 上海融贷通金融信息服务有限公司 | Enterprise information recommendation method and system based on natural language |
CN110827092A (en) * | 2019-11-13 | 2020-02-21 | 广州点动信息科技股份有限公司 | Business information analysis and statistics method and system based on cloud platform |
CN111209397A (en) * | 2019-12-30 | 2020-05-29 | 中伯伦(北京)信息技术有限公司 | Method for determining enterprise industry category |
CN111209397B (en) * | 2019-12-30 | 2020-09-08 | 中伯伦(北京)信息技术有限公司 | Method for determining enterprise industry category |
CN113591979A (en) * | 2021-07-30 | 2021-11-02 | 深圳前海微众银行股份有限公司 | Industry category identification method, equipment, medium and computer program product |
CN113591979B (en) * | 2021-07-30 | 2024-11-08 | 深圳前海微众银行股份有限公司 | Industry category identification method, equipment, medium and computer program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190917 |