CN110245226A - Enterprise industry classification method and device

Publication number
CN110245226A
Authority
CN
China
Prior art keywords
data
industry
enterprise
classification
enterprises
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811237531.7A
Other languages
Chinese (zh)
Inventor
金瑞峰
韦东杰
苏斌
苗璐
林凉
郭向国
戴才良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Love Letter And Letter Co Ltd
Aisino Corp
Original Assignee
Love Letter And Letter Co Ltd
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Love Letter And Letter Co Ltd, Aisino Corp filed Critical Love Letter And Letter Co Ltd
Priority to CN201811237531.7A priority Critical patent/CN110245226A/en
Publication of CN110245226A publication Critical patent/CN110245226A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide an enterprise industry classification method and device. The method includes: obtaining multiple kinds of basic data about an enterprise and selecting data from them to build a data model corresponding to the selected data; obtaining industry prediction classifications of the enterprise from the data model; and comprehensively analyzing the multiple industry prediction classifications of the enterprise to obtain its final industry classification. The enterprise's industry can thus be classified automatically, which greatly reduces the manual workload of industry labeling, and the intelligent classification method improves the accuracy of the industry classification.

Description

Enterprise industry classification method and device
Technical field
This application relates to the field of Internet technology, and in particular to a method and device for classifying the industry of an enterprise.
Background
In recent years, as China's market economy has prospered, enterprise credit reporting, an important component of the social credit system, has played an important role in improving the market system, maintaining market order, and related areas. To meet the needs of the credit reporting business, the industry of each enterprise must be classified accurately.
The industry classification recorded in an enterprise's tax registration information is set by the company at the time of tax registration. It is prone to human error and often differs considerably from the enterprise's true industry, so the industry needs to be reclassified accurately. In addition, the many kinds of basic data about an enterprise, including tax big data and the enterprise's business operations, are highly valuable for analysis, so these basic data can be used to classify the enterprise's industry accurately.
In the prior art, when an enterprise's industry is classified from multiple kinds of basic data, such as its business functions and tax big data, the traditional approach still dominates: the industry is classified by manual statistical analysis, which consumes a great deal of manpower and material resources, involves a heavy workload, and is inefficient. Moreover, prior-art classification essentially relies on a single kind of data, for example classifying the industry from the enterprise name alone, so the results have low accuracy and low credibility. There is therefore a need for a method that uses big data analysis and artificial intelligence to classify enterprise industries intelligently, efficiently, and accurately.
Summary of the invention
In view of this, the main purpose of the present application is to provide an enterprise industry classification method and device that solve the technical problems in the prior art.
In a first aspect, an embodiment of the present application provides an enterprise industry classification method, comprising:
obtaining multiple kinds of basic data about an enterprise, and selecting data from them to build a data model corresponding to the selected data;
obtaining an industry prediction classification of the enterprise from the data model;
comprehensively analyzing multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
Optionally, in one embodiment of the present application, after the data are selected and before the data model corresponding to the selected data is built, the method further includes preprocessing the selected data.
Optionally, in one embodiment of the present application, preprocessing the selected data includes at least one of the following: segmenting the data into words with a word segmentation tool, deleting duplicate data from the selected data, and smoothing noisy data.
Optionally, in one embodiment of the present application, the multiple kinds of basic data include at least one of the enterprise's name data, business scope data, main product data, upstream enterprise code data, and downstream enterprise code data.
Optionally, in one embodiment of the present application, selecting data to build the data model corresponding to the selected data includes: selecting one or more kinds of data from the multiple kinds of basic data, vectorizing the selected data, and building the corresponding data model according to a set first algorithm.
Optionally, in one embodiment of the present application, the set first algorithm includes at least one of a convolutional neural network method, an MLPC method, and a logistic regression method.
Optionally, in one embodiment of the present application, comprehensively analyzing the multiple industry prediction classifications of the enterprise to obtain its final industry classification includes: using a set second algorithm to comprehensively analyze the multiple industry prediction classifications of the enterprise and obtain its final industry classification.
Optionally, in one embodiment of the present application, before the industry prediction classification of the enterprise is obtained from the data model, the method further includes: randomly extracting a set proportion of the basic data from a database, where the type of the extracted basic data corresponds to the type of data used to build the data model; analyzing the extracted data to obtain accurate industry classifications of the enterprises; and having the data model learn from these accurate industry classifications.
In a second aspect, this application provides an enterprise industry classification device, comprising:
a modeling unit, configured to obtain multiple kinds of basic data about an enterprise and to select data from them to build a data model corresponding to the selected data;
a prediction classification unit, configured to obtain an industry prediction classification of the enterprise from the data model;
an analysis unit, configured to comprehensively analyze multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
In a third aspect, this application provides an enterprise industry classification system, characterized by comprising a data server and a front-end device. The data server is configured to store multiple kinds of basic data about an enterprise. The front-end device is configured with one or more processors, which are configured to: obtain multiple kinds of basic data about the enterprise and select data from them to build a data model corresponding to the selected data; obtain an industry prediction classification of the enterprise from the data model; and comprehensively analyze multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
In summary, in the technical solution provided by the embodiments of the present application, multiple kinds of basic data about an enterprise are obtained and data are selected from them to build a data model corresponding to the selected data; an industry prediction classification of the enterprise is obtained from the data model; and multiple industry prediction classifications of the enterprise are comprehensively analyzed to obtain its final industry classification. The enterprise's industry can thus be classified automatically, which greatly reduces the manual workload of industry labeling, and the intelligent classification method improves the accuracy of the industry classification.
Description of the drawings
To explain the technical solutions of the present application or of the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below obviously cover only some of the embodiments described in this application; those of ordinary skill in the art can obtain other drawings from them.
Fig. 1 is a flow diagram of the enterprise industry classification method in Embodiment 1 of the present application.
Fig. 2 is a flow chart of a specific implementation of step S101 in Embodiment 1 of the present application.
Fig. 3 is a structural diagram of the enterprise industry classification device in Embodiment 2 of the present application.
Fig. 4 is a structural diagram of the enterprise industry classification system in Embodiment 3 of the present application.
Detailed description
Implementing any technical solution of the embodiments of the present application does not necessarily require achieving all of the above advantages at the same time.
To help those skilled in the art better understand the technical solutions in the embodiments of the present application, these solutions are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present application fall within the scope of protection of the embodiments of the present application.
The implementation of the embodiments of the present application is further described below with reference to the drawings.
Fig. 1 is a flow diagram of the enterprise industry classification method in Embodiment 1 of the present application. As shown in Fig. 1, the method includes the following steps:
S101: Obtain multiple kinds of basic data about an enterprise, and select data from them to build a data model corresponding to the selected data.
In this embodiment, one kind of data may be selected to build one corresponding data model, or several kinds of data may be selected to build another corresponding data model.
In this embodiment, the multiple kinds of basic data include at least one of the enterprise's name data, business scope data, main product data, upstream enterprise code data, and downstream enterprise code data, and may also include data on the invoices the enterprise exchanges with other parties. The enterprise name data, business scope data, and main product data are text data, whereas the upstream and downstream enterprise code data are non-text data that represent the enterprise types of the upstream and downstream enterprises.
Fig. 2 is a flow chart of a specific implementation of step S101 in Embodiment 1 of the present application. As shown in Fig. 2, it includes the following steps:
S111: Obtain multiple kinds of basic data about the enterprise, and select data from them.
In this embodiment, the basic data may be obtained directly from the network by means of a plug-in, for example by scraping web pages with a crawler tool; other methods may of course be used. Alternatively, after the basic data are acquired they may be stored in a back-end data server or another storage device and then sorted by type; when data are needed, data of the required type can be fetched from the back-end data server or other storage device and cached in memory. Data may of course also be obtained in other ways.
S112: Preprocess the selected data.
In this embodiment, preprocessing the data includes at least one of the following:
Segmenting the data into words with a word segmentation tool. Research shows that features at word granularity are far more representative than features at character granularity, so text data are segmented into words, preferably with the Jieba segmentation tool. If the selected data are main product data, which already consist of words, no segmentation is needed.
Deleting duplicate data from the selected data. The selected basic data may contain duplicates, so duplicate data are deleted to keep them from affecting the industry prediction classification of the enterprise.
Smoothing noisy data. Smoothing removes missing data, abnormal data, and the like from the selected data, which improves data quality and the accuracy of the industry prediction classification of the enterprise.
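The preprocessing steps above can be illustrated with a minimal sketch. The function names `preprocess` and `whitespace_tokenize` are hypothetical and do not come from the application; the whitespace tokenizer is only a stand-in for a word segmentation tool such as Jieba, and "noise" is reduced here to missing or blank records, which is a simplifying assumption.

```python
from typing import Callable, Iterable, List, Optional


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (assumption); a Chinese segmenter would replace it."""
    return text.split()


def preprocess(
    records: Iterable[Optional[str]],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> List[List[str]]:
    """Drop missing and duplicate records, then tokenize what remains."""
    seen = set()
    cleaned = []
    for record in records:
        # Treat None and blank strings as missing data and skip them.
        if record is None or not record.strip():
            continue
        # Collapse runs of whitespace so near-identical records compare equal.
        normalized = " ".join(record.split())
        # Skip records that duplicate one already kept.
        if normalized in seen:
            continue
        seen.add(normalized)
        cleaned.append(tokenize(normalized))
    return cleaned


if __name__ == "__main__":
    sample = ["steel pipe wholesale", None, "steel  pipe wholesale", "  ", "software development"]
    print(preprocess(sample))
```

Passing the tokenizer as a parameter keeps the cleaning logic independent of whichever segmentation tool is actually used.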
S113: Select one or more kinds of data from the multiple kinds of basic data, vectorize the selected data, and build a data model corresponding to the selected data according to a set first algorithm.
In this embodiment, vectorizing the selected data includes converting the data into feature vectors or encoding the data to obtain feature vectors; other vectorization methods may of course also be used.
Specifically, if the selected data are text data such as enterprise name data or business scope data, then after preprocessing and before vectorization the stop words must also be handled. Stop words are high-frequency words in the text, such as pronouns, conjunctions, and prepositions, that carry no meaning for classification. Preferably, a stop-word list is built, the data are compared against it, and the stop words found in the data are deleted.
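As an illustration of stop-word handling, the following sketch filters a token list against a stop-word list. The list contents and the name `remove_stop_words` are hypothetical; a real list would be compiled for the language and corpus in use.

```python
from typing import Iterable, List, Set

# Hypothetical stop-word list, for illustration only.
STOP_WORDS: Set[str] = {"the", "and", "of", "in", "for"}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Return the tokens that are not in the stop-word list (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stop_words]


if __name__ == "__main__":
    print(remove_stop_words(["Sale", "of", "steel", "and", "pipes"]))
```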
Specifically, feature vectors are extracted from the text data. For example, feature words are extracted from the enterprise name, and the text of the enterprise name is represented as a vector in a vector space in which each distinct feature word is one dimension and the value of each dimension is the weight of the corresponding feature item in the text. A feature weight score is computed for each feature word, all feature words are ranked by score, the highest-scoring words are selected as the most important feature words, and the remaining feature words are filtered out, giving the final feature vector.
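The application does not state which feature weight score is used. The sketch below assumes TF-IDF, a common choice, purely for illustration; `tf_idf` and `top_features` are hypothetical names. It scores each word in each document, ranks the words, and keeps the top `k`.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute a TF-IDF weight for every term of every tokenized document."""
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_features(weights: Dict[str, float], k: int) -> List[str]:
    """Rank terms by descending weight (ties alphabetical) and keep the top k."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    names = [["steel", "pipe", "trading"], ["software", "trading"], ["steel", "software", "trading"]]
    scores = tf_idf(names)
    print(top_features(scores[0], 2))
```

A word that appears in every document, such as "trading" here, receives a weight of zero and is filtered out first, which matches the intent of keeping only the most distinctive feature words.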
Specifically, if the selected data are main product data, which already consist of words, they are preferably encoded to obtain feature vectors. One-hot encoding is preferred, which guarantees that for each feature of each sample exactly one position is in state 1 and all others are 0. The code data of the upstream and downstream enterprises are likewise preferably encoded to obtain feature vectors.
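A minimal sketch of one-hot encoding follows. The helper names `build_vocabulary` and `one_hot` are hypothetical, and mapping an unseen value to the all-zero vector is a design choice made here for illustration, not something the application specifies.

```python
from typing import Dict, List, Sequence


def build_vocabulary(values: Sequence[str]) -> Dict[str, int]:
    """Assign each distinct value a stable index (alphabetical order)."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def one_hot(value: str, vocabulary: Dict[str, int]) -> List[int]:
    """Return a vector with a single 1 at the value's index."""
    vector = [0] * len(vocabulary)
    index = vocabulary.get(value)
    if index is not None:  # unseen values stay all-zero (assumption)
        vector[index] = 1
    return vector


if __name__ == "__main__":
    vocab = build_vocabulary(["cement", "steel", "cement", "glass"])
    print(one_hot("glass", vocab))
```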
Further, in one embodiment, the first algorithm includes at least one of a convolutional neural network method, an MLPC method, and a logistic regression method.
Specifically, after the enterprise name data and the business scope data have been vectorized, they are fed into an artificial neural network model for modeling, yielding the first data submodel. The data model is preferably built with a convolutional neural network method. Because convolution kernels of different sizes produce features of different sizes, a pooling function is used to bring them to the same dimension. Max pooling is preferred: the maximum value of each convolution kernel's output is extracted, and the results are concatenated to give the final text feature vector.
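The role of max pooling can be seen in a small pure-Python sketch. It is not the network of the application: the kernels are fixed rather than learned, the ReLU activation is an assumption, and `conv_max_pool` and `text_features` are hypothetical names. The point is only that each kernel, whatever its width, contributes exactly one pooled value, so the concatenated vector has one entry per kernel.

```python
from typing import List

Vector = List[float]


def conv_max_pool(embeddings: List[Vector], kernel: List[Vector]) -> float:
    """Slide one kernel over the token sequence and keep the maximum response."""
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("sequence is shorter than the kernel width")
    responses = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Sum of element-wise products between the kernel and the window.
        response = sum(
            w * x
            for kernel_row, token_vec in zip(kernel, window)
            for w, x in zip(kernel_row, token_vec)
        )
        responses.append(max(response, 0.0))  # ReLU activation (assumption)
    return max(responses)  # max-over-time pooling


def text_features(embeddings: List[Vector], kernels: List[List[Vector]]) -> Vector:
    """Concatenate one pooled value per kernel into a fixed-length vector."""
    return [conv_max_pool(embeddings, kernel) for kernel in kernels]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three tokens, 2-d embeddings
    kernels = [
        [[1.0, 1.0]],                          # width 1
        [[1.0, 0.0], [1.0, 0.0]],              # width 2
        [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],  # width 3
    ]
    print(text_features(tokens, kernels))
```

Kernels of width 1, 2, and 3 produce 3, 2, and 1 responses respectively on a three-token input, yet after pooling each yields a single number.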
Specifically, after the main product data are encoded, the encoding is combined directly with the text feature vector obtained by passing the enterprise name and business scope through the convolutional neural network, yielding the second data submodel. Training and prediction are then carried out with a fully connected neural network, which gives a more accurate industry prediction classification of the enterprise.
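The following sketch shows one way the combination might work, assuming that the encoded product data and the text feature vector are concatenated and passed through a single fully connected layer with a softmax output. Concatenation, the single layer, and the names `softmax`, `dense_softmax`, and `predict` are all assumptions for illustration; the weights here are fixed by hand rather than trained.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def dense_softmax(features: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """One fully connected layer (one weight row per class) followed by softmax."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + bias
        for row, bias in zip(weights, biases)
    ]
    return softmax(scores)


def predict(text_vec: List[float], product_vec: List[float],
            weights: List[List[float]], biases: List[float]) -> List[float]:
    """Concatenate both feature vectors and return per-class probabilities."""
    combined = list(text_vec) + list(product_vec)
    return dense_softmax(combined, weights, biases)


if __name__ == "__main__":
    probabilities = predict(
        [2.0, 1.0], [0.0, 1.0, 0.0],
        weights=[[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 0.0]],
        biases=[0.0, 0.0],
    )
    print(probabilities)
```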
Specifically, this embodiment further includes encoding the industry classifications, so that the third data submodel, built from the upstream enterprise code data, and the fourth data submodel, built from the downstream enterprise code data, can produce industry prediction classifications of the enterprise.
Specifically, the data model for the upstream enterprise code data is preferably built with a logistic regression method, yielding the third data submodel, and the data model for the downstream enterprise code data is preferably built with an MLPC method, yielding the fourth data submodel. Other methods may of course also be chosen to build the data models.
Further, in one embodiment, a set proportion of the basic data is randomly extracted from a database, where the type of the extracted basic data corresponds to the type of data used to build the data model. The extracted data are analyzed to obtain accurate industry classifications of the enterprises, and the data model learns from these accurate industry classifications.
Specifically, the database holds the basic data of all enterprises and may reside on a data server or on a storage device. The proportion can be set according to the volume of data in the database so that the resulting accurate industry classifications are representative.
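Random extraction of a set proportion can be sketched as follows. The function name `sample_by_ratio`, the fixed seed for reproducibility, and the rule of keeping at least one record are assumptions, not details from the application.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_by_ratio(records: Sequence[T], ratio: float, seed: int = 0) -> List[T]:
    """Draw a random subset whose size is the given proportion of the records."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if not records:
        return []
    size = max(1, round(len(records) * ratio))  # keep at least one record
    # A dedicated Random instance keeps the draw reproducible for a given seed.
    return random.Random(seed).sample(list(records), size)


if __name__ == "__main__":
    enterprise_ids = list(range(100))
    subset = sample_by_ratio(enterprise_ids, 0.1)
    print(len(subset))
```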
Specifically, when the data model learns from the accurate industry classifications, it relies on Spark's powerful parallel in-memory processing to achieve fast, dynamic model training. This addresses the problem of model updates lagging far behind data updates and significantly improves the effectiveness of the model training results.
S102: Obtain the industry prediction classification of the enterprise from the data model.
In this embodiment, the inputs to the second data submodel are the enterprise name data, the business scope data, the encoding of the main product data, and the encoding of the industry classification, on which the model is trained. After training is complete, the data of an enterprise whose industry classification is unknown are input into the model, and the output is the second industry prediction classification of the enterprise. The inputs to the third data submodel are the encoding of the upstream enterprise data and the encoding of the industry classification number, on which the model is trained. After training is complete, the encoding of the upstream enterprise data of an enterprise whose industry classification is unknown is input into the model, and the output is the third industry prediction classification of the enterprise. Likewise, the inputs to the fourth data submodel are the encoding of the downstream enterprise code data and the encoding of the industry classification number, on which the model is trained. After training is complete, the encoding of the downstream enterprise data of an enterprise whose industry classification is unknown is input into the model, and the output is the fourth industry prediction classification of the enterprise.
In this embodiment, the data models that have been built may also be stored on a data server, a storage device, or the like, so that they can be called directly in subsequent data analysis, which improves efficiency.
S103: Comprehensively analyze the multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
In this embodiment, a set second algorithm is used to comprehensively analyze the multiple industry prediction classifications of the enterprise and obtain its final industry classification. Because predictions are made from different data and the multiple industry prediction classifications are then combined, the result is closer to the enterprise's true industry and the classification is more accurate.
Specifically, the second algorithm must itself undergo machine learning; by learning from the sample data and from the process by which the accurate industry classifications of enterprises were obtained, it can comprehensively analyze the multiple industry prediction classifications and obtain the final industry classification. This embodiment preferably uses a voting algorithm, which selects the classification result with the highest probability from the multiple industry prediction classifications as the final prediction.
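The application does not detail the voting algorithm. One plausible reading, shown here as a sketch only, is soft voting: the per-class probabilities from each submodel are summed and the class with the highest total is chosen. The name `soft_vote`, the example labels, and the alphabetical tie-break are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def soft_vote(predictions: List[Dict[str, float]]) -> str:
    """Sum per-class probabilities across submodels and return the top class."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    totals: Dict[str, float] = defaultdict(float)
    for distribution in predictions:
        for label, probability in distribution.items():
            totals[label] += probability
    # Highest total wins; ties are broken alphabetically for determinism.
    return min(totals, key=lambda label: (-totals[label], label))


if __name__ == "__main__":
    submodel_outputs = [
        {"manufacturing": 0.6, "wholesale": 0.4},
        {"manufacturing": 0.3, "wholesale": 0.7},
        {"manufacturing": 0.2, "wholesale": 0.8},
    ]
    print(soft_vote(submodel_outputs))
```

In this example the first submodel favors "manufacturing", but the combined evidence of the other two outweighs it.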
This embodiment provides an enterprise industry classification method in which multiple kinds of basic data about an enterprise are obtained and data are selected from them to build a data model corresponding to the selected data; an industry prediction classification of the enterprise is obtained from the data model; and multiple industry prediction classifications of the enterprise are comprehensively analyzed to obtain its final industry classification. The enterprise's industry can thus be classified automatically, which greatly reduces the manual workload of industry labeling, and the intelligent classification method improves the accuracy of the industry classification.
Fig. 3 is a structural diagram of the enterprise industry classification device in Embodiment 2 of the present application. As shown in Fig. 3, the device includes:
A modeling unit 301, configured to obtain multiple kinds of basic data about an enterprise and to select data from them to build a data model corresponding to the selected data.
A prediction classification unit 302, configured to obtain an industry prediction classification of the enterprise from the data model.
An analysis unit 303, configured to comprehensively analyze multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
In one embodiment, the modeling unit 301 is further configured to preprocess the selected data after the data are selected and before the data model corresponding to the selected data is built.
Further, in one embodiment, preprocessing the selected data includes at least one of the following: segmenting the data into words with a word segmentation tool; deleting duplicate data from the selected data; smoothing noisy data.
In one embodiment, the modeling unit 301 is further configured to select one or more kinds of data from the multiple kinds of basic data, vectorize the selected data, and build the corresponding data model according to a set first algorithm.
In one embodiment, the analysis unit 303 is further configured to use a set second algorithm to comprehensively analyze the multiple industry prediction classifications of the enterprise and obtain its final industry classification.
Further, in one embodiment, the device also includes a unit 304 (not shown), configured to randomly extract a set proportion of the basic data from a database, where the type of the extracted basic data corresponds to the type of data used to build the data model, and to analyze the extracted data to obtain accurate industry classifications of the enterprises, from which the data model learns.
Fig. 4 is a structural diagram of the enterprise industry classification system in Embodiment 3 of the present application. As shown in Fig. 4, the system includes a data server 401 and a front-end device 402.
The data server 401 is configured to store multiple kinds of basic data about an enterprise.
The front-end device is configured with one or more processors 403 (not shown), which are configured to: obtain multiple kinds of basic data about the enterprise and select data from them to build a data model corresponding to the selected data; obtain an industry prediction classification of the enterprise from the data model; and comprehensively analyze multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise.
The processor 403 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In particular, according to the embodiments of the present application, the processes described above with reference to the flow charts may be implemented as computer software programs. For example, an embodiment of the present application includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code configured to execute the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through a communication part and/or installed from a removable medium. When the computer program is executed by a central processing unit (CPU), it performs the functions defined above in the method of the present application.
Computer program code configured to perform the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flow charts and block diagrams in the drawings illustrate the architecture, functions, and operations that systems, methods, and computer program products according to the various embodiments of the present application may implement. In this regard, each box in a flow chart or block diagram may represent a module, a program segment, or a part of code that contains one or more executable instructions configured to implement the specified logic function. The specific embodiments above describe a particular order, but that order is only exemplary; in a specific implementation there may be fewer or more steps, or the execution order may be adjusted. That is, in some alternative implementations, the functions marked in the boxes may occur in an order different from that shown in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flow charts, and any combination of such boxes, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Being described in unit involved in the embodiment of the present application can be realized by way of software, can also be by hard The mode of part is realized.Described unit also can be set in the processor, for example, can be described as: a kind of processor packet Modeling unit is included, for obtaining a variety of basic datas of enterprise, and it is corresponding with the data established with chosen therefrom to choose data Data model;Predict taxon, the industry prediction for obtaining the enterprise according to the data model is classified;Analysis is single Member, for the industry prediction classification comprehensive analysis of multiple enterprises to be obtained the industry final classification of the enterprise.
In some cases, the names of these units do not limit the units themselves. For example, the modeling unit may also be described as "a unit for obtaining multiple kinds of basic data of an enterprise, selecting data from the basic data, and establishing a data model corresponding to the selected data".
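The division into a modeling unit, a prediction classification unit, and an analysis unit can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the application: the `keyword_model` stand-in for a trained data model, the data kind names (`name`, `business_scope`, `main_products`), and the use of majority voting in the analysis unit. The sketch also assumes that the comprehensive analysis combines one prediction per selected kind of data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: a "data model" is any callable that maps one kind of
# basic data (a text field) to a predicted industry label.
DataModel = Callable[[str], str]


def keyword_model(keywords: Dict[str, str], default: str) -> DataModel:
    """Toy stand-in for a trained model: label of the first matching keyword."""
    def predict(text: str) -> str:
        lowered = text.lower()
        for keyword, label in keywords.items():
            if keyword in lowered:
                return label
        return default
    return predict


@dataclass
class ModelingUnit:
    """Selects, from the basic data, the kinds of data that have a data model."""
    models: Dict[str, DataModel]  # data kind -> data model for that kind

    def select(self, basic_data: Dict[str, str]) -> Dict[str, str]:
        return {kind: basic_data[kind] for kind in self.models if kind in basic_data}


@dataclass
class PredictionUnit:
    """Obtains one industry prediction per selected kind of data."""
    models: Dict[str, DataModel]

    def predict(self, selected: Dict[str, str]) -> List[str]:
        return [self.models[kind](text) for kind, text in selected.items()]


class AnalysisUnit:
    """Combines the predictions; majority voting is an assumed stand-in."""

    def finalize(self, predictions: List[str]) -> str:
        return Counter(predictions).most_common(1)[0][0]


def classify_enterprise(basic_data: Dict[str, str], models: Dict[str, DataModel]) -> str:
    """Run the three units in sequence for a single enterprise."""
    selected = ModelingUnit(models).select(basic_data)
    predictions = PredictionUnit(models).predict(selected)
    return AnalysisUnit().finalize(predictions)
```

Calling `classify_enterprise` with a dictionary of basic data and one model per data kind returns the label predicted by most of the models. A real implementation would replace `keyword_model` with trained models and could use a different combination rule.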
The front-end device according to the various embodiments disclosed in the present application may be any device that includes at least one processor, and may include: a camera, a portable device, a mobile terminal, a communication terminal, a portable mobile terminal, and the like. For example, the front-end device may include at least one of the following: a smartphone, a tablet personal computer (PC), a mobile phone, a video phone, an e-book reader, a desktop PC, a laptop PC, a netbook computer, a personal digital assistant (PDA), a portable multimedia player (PMP), an MP3 player, a mobile medical device, a camera, and a wearable device (for example, a head-mounted device (HMD) such as electronic glasses, electronic clothing, an electronic bracelet, an electronic necklace, an electronic accessory, an electronic tattoo, or a smartwatch). The front-end device according to the various embodiments disclosed in the present application may be a combination of one or more of the above devices. The front-end device according to some embodiments of the present disclosure may be a flexible device. In addition, the front-end device according to the embodiments disclosed in the present application is not limited to the above devices, and may include new front-end devices as technology develops.
In addition, in the above embodiments, the modeling unit, the prediction classification unit, and the analysis unit may be referred to as a first program unit, a second program unit, and a third program unit, respectively.
The expressions "first", "second", "the first", or "the second" used in the various embodiments of the present application may modify various components regardless of order and/or importance, but these expressions do not limit the corresponding components. The above expressions are used only to distinguish one element from another. For example, a first user device and a second user device represent different user devices, although both are user devices. For example, without departing from the scope of the present disclosure, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element.
Although preferred embodiments of the present application have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present application. Obviously, those skilled in the art can make various changes and variations to the present application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.

Claims (10)

1. An enterprise industry classification method, characterized by comprising:
obtaining multiple kinds of basic data of an enterprise, selecting data from the basic data, and establishing a data model corresponding to the selected data;
obtaining an industry prediction classification of the enterprise according to the data model;
comprehensively analyzing multiple industry prediction classifications of the enterprise to obtain a final industry classification of the enterprise.
2. The method according to claim 1, characterized in that, after the data is selected and before the data model corresponding to the selected data is established, the method further comprises preprocessing the selected data.
3. The method according to claim 2, characterized in that preprocessing the selected data includes at least one of the following: performing word segmentation on the data using a word segmentation tool; deleting repeated data from the selected data; smoothing noisy data.
4. The method according to claim 1, characterized in that the multiple kinds of basic data include at least one of the following for the enterprise: enterprise name data, business scope data, main product data, upstream enterprise code data, and downstream enterprise code data.
5. The method according to claim 1, characterized in that selecting data from the basic data and establishing the data model corresponding to the selected data comprises: selecting one or more kinds of data from the multiple kinds of basic data, vectorizing the selected data, and establishing the corresponding data model according to a set first algorithm.
6. The method according to claim 5, characterized in that the set first algorithm includes at least one of a convolutional neural network method, an MLPC method, and a logistic regression method.
7. The method according to claim 1, characterized in that comprehensively analyzing the multiple industry prediction classifications of the enterprise to obtain the final industry classification of the enterprise comprises: using a set second algorithm to comprehensively analyze the multiple industry prediction classifications of the enterprise, thereby obtaining the final industry classification of the enterprise.
8. The method according to claim 1, characterized in that, before obtaining the industry prediction classification of the enterprise according to the data model, the method further comprises: randomly extracting a set proportion of the basic data from a database, the type of the extracted basic data corresponding to the type of data used to establish the data model; obtaining an accurate industry classification of the enterprise by analyzing the extracted data; and having the data model learn from the accurate industry classification.
9. An enterprise industry classification device, characterized by comprising:
a modeling unit, configured to obtain multiple kinds of basic data of an enterprise, select data from the basic data, and establish a data model corresponding to the selected data;
a prediction classification unit, configured to obtain an industry prediction classification of the enterprise according to the data model;
an analysis unit, configured to comprehensively analyze multiple industry prediction classifications of the enterprise to obtain a final industry classification of the enterprise.
10. An enterprise industry classification system, characterized by comprising a data server and a front-end device, wherein the data server is configured to store multiple kinds of basic data of an enterprise, and the front-end device is configured with one or more processors, the processors being configured to: obtain the multiple kinds of basic data of the enterprise, select data from the basic data, and establish a data model corresponding to the selected data; obtain an industry prediction classification of the enterprise according to the data model; and comprehensively analyze multiple industry prediction classifications of the enterprise to obtain a final industry classification of the enterprise.
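The preprocessing, vectorization, model building, and random sampling steps recited in claims 2, 3, 5, 6, and 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation. The regular-expression `segment` function is a stand-in for a real word segmentation tool and only handles ASCII text. Bag-of-words counting is an assumed form of vectorization. Of the first algorithms named in claim 6, only a two-class logistic regression is shown, trained with plain stochastic gradient descent. Smoothing of noisy data is omitted. All function names and parameter values are hypothetical.

```python
import math
import random
import re
from typing import Dict, List, Sequence, Tuple


def segment(text: str) -> List[str]:
    """Assumed stand-in for a word segmentation tool: lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def preprocess(records: Sequence[str]) -> List[List[str]]:
    """Delete repeated records (keeping the first copy), then segment the rest."""
    unique = list(dict.fromkeys(records))
    return [segment(record) for record in unique]


def build_vocabulary(token_lists: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct token a column index in order of first appearance."""
    vocabulary: Dict[str, int] = {}
    for tokens in token_lists:
        for token in tokens:
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def vectorize(tokens: Sequence[str], vocabulary: Dict[str, int]) -> List[float]:
    """Turn a token list into a bag-of-words count vector."""
    vector = [0.0] * len(vocabulary)
    for token in tokens:
        if token in vocabulary:
            vector[vocabulary[token]] += 1.0
    return vector


def predict_probability(features: Sequence[float],
                        weights: Sequence[float], bias: float) -> float:
    """Probability of class 1 under the logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit a two-class logistic regression with stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            error = predict_probability(features, weights, bias) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def sample_for_labeling(records: Sequence, ratio: float, seed: int = 0) -> List:
    """Randomly extract a set proportion of records to be labeled accurately."""
    count = max(1, round(len(records) * ratio))
    return random.Random(seed).sample(list(records), count)
```

In use, `preprocess` is applied to the selected text records, `build_vocabulary` and `vectorize` turn the resulting token lists into feature vectors, and `train_logistic` fits the model on vectors whose labels come from the accurately classified sample drawn by `sample_for_labeling`. Chinese-language fields would need a real segmentation tool in place of `segment`.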
CN201811237531.7A 2018-10-23 2018-10-23 Enterprises ' industry classification method and its device Pending CN110245226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811237531.7A CN110245226A (en) 2018-10-23 2018-10-23 Enterprises ' industry classification method and its device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811237531.7A CN110245226A (en) 2018-10-23 2018-10-23 Enterprises ' industry classification method and its device

Publications (1)

Publication Number Publication Date
CN110245226A true CN110245226A (en) 2019-09-17

Family

ID=67882386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811237531.7A Pending CN110245226A (en) 2018-10-23 2018-10-23 Enterprises ' industry classification method and its device

Country Status (1)

Country Link
CN (1) CN110245226A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190917