CN108171276A - Method and apparatus for generating information - Google Patents

Method and apparatus for generating information

Info

Publication number
CN108171276A
Authority
CN
China
Prior art keywords
information
word
term vector
enterprise name
enterprise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810045681.1A
Other languages
Chinese (zh)
Other versions
CN108171276B (en)
Inventor
骆金昌
方军
尹存祥
郑志彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810045681.1A
Publication of CN108171276A
Application granted
Publication of CN108171276B
Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Game Theory and Decision Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application disclose a method and apparatus for generating information. One specific embodiment of the method includes: extracting the company information of a target enterprise, where the company information includes an enterprise name and business scope information; extracting first feature information from the enterprise name and the business scope information; extracting second feature information from the remaining information; and fusing the first feature information with the second feature information and inputting the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise. This embodiment improves the flexibility of information generation.

Description

Method and apparatus for generating information
Technical field
The embodiments of the present application relate to the field of computer technology, specifically to the field of Internet technology, and more particularly to a method and apparatus for generating information.
Background
With the development of computer technology, in order to better carry out enterprise-related data analysis (such as business risk analysis and enterprise graph construction), it is usually necessary to assign an enterprise to the correct industry and to add an industry label to the enterprise.
The existing approach typically identifies keywords in company information by rule and dictionary matching and then manually maps these keywords to preset industry classifications, which usually consumes considerable manpower.
Summary of the invention
The embodiments of the present application propose a method and apparatus for generating information.
In a first aspect, an embodiment of the present application provides a method for generating information. The method includes: extracting the company information of a target enterprise, where the company information includes an enterprise name and business scope information; extracting first feature information from the enterprise name and the business scope information; extracting second feature information from the remaining information, where the remaining information is the information in the company information other than the enterprise name and the business scope information; and fusing the first feature information with the second feature information and inputting the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise, where the industry identification model characterizes the correspondence between feature information and industry categories.
In some embodiments, extracting the first feature information from the enterprise name and the business scope information includes: segmenting the enterprise name and the business scope information into words respectively, and determining the word vector of each resulting word; extracting a keyword from the enterprise name; and parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information.
In some embodiments, parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information includes: inputting the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively, and determining these feature vectors as the first feature information, where the feature extraction model is used to extract text features.
In some embodiments, the feature extraction model consists of the convolutional layer and the max pooling layer of a pre-trained convolutional neural network.
In some embodiments, the industry identification model is the fully connected layer of the convolutional neural network.
In some embodiments, segmenting the enterprise name and the business scope information respectively and determining the word vector of each resulting word includes: segmenting the enterprise name and the business scope information into words respectively, and inputting each word in the enterprise name and each word in the business scope information separately into a pre-trained word vector model to obtain the word vectors of the words in the enterprise name and the word vectors of the words in the business scope information, where the word vector model is used to generate the word vectors of words.
In some embodiments, the remaining information includes at least one of the following items for the target enterprise: management position, registration type, scale, time of establishment, and location; and extracting the second feature information from the remaining information includes: determining the one-hot encoding corresponding to each item in the remaining information; and concatenating the one-hot encodings corresponding to the items in the remaining information to generate the second feature information.
In a second aspect, an embodiment of the present application provides an apparatus for generating information. The apparatus includes: a first extraction unit configured to extract the company information of a target enterprise, where the company information includes an enterprise name and business scope information; a second extraction unit configured to extract first feature information from the enterprise name and the business scope information; a third extraction unit configured to extract second feature information from the remaining information, where the remaining information is the information in the company information other than the enterprise name and the business scope information; and an input unit configured to fuse the first feature information with the second feature information and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise, where the industry identification model characterizes the correspondence between feature information and industry categories.
In some embodiments, the second extraction unit includes: a word segmentation module configured to segment the enterprise name and the business scope information into words respectively and determine the word vector of each resulting word; an extraction module configured to extract a keyword from the enterprise name; and a generation module configured to parse the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information.
In some embodiments, the generation module is further configured to: input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively, and determine these feature vectors as the first feature information, where the feature extraction model is used to extract text features.
In some embodiments, the feature extraction model consists of the convolutional layer and the max pooling layer of a pre-trained convolutional neural network.
In some embodiments, the industry identification model is the fully connected layer of the convolutional neural network.
In some embodiments, the word segmentation module is further configured to: segment the enterprise name and the business scope information into words respectively, and input each word in the enterprise name and each word in the business scope information separately into a pre-trained word vector model to obtain the word vectors of the words in the enterprise name and the word vectors of the words in the business scope information, where the word vector model is used to generate the word vectors of words.
In some embodiments, the remaining information includes at least one of the following items for the target enterprise: management position, registration type, scale, time of establishment, and location; and the third extraction unit includes: a determining module configured to determine the one-hot encoding corresponding to each item in the remaining information; and a concatenation module configured to concatenate the one-hot encodings corresponding to the items in the remaining information to generate the second feature information.
In a third aspect, an embodiment of the present application provides a server, including: one or more processors; and a storage device for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the method for generating information.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, where the program, when executed by a processor, implements the method of any embodiment of the method for generating information.
The method and apparatus for generating information provided by the embodiments of the present application extract the company information of a target enterprise, extract first feature information from the enterprise name and the business scope information, and extract second feature information from the remaining information. The first feature information is then fused with the second feature information, and the fused feature information is input into a pre-trained industry identification model to obtain the industry category of the target enterprise. In this way, the feature information in the company information is fully extracted and the industry category of the enterprise is determined from the extracted feature information, with no need for manual keyword matching, which improves the flexibility of information generation.
Description of the drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
Fig. 2 is a flowchart of one embodiment of the method for generating information according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present application;
Fig. 4 is a flowchart of another embodiment of the method for generating information according to the present application;
Fig. 5 is a structural diagram of one embodiment of the apparatus for generating information according to the present application;
Fig. 6 is a structural diagram of a computer system suitable for implementing the server of the embodiments of the present application.
Detailed description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments in the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the method for generating information or the apparatus for generating information of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 in order to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a storage server that stores and manages company information. The storage server may store, manage, and analyze the company information uploaded by the terminal devices 101, 102, 103, and generate a processing result (such as an enterprise classification).
It should be noted that the method for generating information provided by the embodiments of the present application is generally executed by the server 105; accordingly, the apparatus for generating information is generally provided in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, according to implementation needs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for generating information according to the present application is shown. The method for generating information includes the following steps:
Step 201: extract the company information of the target enterprise.
In this embodiment, the electronic device on which the method for generating information runs (for example, the server 105 shown in Fig. 1) may store the company information of a large number of enterprises in advance, and may extract the company information of the target enterprise from it. The target enterprise may be an enterprise that has not yet been labeled with an industry category, or an enterprise designated in advance by a technician as one whose industry category is to be determined. The company information of the target enterprise may be text containing various information related to the target enterprise. For example, the information related to the target enterprise may include the enterprise name (such as "Certain International Technology (Shenzhen) Co., Ltd."), business scope information (such as "electronic products"), the composition of the enterprise's personnel, the population the enterprise serves, and so on.
It should be noted that, in some application scenarios, the company information of the target enterprise may also be sent to the electronic device by a terminal device (for example, the terminal devices 101, 102, 103 shown in Fig. 1) over a wired or wireless connection. The wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connections now known or developed in the future.
Step 202: extract the first feature information from the enterprise name and the business scope information.
In this embodiment, the electronic device may use various text feature extraction methods to extract the first feature information from the enterprise name and the business scope information. The first feature information may be information that characterizes the text features in the enterprise name and the business scope information, for example in the form of a vector. The text features may be various information that characterizes the basic elements of a text (such as semantics, keywords, and feature words).
In some optional implementations of this embodiment, the content of the above-mentioned web page may be analyzed by statistical analysis. For example, the frequency of occurrence of each word in the above content may be counted and the words sorted by frequency; then, the one or more words ranked highest by frequency are selected as the extracted keywords; finally, the word vectors of the keywords may be generated using various word vector generation methods (for example, the open-source word vector tool word2vec), and the generated word vectors are determined as the first feature information.
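A minimal sketch of the frequency-based keyword selection described above, assuming the text has already been split into words. The function name `top_keywords` and the sample words are illustrative and do not come from the patent.

```python
from collections import Counter


def top_keywords(words, k=2):
    """Return the k most frequent words; ties keep first-seen order."""
    counts = Counter(words)
    return [word for word, _ in counts.most_common(k)]


# Hypothetical segmented business scope text.
words = ["electronic", "product", "sales", "electronic", "product", "electronic"]
keywords = top_keywords(words)
print(keywords)
```

The selected keywords would then be passed to a word vector model such as word2vec, which is outside the scope of this sketch.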
In some optional implementations of this embodiment, the electronic device may extract the first feature information using a statistics-based text feature extraction method. As an example, the enterprise name and the business scope information may first be processed with a method such as full segmentation to divide them into words. Then, the importance of the resulting words may be calculated (for example, using term frequency-inverse document frequency (TF-IDF)), and keywords are obtained from the result of the importance calculation. The main idea of TF-IDF is that if a word or phrase has a high term frequency (TF) in one article and seldom occurs in other articles, the word or phrase is considered to have good class discrimination ability and is suitable for classification. Inverse document frequency (IDF) means that the fewer documents contain a given word or phrase, the larger its IDF, which indicates that the word or phrase has good class discrimination ability. TF-IDF can therefore be used to calculate the importance of a word or phrase within a given article. Finally, the word vectors of the keywords may be generated using various word vector generation methods (for example, the open-source word vector tool word2vec), and the generated word vectors are determined as the first feature information. It should be noted that full segmentation and TF-IDF are well-known techniques that are widely studied and applied at present, and are not described in detail here.
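The importance calculation can be illustrated with a short sketch. It assumes the common unsmoothed form, TF multiplied by log(N / document frequency); the patent does not fix a particular formula, and the function name `tf_idf` and the toy corpus are hypothetical.

```python
import math


def tf_idf(term, doc, corpus):
    """Score a term in one document; doc and corpus entries are word lists."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0  # term never occurs, so it carries no weight
    idf = math.log(len(corpus) / df)
    return tf * idf


# Hypothetical segmented business scope texts for four enterprises.
corpus = [
    ["electronic", "product", "sales"],
    ["catering", "service"],
    ["electronic", "component", "design"],
    ["catering", "management"],
]
scores = {word: tf_idf(word, corpus[0], corpus) for word in corpus[0]}
```

Here "product" and "sales" each appear in one document while "electronic" appears in two, so the rarer words receive the higher score in the first document.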
In some optional implementations of this embodiment, the electronic device may extract the first feature information using a semantics-based text feature extraction method. As an example, the enterprise name and the business scope information may first be segmented into words respectively, and the word vector of each resulting word determined using various word vector generation methods; the extracted word vectors may then be parsed to generate the first feature information. Here, a pre-trained feature extraction model may be used to parse the extracted word vectors and extract the first feature information. The feature extraction model may be obtained by supervised training, using a machine learning method and training samples, of various existing models that can implement text feature extraction (such as a recurrent neural network (RNN) or a long short-term memory network (LSTM)). In practice, the feature information output by the feature extraction model may be represented in the form of a vector.
In some optional implementations of this embodiment, the electronic device may also extract the first feature information as follows:
In the first step, the enterprise name and the business scope information may each be segmented into words using various word segmentation methods (such as forward maximum matching or reverse maximum matching), and the word vector of each resulting word determined using various word vector generation methods (for example, the open-source word vector tool word2vec).
In the second step, a keyword may be extracted from the enterprise name. Here, the keyword may be a word that plays an important role in determining the industry category of the enterprise. For example, if the enterprise name is "Certain International Technology (Shenzhen) Co., Ltd.", the keyword may be "technology". In practice, the keyword in the enterprise name may be extracted by string matching against the preset keywords in a preset keyword set.
In the third step, the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information may be parsed to generate the first feature information. Here, the word vectors may be parsed in various preset ways. As an example, the generated word vectors may first be combined into a matrix; the matrix may then be processed by convolution, down-sampling, and the like, which may be performed multiple times, and the vector finally obtained is taken as the first feature vector.
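The first two steps above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the dictionary `VOCAB`, the keyword set `PRESET_KEYWORDS`, and both function names are hypothetical, and the enterprise name is given in Chinese without punctuation because forward maximum matching operates on unsegmented text.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def extract_keywords(name, preset_keywords):
    """String matching: keep the preset keywords that occur in the name."""
    return sorted(kw for kw in preset_keywords if kw in name)


# Hypothetical dictionary and preset keyword set, for illustration only.
VOCAB = {"国际", "科技", "深圳", "有限公司", "有限", "公司"}
PRESET_KEYWORDS = {"科技", "餐饮", "物流"}  # technology, catering, logistics

name = "国际科技深圳有限公司"
name_tokens = forward_max_match(name, VOCAB)
name_keywords = extract_keywords(name, PRESET_KEYWORDS)
```

With this dictionary the name is split into "国际", "科技", "深圳", "有限公司", and the only preset keyword found is "科技" (technology). Reverse maximum matching follows the same idea but scans from the end of the text.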
In some optional implementations of this embodiment, the third step, in which the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information are parsed to generate the first feature information, may be carried out as follows: the electronic device may input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively, and determine these feature vectors as the first feature information. The feature extraction model may be used to extract text features. Here, the feature extraction model may be obtained by supervised training, using a machine learning method and training samples, of various existing models that can implement text feature extraction (such as a recurrent neural network (RNN), a long short-term memory network (LSTM), or a restricted Boltzmann machine (RBM)).
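The convolution and down-sampling mentioned for the third step can be sketched in plain Python. This is a toy illustration assuming one-dimensional convolution over the word positions followed by max pooling over time; the kernels here are fixed by hand, whereas in a trained model they would be learned, and the names `conv_max_pool` and `extract_features` are hypothetical.

```python
def conv_max_pool(matrix, kernel):
    """Slide the kernel over consecutive word vectors and keep the largest response."""
    window = len(kernel)
    responses = []
    for start in range(len(matrix) - window + 1):
        total = 0.0
        for row, k_row in zip(matrix[start:start + window], kernel):
            total += sum(x * w for x, w in zip(row, k_row))
        responses.append(total)
    return max(responses)


def extract_features(matrix, kernels):
    """One pooled value per kernel gives a fixed-length feature vector."""
    return [conv_max_pool(matrix, kernel) for kernel in kernels]


# Three 2-dimensional word vectors stacked into a matrix (made-up values).
word_matrix = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
kernels = [
    [[1.0, 1.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
features = extract_features(word_matrix, kernels)
print(features)
```

Because max pooling keeps one value per kernel, the output length depends only on the number of kernels, so texts of different lengths yield feature vectors of the same size.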
Step 203: extract the second feature information from the remaining information.
In this embodiment, the electronic device may treat the information in the company information other than the enterprise name and the business scope information as the remaining information, and extract the second feature information from it. Here, the electronic device may extract the second feature information in various ways. As an example, a statistics-based text feature extraction method may be used: the remaining information is first processed with a method such as full segmentation to divide it into words; the importance of the resulting words is then calculated using term frequency-inverse document frequency, and keywords are obtained from the result of the importance calculation; finally, the word vectors of the keywords are generated and determined as the second feature information. It should be noted that the second feature information may be represented in the form of a vector.
In some optional implementations of this embodiment, the remaining information may include at least one of the following items for the target enterprise: management position, registration type, scale, time of establishment, and location (such as province or city). The electronic device may extract the second feature information in the following steps. In the first step, the one-hot encoding corresponding to each item in the remaining information may be determined. One-hot encoding, also called one-bit-effective encoding, uses an N-bit status register (N being a positive integer) to encode N states: each state has its own independent register bit, and at any time only one bit is active. For example, to encode six states, the natural binary code is 000, 001, 010, 011, 100, 101, while the one-hot encoding may be 000001, 000010, 000100, 001000, 010000, 100000. In general, one-hot encoding can be used to handle the discrete features of a text and also serves to expand the features to some extent; a one-hot encoding can be represented in the form of a vector. It should be noted that one-hot encoding is a well-known technique that is widely studied and applied at present, and is not described in detail here. In the second step, the one-hot encodings corresponding to the items in the remaining information may be concatenated to generate the second feature information, that is, the concatenated encoding is taken as the second feature information (a vector). For example, if there are 23 provinces in total, a 23-dimensional vector can be constructed in which each dimension represents one province; the element corresponding to the province where the target enterprise is located is 1, and all other elements of the vector are 0.
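A small sketch of the two steps just described: one-hot encode each item, then concatenate the encodings. The category lists and the sample enterprise are hypothetical; a real system would use the full value set for each item (for instance, one dimension per province).

```python
def one_hot(value, categories):
    """Vector with a single 1 at the position of value in categories."""
    vec = [0] * len(categories)
    vec[categories.index(value)] = 1
    return vec


# Hypothetical value sets for two items of the remaining information.
REGISTRATION_TYPES = ["limited liability", "joint stock", "sole proprietorship"]
SCALES = ["small", "medium", "large"]

remaining = {"registration_type": "joint stock", "scale": "large"}

# Concatenate the per-item encodings into the second feature vector.
second_feature = (
    one_hot(remaining["registration_type"], REGISTRATION_TYPES)
    + one_hot(remaining["scale"], SCALES)
)
print(second_feature)
```

The length of the resulting vector is the sum of the sizes of the value sets, and exactly one element per item is 1.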
Step 204: fuse the first feature information with the second feature information, and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise.
In this embodiment, the electronic device may first fuse the first feature information with the second feature information. Since both the first feature information and the second feature information can be represented in the form of vectors, they can be fused by vector concatenation. The electronic device may then input the fused feature information into the pre-trained industry identification model to obtain the industry category of the target enterprise. The industry identification model may be used to characterize the correspondence between feature information and industry categories. As an example, the industry identification model may be a mapping table between feature information and industry categories that a technician has established in advance based on statistics over a large amount of data.
In some optional implementations of this embodiment, the industry identification model may be obtained by training as follows. First, preset training samples may be extracted; the training samples may include enterprise information samples of multiple enterprises, together with the enterprise category label corresponding to each enterprise information sample. Then, first feature information and second feature information may be extracted from each enterprise information sample, in the ways used in step 202 and step 203 respectively, which are not repeated here. After that, the first feature information and second feature information extracted from each enterprise information sample may be fused. Using a machine learning method, with the fused feature information as input and the enterprise category label of that sample as output, an existing model capable of classification (for example, a naive Bayesian model (NBM) or a support vector machine (SVM)) or a classification function (for example, the softmax function) is trained, and the trained model or classification function is determined to be the industry identification model.
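As a rough sketch of the training procedure described above, the following code fits a softmax classifier on fused feature vectors with plain stochastic gradient descent. Everything concrete here is an assumption made for illustration: the toy samples, the two category indices, the learning rate, and the function names `train` and `predict` are not taken from the application, which leaves the choice of classifier open.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, features):
    """Linear scores per category followed by softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def train(samples, labels, num_classes, epochs=200, lr=0.5, seed=0):
    """Fit a softmax classifier with stochastic gradient descent."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            probs = predict_proba(weights, biases, samples[i])
            for k in range(num_classes):
                # Gradient of the cross-entropy loss w.r.t. score k is p_k - y_k.
                grad = probs[k] - (1.0 if k == labels[i] else 0.0)
                for j in range(dim):
                    weights[k][j] -= lr * grad * samples[i][j]
                biases[k] -= lr * grad
    return weights, biases


def predict(weights, biases, features):
    """Return the index of the most probable category."""
    probs = predict_proba(weights, biases, features)
    return max(range(len(probs)), key=probs.__getitem__)


# Toy fused vectors: two "first feature" values followed by a two-bit one-hot code.
samples = [
    [0.9, 0.1, 1, 0],
    [0.8, 0.2, 1, 0],
    [0.1, 0.9, 0, 1],
    [0.2, 0.7, 0, 1],
]
labels = [0, 0, 1, 1]  # hypothetical category indices

weights, biases = train(samples, labels, num_classes=2)
print([predict(weights, biases, s) for s in samples])
```

A naive Bayesian model or a support vector machine could be substituted for the softmax classifier without changing the surrounding flow; only the `train` and `predict` functions would differ.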
In some optional implementations of this embodiment, after the industry category of the target enterprise is obtained, an industry identifier may be added to the stored information of the target enterprise; the industry identifier may be used to indicate the industry category of the target enterprise. In practice, adding an industry identifier facilitates subsequent work such as enterprise risk analysis and enterprise graph construction.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to this embodiment. In the application scenario of Fig. 3, a storage server that stores and manages enterprise information first extracts the enterprise information 301 of an enterprise from a locally stored enterprise information list (the enterprise information comprises an enterprise name 302, business scope information 303, and remaining information 304 other than these two items). The storage server then extracts first feature information 305 from the enterprise name 302 and the business scope information 303, and afterwards extracts second feature information 306 from the remaining information 304. Finally, it fuses the first feature information with the second feature information and inputs the fused feature information 307 into a pre-trained industry identification model to obtain the industry category 308 of the target enterprise.
The method provided by the above embodiment of the present application extracts the enterprise information of a target enterprise, extracts first feature information from the enterprise name and the business scope information and second feature information from the remaining information, fuses the first feature information with the second feature information, and inputs the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise. The feature information in the enterprise information is thereby fully extracted, and the industry category of the enterprise is determined from the extracted feature information without manual keyword matching, which improves the flexibility of information generation.
With further reference to Fig. 4, a flow 400 of another embodiment of the method for generating information is shown. The flow 400 of the method for generating information includes the following steps:
Step 401: extract the enterprise information of the target enterprise.
In this embodiment, the electronic device on which the method for generating information runs (for example, the server 105 shown in Fig. 1) may store in advance the enterprise information of a large number of enterprises, from which it may extract the enterprise information of the target enterprise. The enterprise information of the target enterprise may include various text information related to the target enterprise; for example, it may include an enterprise name and business scope information.
Step 402: segment the enterprise name and the business scope information into words respectively, and input each word in the enterprise name and each word in the business scope information into a pre-trained term vector model to obtain the term vector of each word in the enterprise name and the term vector of each word in the business scope information.
In this embodiment, the electronic device may use any of various word segmentation methods (for example, forward maximum matching or reverse maximum matching) to segment the enterprise name and the business scope information respectively, and input each word in the enterprise name and each word in the business scope information into a pre-trained term vector model to obtain the term vector of each word in the enterprise name and the term vector of each word in the business scope information. The term vector model may be used to generate the term vector of a word. It may be obtained by using a machine learning method to train an existing model that can generate term vectors (for example, the model used by the open-source term vector tool word2vec) on training samples composed of a large number of enterprise names and business scope information. Because the model is trained on such targeted samples, it performs better than randomly initialized vectors or a term vector model trained on text from an unrestricted domain (for example, text unrelated to enterprises).
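Forward maximum matching, one of the segmentation methods named above, can be sketched as follows. The dictionary and the input string are hypothetical and use unspaced ASCII text for readability; with Chinese text the same loop runs over characters. The function name `forward_max_match` is an assumption for illustration.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` greedily, taking the longest dictionary word at each position."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        # A single character is always accepted so the loop cannot stall.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical dictionary containing both short and long entries.
DICTIONARY = {"inter", "international", "tech", "technology", "co"}

print(forward_max_match("internationaltechnologyco", DICTIONARY))
```

Because the longest match wins, "international" is chosen over "inter" and "technology" over "tech". Reverse maximum matching applies the same idea while scanning from the end of the string.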
Step 403: extract a keyword from the enterprise name.
In this embodiment, the electronic device may extract a keyword from the enterprise name. Here, the keyword may be a word that plays an important role in determining the industry category of an enterprise. For example, if the enterprise name is "Certain International Technology (Shenzhen) Co., Ltd.", the keyword may be "Technology". In practice, the keyword in the enterprise name may be extracted by string matching against the preset keywords in a preset keyword set.
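String matching against a preset keyword set can be sketched in a few lines. The keyword set below and the function name `extract_keywords` are hypothetical; the application does not list the actual keywords.

```python
def extract_keywords(enterprise_name, keyword_set):
    """Return the preset keywords that occur in the name, ordered by position."""
    hits = [(enterprise_name.find(k), k) for k in keyword_set if k in enterprise_name]
    return [keyword for _, keyword in sorted(hits)]


# Hypothetical preset keyword set.
KEYWORDS = {"Technology", "Logistics", "Catering"}

name = "Certain International Technology (Shenzhen) Co., Ltd."
print(extract_keywords(name, KEYWORDS))
```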
Step 404: input the term vectors of the words in the enterprise name, the term vector of the keyword, and the term vectors of the words in the business scope information into a pre-trained feature extraction model respectively, obtain the feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively, and determine these feature vectors as the first feature information.
In this embodiment, the electronic device may input the term vectors of the words in the enterprise name, the term vector of the keyword, and the term vectors of the words in the business scope information into a pre-trained feature extraction model respectively, obtain the feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively, and determine these feature vectors as the first feature information. The feature extraction model may consist of the convolutional layer and the max pooling layer of a pre-trained convolutional neural network (CNN). Here, the convolutional neural network may include one or more convolutional layers, a max pooling layer, a vector concatenation layer, and a fully connected (FC) layer. A convolutional layer may perform convolution on the matrix input to it, and may also perform feature extraction and downsampling on the input matrix. The max pooling layer may downsample the input matrix and output a vector. The vector concatenation layer may concatenate the vectors output by the max pooling layer with other feature vectors that are input directly to this layer, and pass the concatenated vector to the fully connected layer. The fully connected layer may determine the industry category. In practice, the fully connected layer acts as the "classifier" of the whole convolutional neural network; it maps the learned features to the sample label space. Here, the electronic device may input the matrix formed by the term vectors of the words in the enterprise name, the matrix formed by the term vector of the keyword, and the matrix formed by the term vectors of the words in the business scope information into the convolutional neural network respectively (that is, into a convolutional layer of the convolutional neural network), take the vectors output by the max pooling layer of the convolutional neural network as the feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively, and determine these feature vectors as the first feature information.
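The convolution and max pooling stages described above can be illustrated with a small pure-Python sketch. It assumes one-dimensional convolution over word positions, a ReLU activation, and max pooling over all positions; the filter values, the toy matrices, and the names `conv_max_pool` and `first_feature` are hypothetical, and a real implementation would learn the filters and normally use a deep learning framework.

```python
def conv_max_pool(matrix, filters, bias=0.0):
    """Convolve each filter over word positions, then keep the maximum response.

    matrix:  list of term vectors, shape (n_words, dim)
    filters: list of filters, each a list of `width` rows of length `dim`
    Returns one value per filter, so the output length does not depend
    on how many words the text contains.
    """
    features = []
    for kernel in filters:
        width = len(kernel)
        activations = []
        for start in range(len(matrix) - width + 1):
            window = matrix[start:start + width]
            total = bias + sum(
                w * x
                for k_row, m_row in zip(kernel, window)
                for w, x in zip(k_row, m_row)
            )
            activations.append(max(0.0, total))  # ReLU, an assumed activation
        # Texts shorter than the filter width yield no window; fall back to 0.0.
        features.append(max(activations) if activations else 0.0)
    return features


def first_feature(name_matrix, keyword_matrix, scope_matrix, filters):
    """Extract one pooled vector per input and join them (joining is an assumption)."""
    pooled = [conv_max_pool(m, filters)
              for m in (name_matrix, keyword_matrix, scope_matrix)]
    return [value for vector in pooled for value in vector]


# Toy 2-dimensional term vectors and two hypothetical filters (widths 2 and 1).
name_matrix = [[1, 0], [0, 1], [1, 1]]
keyword_matrix = [[1, 1]]
scope_matrix = [[0, 1], [1, 0]]
filters = [
    [[1, 0], [0, 1]],
    [[-1, -1]],
]

print(conv_max_pool(name_matrix, filters))
print(first_feature(name_matrix, keyword_matrix, scope_matrix, filters))
```

Max pooling is what turns inputs of different lengths (a three-word name, a one-word keyword) into vectors of the same size, which is why the outputs can later be concatenated with the one-hot features.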
In practice, the convolutional neural network may be obtained by training as follows. First, preset training samples for training the convolutional neural network may be extracted; the training samples may include enterprise information samples of multiple enterprises, together with the enterprise category label corresponding to each enterprise information sample. Then, for each enterprise information sample, the enterprise name and the business scope information in the sample may be segmented respectively, and each word in the enterprise name and each word in the business scope information may be input into a pre-trained term vector model to obtain the term vector of each word in the enterprise name and the term vector of each word in the business scope information. After that, a keyword may be extracted from the enterprise name in each enterprise information sample. Next, for each enterprise information sample, second feature information may be extracted from the remaining information of the sample (for example, by concatenating the one-hot codes of the items in the remaining information, with the second feature information represented as a vector). Finally, for each enterprise information sample, the term vectors of the words in the enterprise name, the term vector of the keyword, and the term vectors of the words in the business scope information may be used as the input of a pre-established convolutional neural network (that is, input to a convolutional layer of the convolutional neural network), the second feature information corresponding to the sample may be input to the vector concatenation layer of the pre-established convolutional neural network, and the enterprise category label corresponding to the sample may be used as the output of the pre-established convolutional neural network; the pre-established convolutional neural network is then trained with a machine learning method to obtain the trained convolutional neural network. Here, extracting the keyword from the enterprise name and feeding its term vector to the model as part of the input gives the model clearer guidance and improves the accuracy of its decisions. In addition, convolutional neural networks train and predict efficiently and have good nonlinear fitting ability, which helps ensure the precision of industry classification.
Step 405: determine the one-hot encoding corresponding to each item in the remaining information.
In this embodiment, the electronic device may determine the one-hot encoding corresponding to each item in the remaining information, where the remaining information may be the information in the enterprise information of the target enterprise other than the enterprise name and the business scope information. The remaining information may include at least one of the following items about the target enterprise: business location, registration type, scale, date of establishment, and location (for example, province or city). The electronic device may determine the one-hot encoding corresponding to each item in the remaining information. In practice, a one-hot code can be represented as a vector.
Step 406: concatenate the one-hot codes corresponding to the items in the remaining information to generate the second feature information.
In this embodiment, the electronic device may concatenate the one-hot codes corresponding to the items in the remaining information to generate the second feature information; that is, the concatenated code (which can be represented as a vector) is used as the second feature information.
Step 407: fuse the first feature information with the second feature information, and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise.
In this embodiment, the electronic device may first fuse the first feature information and the second feature information. Because both can be represented as vectors, they may be fused by vector concatenation. The electronic device may then input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise. The industry identification model may be used to characterize the correspondence between feature information and industry categories. As an example, the industry identification model may be a correspondence table, compiled in advance by technicians from statistics over a large amount of data, that maps each item of feature information to an industry category. It should be noted that the fusion of the first feature information and the second feature information may be performed by the vector concatenation layer of the pre-trained convolutional neural network, and that the industry identification model may be the fully connected layer of the pre-trained convolutional neural network; the fully connected layer may use a classification function (for example, the softmax function) to determine the industry category. In practice, the fully connected layer may output the probability that the target enterprise belongs to each industry category, in which case the industry category with the highest probability may be determined as the industry category of the target enterprise. Alternatively, the fully connected layer may directly output the industry category with the highest probability.
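The inference path described above (concatenate the two feature vectors, apply a fully connected layer, apply softmax, and take the most probable category) can be sketched as follows. The weights, biases, and category names are made-up values for illustration; in the described method they would come from training, and the function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fully_connected(features, weights, biases):
    """One linear layer: a score per category."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def classify(first_feature, second_feature, weights, biases, categories):
    """Fuse by concatenation, score, and return the most probable category."""
    fused = list(first_feature) + list(second_feature)  # vector concatenation
    probs = softmax(fully_connected(fused, weights, biases))
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs


# Hypothetical parameters for a 4-dimensional fused vector and two categories.
CATEGORIES = ["technology", "catering"]
WEIGHTS = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
BIASES = [0.0, 0.0]

label, probs = classify([2.0, 0.0], [0, 1], WEIGHTS, BIASES, CATEGORIES)
print(label, probs)
```

Returning both the label and the probability list covers the two variants in the text: a caller can use the probabilities to pick the maximum itself, or use the label that is output directly.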
In some optional implementations of this embodiment, after the industry category of the target enterprise is obtained, an industry identifier may be added to the stored information of the target enterprise; the industry identifier may be used to indicate the industry category of the target enterprise. In practice, adding an industry identifier facilitates subsequent work such as enterprise risk analysis and enterprise graph construction.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for generating information in this embodiment highlights the steps of performing feature extraction and industry category determination with a pre-trained convolutional neural network. Because convolutional neural networks train and predict efficiently and have good nonlinear fitting ability, the precision of industry classification is improved. The flow 400 also highlights the step of extracting the keyword from the enterprise name and using its term vector as part of the model input for training and industry category determination; the keyword gives the model clearer guidance and improves the accuracy of its decisions. In addition, the flow 400 highlights the step of generating term vectors with a term vector model trained on samples composed of a large number of enterprise names and business scope information. Because the model is trained on such targeted samples, it performs better than randomly initialized vectors or a term vector model trained on text from an unrestricted domain (for example, text unrelated to enterprises). The scheme described in this embodiment can therefore improve the accuracy of the generated industry category.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating information. The apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may be applied to various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating information described in this embodiment includes: a first extraction unit 501, configured to extract the enterprise information of a target enterprise, where the enterprise information includes an enterprise name and business scope information; a second extraction unit 502, configured to extract first feature information from the enterprise name and the business scope information; a third extraction unit 503, configured to extract second feature information from the remaining information, where the remaining information is the information in the enterprise information other than the enterprise name and the business scope information; and an input unit 504, configured to fuse the first feature information with the second feature information and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise, where the industry identification model is used to characterize the correspondence between feature information and industry categories.
In some optional implementations of this embodiment, the second extraction unit 502 may include a word segmentation module, an extraction module, and a generation module (not shown in the figure). The word segmentation module may be configured to segment the enterprise name and the business scope information respectively and determine the term vector of each word obtained by segmentation. The extraction module may be configured to extract a keyword from the enterprise name. The generation module may be configured to analyze the term vectors of the words in the enterprise name, the term vector of the keyword, and the term vectors of the words in the business scope information to generate the first feature information.
In some optional implementations of this embodiment, the generation module may be further configured to: input the term vectors of the words in the enterprise name, the term vector of the keyword, and the term vectors of the words in the business scope information into a pre-trained feature extraction model respectively, obtain the feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively, and determine these feature vectors as the first feature information, where the feature extraction model is used to extract text features.
In some optional implementations of this embodiment, the feature extraction model may consist of the convolutional layer and the max pooling layer of a pre-trained convolutional neural network.
In some optional implementations of this embodiment, the industry identification model may be the fully connected layer of the convolutional neural network.
In some embodiments, the word segmentation module may be further configured to segment the enterprise name and the business scope information respectively, and input each word in the enterprise name and each word in the business scope information into a pre-trained term vector model to obtain the term vector of each word in the enterprise name and the term vector of each word in the business scope information, where the term vector model is used to generate the term vector of a word.
In some optional implementations of this embodiment, the remaining information may include at least one of the following items about the target enterprise: business location, registration type, scale, date of establishment, and location. The third extraction unit 503 may include a determining module and a concatenation module (not shown in the figure). The determining module may be configured to determine the one-hot encoding corresponding to each item in the remaining information. The concatenation module may be configured to concatenate the one-hot codes corresponding to the items in the remaining information to generate the second feature information.
In the apparatus provided by the above embodiment of the present application, the first extraction unit 501 extracts the enterprise information of a target enterprise, the second extraction unit 502 extracts first feature information from the enterprise name and the business scope information, the third extraction unit 503 extracts second feature information from the remaining information, and the input unit 504 fuses the first feature information with the second feature information and inputs the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise. The feature information in the enterprise information is thereby fully extracted, and the industry category of the enterprise is determined from the extracted feature information without manual keyword matching, which improves the flexibility of information generation.
Referring now to Fig. 6, a schematic structural diagram of a computer system 600 suitable for implementing the server of the embodiments of the present application is shown. The server shown in Fig. 6 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT) or liquid crystal display (LCD), a loudspeaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example (but is not limited to), an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in connection with an instruction execution system, apparatus, or device. Also in the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations that may be implemented by systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, it may be described as: a processor including a first extraction unit, a second extraction unit, a third extraction unit, and an input unit. The names of these units do not, in some cases, limit the units themselves; for example, the first extraction unit may also be described as "a unit that extracts the enterprise information of a target enterprise".
As on the other hand, present invention also provides a kind of computer-readable medium, which can be Included in device described in above-described embodiment;Can also be individualism, and without be incorporated the device in.Above-mentioned calculating Machine readable medium carries one or more program, when said one or multiple programs are performed by the device so that should Device:The company information of Target Enterprise is extracted, wherein, which includes enterprise name and business scope information;From the enterprise Fisrt feature information is extracted in industry title and the business scope information;Second feature information is extracted from remaining information;By this One characteristic information is merged with second feature information, and the characteristic information after fusion is input to industry trained in advance identifies mould Type obtains the category of employment of the Target Enterprise.
The above description covers only the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (16)

1. A method for generating information, comprising:
extracting enterprise information of a target enterprise, wherein the enterprise information includes an enterprise name and business scope information;
extracting first feature information from the enterprise name and the business scope information;
extracting second feature information from remaining information, wherein the remaining information is the information in the enterprise information other than the enterprise name and the business scope information; and
fusing the first feature information with the second feature information, and inputting the fused feature information into a pre-trained industry recognition model to obtain an industry category of the target enterprise, wherein the industry recognition model is used to represent a correspondence between feature information and industry categories.
2. The method for generating information according to claim 1, wherein the extracting first feature information from the enterprise name and the business scope information comprises:
performing word segmentation on the enterprise name and the business scope information respectively, and determining a word vector of each word obtained by the segmentation;
extracting a keyword from the enterprise name; and
parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information.
3. The method for generating information according to claim 2, wherein the parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information comprises:
inputting the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding respectively to the enterprise name, the keyword, and the business scope information, and determining the feature vectors corresponding respectively to the enterprise name, the keyword, and the business scope information as the first feature information, wherein the feature extraction model is used to extract text features.
4. The method for generating information according to claim 3, wherein the feature extraction model consists of a convolutional layer and a max pooling layer of a pre-trained convolutional neural network.
5. The method for generating information according to claim 4, wherein the industry recognition model is a fully connected layer of the convolutional neural network.
6. The method for generating information according to claim 2, wherein the performing word segmentation on the enterprise name and the business scope information respectively, and determining a word vector of each word obtained by the segmentation comprises:
performing word segmentation on the enterprise name and the business scope information respectively, and inputting each word in the enterprise name and each word in the business scope information separately into a pre-trained word vector model to obtain the word vector of each word in the enterprise name and the word vector of each word in the business scope information, wherein the word vector model is used to generate word vectors of words.
7. The method for generating information according to claim 1, wherein the remaining information includes at least one of the following items of the target enterprise: registration type, operating status, scale, establishment time, and location; and
the extracting second feature information from remaining information comprises:
determining a one-hot encoding corresponding to each item in the remaining information; and
concatenating the one-hot encodings corresponding to the items in the remaining information to generate the second feature information.
8. An apparatus for generating information, comprising:
a first extraction unit, configured to extract enterprise information of a target enterprise, wherein the enterprise information includes an enterprise name and business scope information;
a second extraction unit, configured to extract first feature information from the enterprise name and the business scope information;
a third extraction unit, configured to extract second feature information from remaining information, wherein the remaining information is the information in the enterprise information other than the enterprise name and the business scope information; and
an input unit, configured to fuse the first feature information with the second feature information, and input the fused feature information into a pre-trained industry recognition model to obtain an industry category of the target enterprise, wherein the industry recognition model is used to represent a correspondence between feature information and industry categories.
9. The apparatus for generating information according to claim 8, wherein the second extraction unit comprises:
a word segmentation module, configured to perform word segmentation on the enterprise name and the business scope information respectively, and determine a word vector of each word obtained by the segmentation;
an extraction module, configured to extract a keyword from the enterprise name; and
a generation module, configured to parse the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information.
10. The apparatus for generating information according to claim 9, wherein the generation module is further configured to:
input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding respectively to the enterprise name, the keyword, and the business scope information, and determine the feature vectors corresponding respectively to the enterprise name, the keyword, and the business scope information as the first feature information, wherein the feature extraction model is used to extract text features.
11. The apparatus for generating information according to claim 10, wherein the feature extraction model consists of a convolutional layer and a max pooling layer of a pre-trained convolutional neural network.
12. The apparatus for generating information according to claim 11, wherein the industry recognition model is a fully connected layer of the convolutional neural network.
13. The apparatus for generating information according to claim 9, wherein the word segmentation module is further configured to:
perform word segmentation on the enterprise name and the business scope information respectively, and input each word in the enterprise name and each word in the business scope information separately into a pre-trained word vector model to obtain the word vector of each word in the enterprise name and the word vector of each word in the business scope information, wherein the word vector model is used to generate word vectors of words.
14. The apparatus for generating information according to claim 8, wherein the remaining information includes at least one of the following items of the target enterprise: registration type, operating status, scale, establishment time, and location; and
the third extraction unit comprises:
a determining module, configured to determine a one-hot encoding corresponding to each item in the remaining information; and
a concatenation module, configured to concatenate the one-hot encodings corresponding to the items in the remaining information to generate the second feature information.
15. A server, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
16. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
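The feature extraction recited in claims 3 to 5 (a convolutional layer followed by a max pooling layer applied to a sequence of word vectors) can be illustrated with a small pure-Python sketch. The kernels, the ReLU activation, the zero fallback for short inputs, and the toy two-dimensional word vectors are assumptions made for illustration; the claims do not fix kernel sizes, activations, or vector dimensions.

```python
from typing import List

Vector = List[float]


def conv1d(word_vectors: List[Vector], kernel: List[Vector], bias: float = 0.0) -> List[float]:
    """Slide a kernel spanning len(kernel) consecutive words over the sequence, then apply ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(word_vectors) - width + 1):
        window = word_vectors[start:start + width]
        total = bias + sum(
            k * x
            for k_row, x_row in zip(kernel, window)
            for k, x in zip(k_row, x_row)
        )
        outputs.append(max(0.0, total))
    return outputs


def max_pool(values: List[float]) -> float:
    """Max-over-time pooling; returns 0.0 when the sequence is shorter than the kernel."""
    return max(values) if values else 0.0


def extract_text_feature(word_vectors: List[Vector], kernels: List[List[Vector]]) -> List[float]:
    """One pooled value per kernel forms the feature vector of the text."""
    return [max_pool(conv1d(word_vectors, kernel)) for kernel in kernels]


# Toy word vectors for a three-word enterprise name (placeholders for real embeddings).
name_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],      # responds to the pattern of word 1 followed by word 2
    [[-1.0, -1.0], [-1.0, -1.0]],  # always negative on this input, so ReLU yields zero
]
feature = extract_text_feature(name_vectors, kernels)
print(feature)
```

Max pooling makes the feature length depend only on the number of kernels, so enterprise names and business scopes of different lengths map to vectors of the same size before they are passed to the fully connected layer.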
CN201810045681.1A 2018-01-17 2018-01-17 Method and apparatus for generating information Active CN108171276B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810045681.1A CN108171276B (en) 2018-01-17 2018-01-17 Method and apparatus for generating information

Publications (2)

Publication Number Publication Date
CN108171276A true CN108171276A (en) 2018-06-15
CN108171276B CN108171276B (en) 2019-07-23

Family

ID=62514587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810045681.1A Active CN108171276B (en) 2018-01-17 2018-01-17 Method and apparatus for generating information

Country Status (1)

Country Link
CN (1) CN108171276B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881125B (en) * 2012-09-25 2014-06-18 杭州立高科技有限公司 Alarm monitoring system based on multi-information fusion centralized processing platform
US9058515B1 (en) * 2012-01-12 2015-06-16 Kofax, Inc. Systems and methods for identification document processing and business workflow integration
CN105590102A (en) * 2015-12-30 2016-05-18 中通服公众信息产业股份有限公司 Front car face identification method based on deep learning
CN106372648A (en) * 2016-10-20 2017-02-01 中国海洋大学 Multi-feature-fusion-convolutional-neural-network-based plankton image classification method
CN106779467A (en) * 2016-12-31 2017-05-31 成都数联铭品科技有限公司 Enterprises ' industry categorizing system based on automatic information screening
CN107169036A (en) * 2017-04-19 2017-09-15 畅捷通信息技术股份有限公司 Determine the method and system of the affiliated category of employment of enterprise
CN108241867A (en) * 2016-12-26 2018-07-03 阿里巴巴集团控股有限公司 A kind of sorting technique and device

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388712A (en) * 2018-09-21 2019-02-26 平安科技(深圳)有限公司 A kind of trade classification method and terminal device based on machine learning
CN110941826A (en) * 2018-09-21 2020-03-31 武汉安天信息技术有限责任公司 Malicious android software detection method and device
CN109359197A (en) * 2018-10-31 2019-02-19 税友软件集团股份有限公司 A kind of tax type authentication method, device and computer readable storage medium
CN109359197B (en) * 2018-10-31 2021-01-05 税友软件集团股份有限公司 Tax type authentication method, device and computer readable storage medium
CN111125550A (en) * 2018-11-01 2020-05-08 百度在线网络技术(北京)有限公司 Interest point classification method, device, equipment and storage medium
CN111125550B (en) * 2018-11-01 2023-11-24 百度在线网络技术(北京)有限公司 Point-of-interest classification method, device, equipment and storage medium
CN111126422B (en) * 2018-11-01 2023-10-31 百度在线网络技术(北京)有限公司 Method, device, equipment and medium for establishing industry model and determining industry
CN111126422A (en) * 2018-11-01 2020-05-08 百度在线网络技术(北京)有限公司 Industry model establishing method, industry determining method, industry model establishing device, industry determining equipment and industry determining medium
CN111242146A (en) * 2018-11-09 2020-06-05 蔚来汽车有限公司 POI information classification based on convolutional neural network
CN111242146B (en) * 2018-11-09 2023-08-25 蔚来(安徽)控股有限公司 POI information classification based on convolutional neural network
CN109710906A (en) * 2018-12-06 2019-05-03 深圳市标准技术研究院 Business scope auxiliary makes a report on method, apparatus, terminal device and storage medium
CN109801118A (en) * 2018-12-24 2019-05-24 航天信息股份有限公司 Identify method, apparatus, medium and the equipment of the manufacturing business of designated trade
CN112307199A (en) * 2019-07-14 2021-02-02 阿里巴巴集团控股有限公司 Information identification method, data processing method, device and equipment, information interaction method
CN112487794A (en) * 2019-08-21 2021-03-12 顺丰科技有限公司 Industry classification method and device, terminal equipment and storage medium
CN112487794B (en) * 2019-08-21 2023-09-22 顺丰科技有限公司 Industry classification method, device, terminal equipment and storage medium
CN110781955A (en) * 2019-10-24 2020-02-11 中国银联股份有限公司 Method and device for classifying label-free objects and detecting nested codes and computer-readable storage medium
CN111104791B (en) * 2019-11-14 2024-02-20 北京金堤科技有限公司 Industry information acquisition method and device, electronic equipment and medium
CN111104791A (en) * 2019-11-14 2020-05-05 北京金堤科技有限公司 Industry information acquisition method and apparatus, electronic device and medium
CN111538837A (en) * 2020-04-27 2020-08-14 北京同邦卓益科技有限公司 Method and device for analyzing enterprise operation range information
CN111914090A (en) * 2020-08-18 2020-11-10 生态环境部环境规划院 Method and device for enterprise industry classification identification and characteristic pollutant identification
CN112163153A (en) * 2020-09-30 2021-01-01 深圳前海微众银行股份有限公司 Industry label determination method, device, equipment and storage medium
CN112163153B (en) * 2020-09-30 2024-05-03 深圳前海微众银行股份有限公司 Industry label determining method, device, equipment and storage medium
CN112487263A (en) * 2020-11-26 2021-03-12 杭州安恒信息技术股份有限公司 Information processing method, system, equipment and computer readable storage medium
CN113869639B (en) * 2021-08-26 2023-11-07 中国环境科学研究院 Yangtze river basin enterprise screening method and device, electronic equipment and storage medium
CN113869639A (en) * 2021-08-26 2021-12-31 中国环境科学研究院 Yangtze river basin enterprise screening method and device, electronic equipment and storage medium
CN114785410A (en) * 2022-04-25 2022-07-22 贵州电网有限责任公司 Accurate identification system based on optical fiber coding
CN114785410B (en) * 2022-04-25 2024-02-27 贵州电网有限责任公司 Accurate recognition system based on optical fiber coding

Also Published As

Publication number Publication date
CN108171276B (en) 2019-07-23

Similar Documents

Publication Publication Date Title
CN108171276B (en) Method and apparatus for generating information
CN113326764B (en) Method and device for training image recognition model and image recognition
US20190065506A1 (en) Search method and apparatus based on artificial intelligence
CN108153901A (en) The information-pushing method and device of knowledge based collection of illustrative plates
CN107491534A (en) Information processing method and device
US20180365231A1 (en) Method and apparatus for generating parallel text in same language
CN107220386A (en) Information-pushing method and device
CN108090162A (en) Information-pushing method and device based on artificial intelligence
CN107105031A (en) Information-pushing method and device
CN107168952A (en) Information generating method and device based on artificial intelligence
CN110347940A (en) Method and apparatus for optimizing point of interest label
CN109697641A (en) The method and apparatus for calculating commodity similarity
CN108121800A (en) Information generating method and device based on artificial intelligence
CN109189938A (en) Method and apparatus for updating knowledge mapping
CN108121699A (en) For the method and apparatus of output information
CN108804327A (en) A kind of method and apparatus of automatic Data Generation Test
CN108287927B (en) For obtaining the method and device of information
CN109299477A (en) Method and apparatus for generating text header
CN107145485A (en) Method and apparatus for compressing topic model
CN107943895A (en) Information-pushing method and device
CN109697239A (en) Method for generating the method for graph text information and for generating image data base
CN107526718A (en) Method and apparatus for generating text
CN106919711A (en) The method and apparatus of the markup information based on artificial intelligence
CN108182472A (en) For generating the method and apparatus of information
CN109146152A (en) Incident classification prediction technique and device on a kind of line

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant