CN108171276B - Method and apparatus for generating information - Google Patents

Method and apparatus for generating information

Info

Publication number
CN108171276B
Authority
CN
China
Prior art keywords
information
word
term vector
enterprise
business scope
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810045681.1A
Other languages
Chinese (zh)
Other versions
CN108171276A (en)
Inventor
骆金昌
方军
尹存祥
郑志彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810045681.1A
Publication of CN108171276A
Application granted
Publication of CN108171276B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Game Theory and Decision Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application disclose a method and apparatus for generating information. One specific embodiment of the method includes: extracting enterprise information of a target enterprise, where the enterprise information includes an enterprise name and business scope information; extracting first feature information from the enterprise name and the business scope information; extracting second feature information from the remaining information; and fusing the first feature information with the second feature information and inputting the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise. This embodiment improves the flexibility of information generation.

Description

Method and apparatus for generating information
Technical Field
The embodiments of the present application relate to the field of computer technology, specifically to the field of Internet technology, and more particularly to a method and apparatus for generating information.
Background
With the development of computer technology, in order to better perform enterprise-related data analysis (such as enterprise risk analysis and enterprise graph construction), it is usually necessary to assign each enterprise to the correct industry and to add an industry label to the enterprise.
The existing approach usually identifies keywords in the enterprise information through rule and dictionary matching, and then manually maps these keywords to preset industry classifications, which usually consumes considerable manpower.
Summary of the Invention
The embodiments of the present application propose a method and apparatus for generating information.
In a first aspect, an embodiment of the present application provides a method for generating information, the method including: extracting enterprise information of a target enterprise, where the enterprise information includes an enterprise name and business scope information; extracting first feature information from the enterprise name and the business scope information; extracting second feature information from remaining information, where the remaining information is the information in the enterprise information other than the enterprise name and the business scope information; and fusing the first feature information with the second feature information and inputting the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise, where the industry identification model is used to characterize the correspondence between feature information and industry categories.
In some embodiments, extracting the first feature information from the enterprise name and the business scope information includes: performing word segmentation on the enterprise name and the business scope information respectively, and determining the word vector of each word obtained by segmentation; extracting a keyword from the enterprise name; and parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information.
In some embodiments, parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information includes: inputting the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively; and determining these feature vectors as the first feature information, where the feature extraction model is used to extract text features.
In some embodiments, the feature extraction model consists of the convolutional layer and the max-pooling layer of a pre-trained convolutional neural network.
In some embodiments, the industry identification model is the fully connected layer of the convolutional neural network.
In some embodiments, performing word segmentation on the enterprise name and the business scope information respectively and determining the word vector of each word obtained by segmentation includes: performing word segmentation on the enterprise name and the business scope information respectively, and inputting each word in the enterprise name and each word in the business scope information into a pre-trained word vector model to obtain the word vectors of the words in the enterprise name and the word vectors of the words in the business scope information, where the word vector model is used to generate the word vector of a word.
In some embodiments, the remaining information includes at least one of the following items of the target enterprise: operating status, registration type, scale, establishment time, and location; and extracting the second feature information from the remaining information includes: determining the one-hot encoding corresponding to each item in the remaining information; and concatenating the one-hot encodings corresponding to the items in the remaining information to generate the second feature information.
In a second aspect, an embodiment of the present application provides an apparatus for generating information, the apparatus including: a first extraction unit configured to extract enterprise information of a target enterprise, where the enterprise information includes an enterprise name and business scope information; a second extraction unit configured to extract first feature information from the enterprise name and the business scope information; a third extraction unit configured to extract second feature information from remaining information, where the remaining information is the information in the enterprise information other than the enterprise name and the business scope information; and an input unit configured to fuse the first feature information with the second feature information and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise, where the industry identification model is used to characterize the correspondence between feature information and industry categories.
In some embodiments, the second extraction unit includes: a word segmentation module configured to perform word segmentation on the enterprise name and the business scope information respectively and determine the word vector of each word obtained by segmentation; an extraction module configured to extract a keyword from the enterprise name; and a generation module configured to parse the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information.
In some embodiments, the generation module is further configured to: input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively; and determine these feature vectors as the first feature information, where the feature extraction model is used to extract text features.
In some embodiments, the feature extraction model consists of the convolutional layer and the max-pooling layer of a pre-trained convolutional neural network.
In some embodiments, the industry identification model is the fully connected layer of the convolutional neural network.
In some embodiments, the word segmentation module is further configured to: perform word segmentation on the enterprise name and the business scope information respectively, and input each word in the enterprise name and each word in the business scope information into a pre-trained word vector model to obtain the word vectors of the words in the enterprise name and the word vectors of the words in the business scope information, where the word vector model is used to generate the word vector of a word.
In some embodiments, the remaining information includes at least one of the following items of the target enterprise: operating status, registration type, scale, establishment time, and location; and the third extraction unit includes: a determination module configured to determine the one-hot encoding corresponding to each item in the remaining information; and a concatenation module configured to concatenate the one-hot encodings corresponding to the items in the remaining information to generate the second feature information.
In a third aspect, an embodiment of the present application provides a server, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the method for generating information.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method of any embodiment of the method for generating information.
The method and apparatus for generating information provided by the embodiments of the present application extract the enterprise information of a target enterprise, extract first feature information from the enterprise name and the business scope information, and extract second feature information from the remaining information. The first feature information is then fused with the second feature information, and the fused feature information is input into a pre-trained industry identification model to obtain the industry category of the target enterprise. In this way, the feature information in the enterprise information can be fully extracted and the industry category of the enterprise determined from it, without manual keyword matching, which improves the flexibility of information generation.
Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application may be applied;
Fig. 2 is a flowchart of one embodiment of the method for generating information according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present application;
Fig. 4 is a flowchart of another embodiment of the method for generating information according to the present application;
Fig. 5 is a structural schematic diagram of one embodiment of the apparatus for generating information according to the present application;
Fig. 6 is a structural schematic diagram of a computer system suitable for implementing the server of the embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the related invention and do not limit the invention. It should also be noted that, for ease of description, only the parts relevant to the related invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the method for generating information or the apparatus for generating information of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browsers, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a storage server that stores and manages enterprise information. The storage server may store, manage, analyze, and otherwise process the enterprise information uploaded by the terminal devices 101, 102, 103, and generate a processing result (for example, an enterprise category).
It should be noted that the method for generating information provided by the embodiments of the present application is generally executed by the server 105; correspondingly, the apparatus for generating information is generally provided in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, depending on implementation needs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for generating information according to the present application is shown. The method for generating information includes the following steps:
Step 201, extract the enterprise information of the target enterprise.
In this embodiment, the electronic device on which the method for generating information runs (for example, the server 105 shown in Fig. 1) may store in advance the enterprise information of a large number of enterprises, and may extract the enterprise information of the target enterprise from it. The target enterprise may be an enterprise whose industry category has not yet been labeled, or an enterprise designated in advance by a technician whose industry category is to be determined. The enterprise information of the target enterprise may be text containing various information related to the target enterprise; for example, this information may include the enterprise name (e.g., "a certain International Technology (Shenzhen) Co., Ltd."), business scope information (e.g., "electronic products"), the composition of the enterprise's personnel, the population the enterprise serves, and so on.
It should be noted that, in some application scenarios, the enterprise information of the target enterprise may also be sent to the electronic device by a terminal device (for example, the terminal devices 101, 102, 103 shown in Fig. 1) through a wired or wireless connection. The wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra wideband), and other wireless connections that are currently known or developed in the future.
Step 202, extract first feature information from the enterprise name and the business scope information.
In this embodiment, the electronic device may use various text feature extraction methods to extract the first feature information from the enterprise name and the business scope information. The first feature information may be information that characterizes the text features of the enterprise name and the business scope information, for example in the form of a vector. A text feature may be any information that characterizes the basic elements of a text (such as its semantics, keywords, or feature words).
In some optional implementations of this embodiment, the content of the above-mentioned web page may be analyzed statistically. For example, the frequency of occurrence of each word in the content may be counted and the words ranked by frequency; one or more of the highest-ranked words are then selected as the keywords to be extracted; finally, any of various word vector generation methods (for example, the open-source word vector tool word2vec) may be used to generate the word vectors of the keywords, and the generated word vectors are determined as the first feature information.
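The frequency-ranking step can be sketched as follows. This is only an illustrative sketch: the function name top_frequency_keywords, the sample word list, and the choice of k are assumptions, and the word vector generation step that follows in the text is omitted.

```python
from collections import Counter

def top_frequency_keywords(words, k=2):
    """Return the k most frequent words (hypothetical helper, not from the patent)."""
    counts = Counter(words)
    # most_common sorts by descending count; equal counts keep first-seen order.
    return [word for word, _ in counts.most_common(k)]

# Hypothetical, already-segmented content.
words = ["software", "development", "software", "sales", "software", "sales"]
keywords = top_frequency_keywords(words, k=2)
print(keywords)
```

The selected keywords would then be passed to whatever word vector tool is in use.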
In some optional implementations of this embodiment, the electronic device may extract the first feature information using a statistics-based text feature extraction method. As an example, the enterprise name and the business scope information may first be processed with, for instance, a full segmentation method, which splits them into words. The importance of each resulting word may then be calculated (for example, using the term frequency-inverse document frequency method, TF-IDF), and keywords obtained from the result of the importance calculation. The main idea of TF-IDF is that if a word or phrase occurs with high frequency (term frequency, TF) in one document and rarely occurs in other documents, the word or phrase is considered to discriminate well between categories and is suitable for classification. The inverse document frequency (IDF) reflects the fact that the fewer documents contain a given word or phrase, the larger its IDF, and the better the word or phrase discriminates between categories. TF-IDF can therefore be used to calculate the importance of a word or phrase within a document. Finally, any of various word vector generation methods (for example, the open-source word vector tool word2vec) may be used to generate the word vectors of the keywords, and the generated word vectors are determined as the first feature information. It should be noted that the full segmentation method and the TF-IDF method are well-known techniques that are widely studied and applied, and are not described in detail here.
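A minimal sketch of the TF-IDF importance calculation described above is given here. The smoothed IDF formula, the function name tf_idf, and the three-document corpus are assumptions chosen for illustration; the text does not fix a particular variant.

```python
import math
from collections import Counter

def tf_idf(doc_words, corpus):
    """Score each word of doc_words by TF * IDF against a corpus of word lists."""
    counts = Counter(doc_words)
    total = len(doc_words)
    scores = {}
    for word, count in counts.items():
        tf = count / total
        docs_with_word = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF (an assumption); the +1 terms avoid division by zero.
        idf = math.log(len(corpus) / (1 + docs_with_word)) + 1.0
        scores[word] = tf * idf
    return scores

# Hypothetical segmented documents (enterprise name plus business scope).
corpus = [
    ["electronic", "company", "electronic", "products"],
    ["trading", "company", "food", "sales"],
    ["consulting", "company", "management", "services"],
]
scores = tf_idf(corpus[0], corpus)
top_keyword = max(scores, key=scores.get)
print(top_keyword)
```

In this toy corpus "company" appears in every document and so scores low, while "electronic" is both frequent in the first document and absent elsewhere.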
In some optional implementations of this embodiment, the electronic device may extract the first feature information using a semantics-based text feature extraction method. As an example, the enterprise name and the business scope information may first be segmented separately, and the word vector of each resulting word determined using any of various word vector generation methods; the extracted word vectors may then be parsed to generate the first feature information. Here, a pre-trained feature extraction model may be used to parse the extracted word vectors and thereby extract the first feature information. The feature extraction model may be obtained by using a machine learning method and training samples to perform supervised training on any of various existing models capable of text feature extraction (for example, a recurrent neural network (RNN) or a long short-term memory network (LSTM)). In practice, the feature information output by the feature extraction model may be represented in the form of a vector.
In some optional implementations of this embodiment, the electronic device may also extract the first feature information through the following steps:
In the first step, any of various word segmentation methods (for example, forward maximum matching or reverse maximum matching) may be used to segment the enterprise name and the business scope information separately, and any of various word vector generation methods (for example, the open-source word vector tool word2vec) may be used to determine the word vector of each word obtained by segmentation.
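Forward maximum matching, one of the segmentation methods named above, can be sketched as follows. The dictionary contents, the max_len limit, and the sample enterprise name are hypothetical; a production segmenter would use a much larger dictionary.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text greedily, taking the longest dictionary word from the left."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical dictionary and enterprise name.
dictionary = {"国际", "科技", "深圳", "有限公司", "有限", "公司"}
segments = forward_max_match("国际科技深圳有限公司", dictionary)
print(segments)
```

Reverse maximum matching works the same way but scans from the right end of the text.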
In the second step, a keyword may be extracted from the enterprise name. Here, the keyword may be a word that plays an important role in determining the industry category of the enterprise. For example, if the enterprise name is "a certain International Technology (Shenzhen) Co., Ltd.", the keyword may be "Technology". In practice, the keyword in the enterprise name may be extracted by string matching against the preset keywords in a preset keyword set.
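String matching against a preset keyword set can be sketched as below. The keyword set, the helper name extract_keywords, and the sample name are assumptions for illustration only.

```python
def extract_keywords(enterprise_name, preset_keywords):
    """Return the preset keywords found in the name, ordered by position."""
    found = [
        (enterprise_name.find(keyword), keyword)
        for keyword in preset_keywords
        if keyword in enterprise_name
    ]
    return [keyword for _, keyword in sorted(found)]

# Hypothetical preset keyword set and enterprise name.
preset_keywords = {"Technology", "Trading", "Catering"}
name = "X International Technology (Shenzhen) Co., Ltd."
keywords = extract_keywords(name, preset_keywords)
print(keywords)
```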
In the third step, the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information may be parsed to generate the first feature information. Here, the word vectors may be parsed in any of various preset ways. As an example, the generated word vectors may first be combined into a matrix; the matrix may then be processed by convolution, down-sampling, and the like, where the convolution and down-sampling may be performed multiple times; and the finally obtained vector is taken as the first feature vector.
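The convolution and down-sampling step can be illustrated with a small pure-Python sketch. Max-over-positions pooling is used as the down-sampling here, and the kernels and the 3x2 word-vector matrix are hand-picked, hypothetical values; in the described method the kernels would come from a trained network.

```python
def conv_max_pool(matrix, kernels):
    """Slide each kernel over consecutive rows of the matrix, then max-pool."""
    features = []
    for kernel in kernels:
        height = len(kernel)
        responses = []
        for start in range(len(matrix) - height + 1):
            window = matrix[start:start + height]
            # Element-wise product of kernel and window, summed to one number.
            responses.append(sum(
                weight * value
                for kernel_row, matrix_row in zip(kernel, window)
                for weight, value in zip(kernel_row, matrix_row)
            ))
        features.append(max(responses))  # keep the strongest response
    return features

# Hypothetical matrix: one row per word, one column per vector dimension.
matrix = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
kernels = [
    [[1.0, 1.0], [1.0, 1.0]],   # sums two adjacent word vectors
    [[1.0, 0.0], [-1.0, 0.0]],  # difference of first components
]
feature_vector = conv_max_pool(matrix, kernels)
print(feature_vector)
```

Each kernel contributes one element to the output, so the feature vector has a fixed length regardless of how many words the input text contains.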
In some optional implementations of this embodiment, the third step above, namely parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information, may be carried out as follows: the electronic device may input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively, and determine these feature vectors as the first feature information. The feature extraction model may be used to extract text features. Here, the feature extraction model may be obtained by using a machine learning method and training samples to perform supervised training on any of various existing models capable of text feature extraction (for example, a recurrent neural network (RNN), a long short-term memory network (LSTM), or a restricted Boltzmann machine (RBM)).
Step 203, extract second feature information from the remaining information.
In this embodiment, the electronic device may treat the information in the enterprise information other than the enterprise name and the business scope information as the remaining information, and extract the second feature information from it. Here, the electronic device may extract the second feature information in any of various ways. As an example, a statistics-based text feature extraction method may be used: the remaining information is first processed with, for instance, a full segmentation method, which splits it into words; the TF-IDF method is then used to calculate the importance of each resulting word, and keywords are obtained from the result of the importance calculation; finally, the word vectors of the keywords are generated and determined as the second feature information. It should be noted that the second feature information may be represented in the form of a vector.
In some optional implementations of this embodiment, the remaining information may include at least one of the following items of the target enterprise: operating status, registration type, scale, establishment time, and location (for example, province or city). The electronic device may extract the second feature information in the following steps. In the first step, the one-hot encoding corresponding to each item in the remaining information may be determined. In practice, one-hot encoding, also known as one-bit-effective encoding, uses an N-bit status register (N being a positive integer) to encode N states; each state has its own register bit, and only one bit is valid at any time. For example, to encode six states, the natural binary codes are 000, 001, 010, 011, 100, 101, whereas the one-hot codes may be 000001, 000010, 000100, 001000, 010000, 100000. In general, one-hot encoding can be used to handle the discrete features of text and also serves, to some extent, to expand the features; a one-hot code can be represented as a vector. It should be noted that one-hot encoding is a well-known technique that is widely studied and applied, and is not described in detail here. In the second step, the one-hot encodings corresponding to the items in the remaining information may be concatenated to generate the second feature information; that is, the concatenated encoding is taken as the second feature information (a vector). For example, if there are 23 provinces in total, a 23-dimensional vector can be constructed in which each dimension represents one province; the element corresponding to the province where the target enterprise is located is 1, and all other elements are 0.
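The two steps above (one-hot encode each item, then concatenate) can be sketched as follows. The field names and category lists are hypothetical; a real system would enumerate the actual operating statuses, registration types, provinces, and so on.

```python
def one_hot(value, categories):
    """Return a vector with a single 1 at the position of value in categories."""
    vector = [0] * len(categories)
    vector[categories.index(value)] = 1
    return vector

def build_second_feature(remaining_info, vocabulary):
    """Concatenate the one-hot encodings of every item in the remaining information."""
    feature = []
    for field, categories in vocabulary.items():
        feature.extend(one_hot(remaining_info[field], categories))
    return feature

# Hypothetical fields and category lists.
vocabulary = {
    "operating_status": ["active", "revoked", "deregistered"],
    "scale": ["small", "medium", "large"],
}
remaining_info = {"operating_status": "active", "scale": "large"}
second_feature = build_second_feature(remaining_info, vocabulary)
print(second_feature)
```

The length of the resulting vector is the sum of the category counts of all fields, and exactly one element per field is 1.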
Step 204, fuse the first feature information with the second feature information, and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise.
In this embodiment, the electronic device may first fuse the first feature information with the second feature information. Since both the first feature information and the second feature information can be represented as vectors, they may be fused by vector concatenation. The electronic device may then input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise. The industry identification model may be used to characterize the correspondence between feature information and industry categories. As an example, the industry identification model may be a mapping table between feature information and industry categories that a technician has established in advance based on statistics over a large amount of data.
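Fusion by concatenation followed by classification can be sketched as below. The classifier here is a single linear layer with softmax, chosen as one plausible form of the industry identification model; the weights, biases, and industry labels are hand-picked, hypothetical values rather than trained parameters.

```python
import math

def fuse(first_feature, second_feature):
    """Fuse two feature vectors by concatenation."""
    return list(first_feature) + list(second_feature)

def classify(features, weights, biases, labels):
    """Linear layer plus softmax; returns the best label and all probabilities."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs

# Hypothetical features and parameters.
fused = fuse([0.5, 1.0], [1, 0])
weights = [[1.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
biases = [0.0, 0.0]
label, probs = classify(fused, weights, biases, ["technology", "catering"])
print(label)
```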
In some optional implementations of the present embodiment, the industry identification model may be obtained by training as follows. First, a preset training sample is extracted; the training sample may include enterprise information samples of multiple enterprises and the enterprise category label corresponding to each enterprise information sample. Then, first feature information and second feature information are extracted from each enterprise information sample; the extraction may use the approaches described for step 202 and step 203 respectively, which are not repeated here. Next, the first feature information and the second feature information extracted from each enterprise information sample are fused. Using a machine learning method, with the fused feature information as input and the enterprise category label corresponding to the enterprise information sample as output, supervised training is performed on an existing model capable of classification (such as a naive Bayesian model (NBM) or a support vector machine (SVM)) or on a classification function (such as the softmax function). The trained model or classification function is determined as the industry identification model.
In some optional implementations of the present embodiment, after the industry category of the target enterprise is obtained, an industry identifier may be added to the stored enterprise information of the target enterprise; the industry identifier indicates the industry category of the target enterprise. In practice, adding the industry identifier facilitates subsequent enterprise risk analysis, enterprise graph construction, and the like.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of Fig. 3, a storage server that stores and manages enterprise information first extracts, from a locally stored enterprise information list, the enterprise information 301 of a certain enterprise (including the enterprise name 302, the business scope information 303, and the remaining information 304 other than these two items). The storage server then extracts first feature information 305 from the enterprise name 302 and the business scope information 303, and subsequently extracts second feature information 306 from the remaining information 304. Finally, it fuses the first feature information with the second feature information and inputs the fused feature information 307 into a pre-trained industry identification model to obtain the industry category 308 of the target enterprise.
The method provided by the above embodiment of the present application extracts the enterprise information of the target enterprise, extracts first feature information from the enterprise name and the business scope information, and extracts second feature information from the remaining information. It then fuses the first feature information with the second feature information and inputs the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise. The feature information in the enterprise information is thereby fully extracted, and the industry category of the enterprise is determined from the extracted feature information without manual keyword matching, which improves the flexibility of information generation.
With further reference to Fig. 4, a process 400 of another embodiment of the method for generating information is shown. The process 400 of the method for generating information comprises the following steps:
Step 401: extract the enterprise information of the target enterprise.
In the present embodiment, the electronic device on which the method for generating information runs (for example, the server 105 shown in Fig. 1) may store in advance the enterprise information of a large number of enterprises, and the electronic device may extract the enterprise information of the target enterprise from it. The enterprise information of the target enterprise may be text containing various information related to the target enterprise; for example, the enterprise information may include an enterprise name and business scope information.
Step 402: segment the enterprise name and the business scope information into words respectively, and input each word in the enterprise name and each word in the business scope information into a pre-trained word vector model to obtain the word vector of each word in the enterprise name and the word vector of each word in the business scope information.
In the present embodiment, the electronic device may use various word segmentation approaches (for example, forward maximum matching segmentation or reverse maximum matching segmentation) to segment the enterprise name and the business scope information respectively, and input each word in the enterprise name and each word in the business scope information into a pre-trained word vector model to obtain the word vector of each word in the enterprise name and the word vector of each word in the business scope information. The word vector model is used to generate the word vectors of words. The word vector model may be obtained by using a machine learning method to train an existing model capable of generating word vectors (for example, the model used by the open-source word vector tool word2vec) on training samples composed of a large number of enterprise names and business scope information. Because the word vector model is trained on targeted training samples, it performs better than randomly initialized vectors or a word vector model trained on text from an unrestricted domain (for example, text unrelated to enterprises).
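Forward maximum matching, one of the segmentation approaches named above, can be sketched as follows. The dictionary and the maximum word length are hypothetical; the example uses a Chinese name fragment because the technique applies to text without word delimiters (国际 "international", 科技 "technology", 有限公司 "limited company").

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment `text` greedily from left to right, longest dictionary word first."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted so the scan cannot stall.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical segmentation dictionary.
DICTIONARY = {"国际", "科技", "有限公司", "深圳"}

tokens = forward_max_match("国际科技有限公司", DICTIONARY)
print(tokens)
```

Reverse maximum matching follows the same idea but scans from the end of the text toward the beginning.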
Step 403: extract a keyword from the enterprise name.
In the present embodiment, the electronic device may extract a keyword from the enterprise name. Here, the keyword may be a word that plays an important role in determining the industry category of the enterprise. For example, if the enterprise name is "Certain International Technology (Shenzhen) Co., Ltd.", the keyword may be "technology". In practice, the keyword in the enterprise name may be extracted by string matching against the preset keywords in a preset keyword set.
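The string-matching approach described above might look like the following sketch. The preset keyword list and the case-insensitive comparison are assumptions for illustration.

```python
# Hypothetical preset keyword set, ordered for deterministic output.
PRESET_KEYWORDS = ["technology", "catering", "logistics"]


def extract_keywords(enterprise_name, preset_keywords=PRESET_KEYWORDS):
    """Return every preset keyword that occurs as a substring of the name."""
    name = enterprise_name.lower()
    return [keyword for keyword in preset_keywords if keyword in name]


keywords = extract_keywords("Certain International Technology (Shenzhen) Co., Ltd.")
print(keywords)
```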
Step 404: input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information into a pre-trained feature extraction model to obtain feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively, and determine these feature vectors as the first feature information.
In the present embodiment, the electronic device may input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information into a pre-trained feature extraction model to obtain feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively, and determine these feature vectors as the first feature information. The feature extraction model may consist of the convolutional layers and the max-pooling layers of a pre-trained convolutional neural network (CNN). Here, the convolutional neural network may include one or more convolutional layers, max-pooling layers, a vector concatenation layer, and fully connected (FC) layers. A convolutional layer performs convolution on the matrix input to it, and can also perform feature extraction and downsampling on the input matrix. A max-pooling layer downsamples the input matrix and outputs a vector. The vector concatenation layer concatenates the vectors output by the max-pooling layers with any other feature vectors input separately to this layer, and passes the concatenated vector to the fully connected layer. The fully connected layer performs the discrimination of the industry category. In practice, the fully connected layer acts as the "classifier" of the whole convolutional neural network, mapping the learned features to the sample label space. Here, the electronic device may input the matrix formed by the word vectors of the words in the enterprise name, the matrix formed by the word vector of the keyword, and the matrix formed by the word vectors of the words in the business scope information into the convolutional neural network (that is, into a convolutional layer of the convolutional neural network), take the vectors output by the max-pooling layers of the convolutional neural network as the feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively, and determine these feature vectors as the first feature information.
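To make the convolution and max-pooling step concrete, the following is a minimal pure-Python sketch that slides each filter over a word-vector matrix and keeps the maximum response per filter. The ReLU activation, the zero padding for short inputs, and all numeric values are assumptions not stated in the embodiment; a practical system would use a deep learning framework with learned filters.

```python
def conv_max_pool(word_vectors, filters, bias=0.0):
    """Apply each filter along the word axis, then max-pool over positions.

    word_vectors: list of n word vectors, each of dimension d.
    filters: list of kernels; each kernel is a list of `width` rows of length d.
    Returns one feature value per filter.
    """
    dim = len(word_vectors[0])
    features = []
    for kernel in filters:
        width = len(kernel)
        # Assumed zero padding so inputs shorter than the kernel still work.
        padded = word_vectors + [[0.0] * dim for _ in range(width - len(word_vectors))]
        responses = []
        for start in range(len(padded) - width + 1):
            total = bias
            for row, kernel_row in zip(padded[start:start + width], kernel):
                total += sum(x * w for x, w in zip(row, kernel_row))
            responses.append(max(0.0, total))  # assumed ReLU activation
        features.append(max(responses))  # max pooling over all positions
    return features


# Toy 2-dimensional word vectors for a three-word name (illustrative values).
MATRIX = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
FILTERS = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]

name_features = conv_max_pool(MATRIX, FILTERS)
print(name_features)
```

The same function would be applied separately to the keyword matrix and the business scope matrix, yielding three feature vectors that together form the first feature information.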
In practice, the convolutional neural network may be obtained by training as follows. First, a preset training sample for training the convolutional neural network is extracted; the training sample may include enterprise information samples of multiple enterprises and the enterprise category label corresponding to each enterprise information sample. Then, for each enterprise information sample, the enterprise name and the business scope information in the sample are segmented respectively, and each word in the enterprise name and each word in the business scope information is input into the pre-trained word vector model to obtain the word vector of each word in the enterprise name and of each word in the business scope information. Next, a keyword is extracted from the enterprise name in each enterprise information sample. Then, for each enterprise information sample, second feature information is extracted from the remaining information of the sample (for example, by concatenating the one-hot codes of the items in the remaining information, with the second feature information represented as a vector). Finally, for each enterprise information sample, the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information are used as the input of a pre-established convolutional neural network (that is, they are input into a convolutional layer of the convolutional neural network); the second feature information corresponding to the sample is input into the vector concatenation layer of the pre-established convolutional neural network; the enterprise category label corresponding to the sample is used as the output of the pre-established convolutional neural network; and supervised training is performed on the pre-established convolutional neural network using a machine learning method to obtain the trained convolutional neural network. Here, extracting the keyword from the enterprise name and feeding its word vector to the model as part of the input during training gives the model clearer guidance and improves the accuracy of model discrimination. In addition, convolutional neural networks train and predict efficiently and have good nonlinear fitting ability, which helps ensure the precision of industry classification.
Step 405: determine the one-hot encoding corresponding to each item of the remaining information.
In the present embodiment, the electronic device may determine the one-hot encoding corresponding to each item of the remaining information, where the remaining information is the information in the enterprise information of the target enterprise other than the enterprise name and the business scope information. The remaining information may include at least one of the following items of the target enterprise: operating status, registration type, scale, establishment time, and location (for example, province or city). In practice, a one-hot code may be represented as a vector.
Step 406: concatenate the one-hot codes corresponding to the items of the remaining information to generate the second feature information.
In the present embodiment, the electronic device may concatenate the one-hot codes corresponding to the items of the remaining information to generate the second feature information, that is, take the concatenated code (which may be represented as a vector) as the second feature information.
Step 407: fuse the first feature information with the second feature information, and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise.
In the present embodiment, the electronic device may first fuse the first feature information with the second feature information. Since both the first feature information and the second feature information can be represented as vectors, they may be fused by vector concatenation. The electronic device may then input the fused feature information into the pre-trained industry identification model to obtain the industry category of the target enterprise. The industry identification model is used to characterize the correspondence between feature information and industry categories. As an example, the industry identification model may be a correspondence table between items of feature information and industry categories, compiled in advance by technicians from statistics over a large amount of data. It should be noted that the fusion of the first feature information with the second feature information may be performed by the vector concatenation layer of the pre-trained convolutional neural network, and the industry identification model may be the fully connected layer of the pre-trained convolutional neural network; the fully connected layer may use a classification function (for example, the softmax function) to determine the industry category. In practice, the fully connected layer may output the probability that the target enterprise belongs to each industry category, in which case the industry category with the highest probability is determined as the industry category of the target enterprise; alternatively, the fully connected layer may directly output the industry category with the highest probability.
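The final classification step, a fully connected layer followed by softmax and selection of the most probable category, can be sketched as follows. The weights, biases, and category names are illustrative placeholders; in the embodiment they would be learned during training.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def fully_connected(features, weights, biases):
    """Compute one score per category: weights[k] . features + biases[k]."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def predict(features, weights, biases, categories):
    """Return the most probable category and the full probability list."""
    probabilities = softmax(fully_connected(features, weights, biases))
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return categories[best], probabilities


# Placeholder parameters for two categories and a 3-dimensional fused vector.
CATEGORIES = ["technology", "catering"]
WEIGHTS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
BIASES = [0.0, 0.0]

label, probabilities = predict([2.0, 0.0, 1.0], WEIGHTS, BIASES, CATEGORIES)
print(label, probabilities)
```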
In some optional implementations of the present embodiment, after the industry category of the target enterprise is obtained, an industry identifier may be added to the stored enterprise information of the target enterprise; the industry identifier indicates the industry category of the target enterprise. In practice, adding the industry identifier facilitates subsequent enterprise risk analysis, enterprise graph construction, and the like.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the process 400 of the method for generating information in the present embodiment highlights the steps of performing feature extraction and industry category discrimination with a pre-trained convolutional neural network. Because convolutional neural networks train and predict efficiently and have good nonlinear fitting ability, the precision of industry classification is improved. The process 400 also highlights the step of extracting the keyword from the enterprise name and using its word vector as part of the model input for training and industry category discrimination; the keyword gives the model clearer guidance and improves the accuracy of model discrimination. The process 400 further highlights the step of generating word vectors with a word vector model trained on training samples composed of a large number of enterprise names and business scope information; because the model is trained on targeted training samples, it performs better than randomly initialized vectors or a word vector model trained on text from an unrestricted domain (for example, text unrelated to enterprises). The scheme described in the present embodiment can therefore improve the accuracy of industry category generation.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating information. The apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus can be applied in various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating information of the present embodiment includes: a first extraction unit 501, configured to extract the enterprise information of a target enterprise, where the enterprise information includes an enterprise name and business scope information; a second extraction unit 502, configured to extract first feature information from the enterprise name and the business scope information; a third extraction unit 503, configured to extract second feature information from remaining information, where the remaining information is the information in the enterprise information other than the enterprise name and the business scope information; and an input unit 504, configured to fuse the first feature information with the second feature information and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise, where the industry identification model is used to characterize the correspondence between feature information and industry categories.
In some optional implementations of the present embodiment, the second extraction unit 502 may include a word segmentation module, an extraction module, and a generation module (not shown). The word segmentation module may be configured to segment the enterprise name and the business scope information respectively and determine the word vector of each word after segmentation. The extraction module may be configured to extract a keyword from the enterprise name. The generation module may be configured to parse the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information.
In some optional implementations of the present embodiment, the generation module may be further configured to: input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information into a pre-trained feature extraction model to obtain feature vectors corresponding to the enterprise name, the keyword, and the business scope information respectively, and determine these feature vectors as the first feature information, where the feature extraction model is used to extract text features.
In some optional implementations of the present embodiment, the feature extraction model may consist of the convolutional layers and the max-pooling layers of a pre-trained convolutional neural network.
In some optional implementations of the present embodiment, the industry identification model may be the fully connected layer of the convolutional neural network.
In some embodiments, the word segmentation module may be further configured to segment the enterprise name and the business scope information respectively, and input each word in the enterprise name and each word in the business scope information into a pre-trained word vector model to obtain the word vector of each word in the enterprise name and the word vector of each word in the business scope information, where the word vector model is used to generate the word vectors of words.
In some optional implementations of the present embodiment, the remaining information may include at least one of the following items of the target enterprise: operating status, registration type, scale, establishment time, and location. The third extraction unit 503 may include a determining module and a concatenation module (not shown). The determining module may be configured to determine the one-hot encoding corresponding to each item of the remaining information. The concatenation module may be configured to concatenate the one-hot codes corresponding to the items of the remaining information to generate the second feature information.
In the apparatus provided by the above embodiment of the present application, the first extraction unit 501 extracts the enterprise information of the target enterprise, the second extraction unit 502 extracts first feature information from the enterprise name and the business scope information, and the third extraction unit 503 extracts second feature information from the remaining information. The input unit 504 then fuses the first feature information with the second feature information and inputs the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise. The feature information in the enterprise information is thereby fully extracted, and the industry category of the enterprise is determined from the extracted feature information without manual keyword matching, which improves the flexibility of information generation.
Referring now to Fig. 6, a schematic structural diagram of a computer system 600 of a server suitable for implementing the embodiments of the present application is shown. The server shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, it may be described as: a processor including a first extraction unit, a second extraction unit, a third extraction unit, and an input unit. The names of these units do not, in certain cases, constitute a limitation on the units themselves; for example, the first extraction unit may also be described as "a unit that extracts the enterprise information of a target enterprise".
In another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: extract the enterprise information of a target enterprise, where the enterprise information includes an enterprise name and business scope information; extract first feature information from the enterprise name and the business scope information; extract second feature information from the remaining information; and fuse the first feature information with the second feature information and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise.
The above description is only of the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (14)

1. A method for generating information, comprising:
extracting enterprise information of a target enterprise, wherein the enterprise information includes an enterprise name and business scope information;
extracting first feature information from the enterprise name and the business scope information, wherein the extracting first feature information from the enterprise name and the business scope information comprises: performing word segmentation on the enterprise name and the business scope information respectively, and determining a word vector for each word obtained by the segmentation; extracting a keyword from the enterprise name; and parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information;
wherein the parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information comprises: inputting the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model, wherein the feature extraction model is used to extract text features;
extracting second feature information from remaining information, wherein the remaining information is the information in the enterprise information other than the enterprise name and the business scope information; and
fusing the first feature information with the second feature information, and inputting the fused feature information into a pre-trained industry recognition model to obtain an industry category of the target enterprise, wherein the industry recognition model is used to characterize the correspondence between feature information and industry categories.
2. The method for generating information according to claim 1, wherein the parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information further comprises:
obtaining feature vectors respectively corresponding to the enterprise name, the keyword, and the business scope information, and determining the feature vectors respectively corresponding to the enterprise name, the keyword, and the business scope information as the first feature information.
3. The method for generating information according to claim 1, wherein the feature extraction model consists of a convolutional layer and a max-pooling layer of a pre-trained convolutional neural network.
4. The method for generating information according to claim 3, wherein the industry recognition model is a fully connected layer of the convolutional neural network.
5. The method for generating information according to claim 1, wherein the performing word segmentation on the enterprise name and the business scope information respectively, and determining a word vector for each word obtained by the segmentation comprises:
performing word segmentation on the enterprise name and the business scope information respectively, and inputting each word in the enterprise name and each word in the business scope information separately into a pre-trained word vector model to obtain the word vector of each word in the enterprise name and the word vector of each word in the business scope information, wherein the word vector model is used to generate word vectors of words.
6. The method for generating information according to claim 1, wherein the remaining information includes at least one of the following items of the target enterprise: registration type, operating status, scale, establishment time, and location; and
the extracting second feature information from remaining information comprises:
determining a one-hot encoding corresponding to each item in the remaining information; and
concatenating the one-hot encodings corresponding to the items in the remaining information to generate the second feature information.
7. An apparatus for generating information, comprising:
a first extraction unit, configured to extract enterprise information of a target enterprise, wherein the enterprise information includes an enterprise name and business scope information;
a second extraction unit, configured to extract first feature information from the enterprise name and the business scope information, wherein the second extraction unit comprises: a word segmentation module, configured to perform word segmentation on the enterprise name and the business scope information respectively, and to determine a word vector for each word obtained by the segmentation; an extraction module, configured to extract a keyword from the enterprise name; and a generation module, configured to parse the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information;
wherein the generation module is further configured to: input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model, wherein the feature extraction model is used to extract text features;
a third extraction unit, configured to extract second feature information from remaining information, wherein the remaining information is the information in the enterprise information other than the enterprise name and the business scope information; and
an input unit, configured to fuse the first feature information with the second feature information, and to input the fused feature information into a pre-trained industry recognition model to obtain an industry category of the target enterprise, wherein the industry recognition model is used to characterize the correspondence between feature information and industry categories.
8. The apparatus for generating information according to claim 7, wherein the generation module is further configured to:
obtain feature vectors respectively corresponding to the enterprise name, the keyword, and the business scope information, and determine the feature vectors respectively corresponding to the enterprise name, the keyword, and the business scope information as the first feature information.
9. The apparatus for generating information according to claim 7, wherein the feature extraction model consists of a convolutional layer and a max-pooling layer of a pre-trained convolutional neural network.
10. The apparatus for generating information according to claim 9, wherein the industry recognition model is a fully connected layer of the convolutional neural network.
11. The apparatus for generating information according to claim 7, wherein the word segmentation module is further configured to:
perform word segmentation on the enterprise name and the business scope information respectively, and input each word in the enterprise name and each word in the business scope information separately into a pre-trained word vector model to obtain the word vector of each word in the enterprise name and the word vector of each word in the business scope information, wherein the word vector model is used to generate word vectors of words.
12. The apparatus for generating information according to claim 7, wherein the remaining information includes at least one of the following items of the target enterprise: registration type, operating status, scale, establishment time, and location; and
the third extraction unit comprises:
a determining module, configured to determine a one-hot encoding corresponding to each item in the remaining information; and
a concatenation module, configured to concatenate the one-hot encodings corresponding to the items in the remaining information to generate the second feature information.
13. A server, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 6.
14. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 6.
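Claims 3 and 9 describe the feature extraction model as a convolutional layer followed by a max-pooling layer. As a rough illustration only, the Python sketch below applies hand-written filters to a sequence of word vectors and keeps the maximum activation per filter. The function names, the ReLU activation, and the filter values used to call it are assumptions, not details taken from the patent; a real model would learn its filters during training.

```python
from typing import List, Sequence

Vector = Sequence[float]


def conv1d(word_vectors: Sequence[Vector], kernel: Sequence[Vector],
           bias: float = 0.0) -> List[float]:
    """Slide one filter that spans len(kernel) consecutive words over the text."""
    width = len(kernel)
    outputs: List[float] = []
    for start in range(len(word_vectors) - width + 1):
        window = word_vectors[start:start + width]
        total = bias
        for word_vec, kernel_vec in zip(window, kernel):
            total += sum(w * k for w, k in zip(word_vec, kernel_vec))
        outputs.append(max(0.0, total))  # ReLU activation (an assumption)
    return outputs


def extract_text_feature(word_vectors: Sequence[Vector],
                         kernels: Sequence[Sequence[Vector]]) -> List[float]:
    """Max-pool each filter's activations into one value per filter.

    The result has a fixed length (one entry per filter) regardless of how
    many words the text contains; texts shorter than a filter contribute 0.0.
    """
    feature: List[float] = []
    for kernel in kernels:
        activations = conv1d(word_vectors, kernel)
        feature.append(max(activations) if activations else 0.0)
    return feature
```

Because pooling collapses the word dimension, the enterprise name, the keyword, and the business scope text can each be mapped to a vector of the same length even though they contain different numbers of words.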

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810045681.1A CN108171276B (en) 2018-01-17 2018-01-17 Method and apparatus for generating information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810045681.1A CN108171276B (en) 2018-01-17 2018-01-17 Method and apparatus for generating information

Publications (2)

Publication Number Publication Date
CN108171276A CN108171276A (en) 2018-06-15
CN108171276B true CN108171276B (en) 2019-07-23

Family

ID=62514587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810045681.1A Active CN108171276B (en) 2018-01-17 2018-01-17 Method and apparatus for generating information

Country Status (1)

Country Link
CN (1) CN108171276B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388712A (en) * 2018-09-21 2019-02-26 平安科技(深圳)有限公司 A kind of trade classification method and terminal device based on machine learning
CN110941826B (en) * 2018-09-21 2022-08-09 武汉安天信息技术有限责任公司 Malicious android software detection method and device
CN109359197B (en) * 2018-10-31 2021-01-05 税友软件集团股份有限公司 Tax type authentication method, device and computer readable storage medium
CN111126422B (en) * 2018-11-01 2023-10-31 百度在线网络技术(北京)有限公司 Method, device, equipment and medium for establishing industry model and determining industry
CN111125550B (en) * 2018-11-01 2023-11-24 百度在线网络技术(北京)有限公司 Point-of-interest classification method, device, equipment and storage medium
CN111242146B (en) * 2018-11-09 2023-08-25 蔚来(安徽)控股有限公司 POI information classification based on convolutional neural network
CN109710906A (en) * 2018-12-06 2019-05-03 深圳市标准技术研究院 Business scope auxiliary makes a report on method, apparatus, terminal device and storage medium
CN109801118A (en) * 2018-12-24 2019-05-24 航天信息股份有限公司 Identify method, apparatus, medium and the equipment of the manufacturing business of designated trade
CN112307199A (en) * 2019-07-14 2021-02-02 阿里巴巴集团控股有限公司 Information identification method, data processing method, device and equipment, information interaction method
CN112487794B (en) * 2019-08-21 2023-09-22 顺丰科技有限公司 Industry classification method, device, terminal equipment and storage medium
CN110781955A (en) * 2019-10-24 2020-02-11 中国银联股份有限公司 Method and device for classifying label-free objects and detecting nested codes and computer-readable storage medium
CN111104791B (en) * 2019-11-14 2024-02-20 北京金堤科技有限公司 Industry information acquisition method and device, electronic equipment and medium
CN111538837A (en) * 2020-04-27 2020-08-14 北京同邦卓益科技有限公司 Method and device for analyzing enterprise operation range information
CN111914090B (en) * 2020-08-18 2021-05-04 生态环境部环境规划院 Method and device for enterprise industry classification identification and characteristic pollutant identification
CN112163153B (en) * 2020-09-30 2024-05-03 深圳前海微众银行股份有限公司 Industry label determining method, device, equipment and storage medium
CN112487263A (en) * 2020-11-26 2021-03-12 杭州安恒信息技术股份有限公司 Information processing method, system, equipment and computer readable storage medium
CN113869639B (en) * 2021-08-26 2023-11-07 中国环境科学研究院 Yangtze river basin enterprise screening method and device, electronic equipment and storage medium
CN114785410B (en) * 2022-04-25 2024-02-27 贵州电网有限责任公司 Accurate recognition system based on optical fiber coding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105590102A (en) * 2015-12-30 2016-05-18 中通服公众信息产业股份有限公司 Front car face identification method based on deep learning
CN106372648A (en) * 2016-10-20 2017-02-01 中国海洋大学 Multi-feature-fusion-convolutional-neural-network-based plankton image classification method
CN106779467A (en) * 2016-12-31 2017-05-31 成都数联铭品科技有限公司 Enterprises ' industry categorizing system based on automatic information screening
CN107169036A (en) * 2017-04-19 2017-09-15 畅捷通信息技术股份有限公司 Determine the method and system of the affiliated category of employment of enterprise
CN108241867A (en) * 2016-12-26 2018-07-03 阿里巴巴集团控股有限公司 A kind of sorting technique and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9058515B1 (en) * 2012-01-12 2015-06-16 Kofax, Inc. Systems and methods for identification document processing and business workflow integration
CN102881125B (en) * 2012-09-25 2014-06-18 杭州立高科技有限公司 Alarm monitoring system based on multi-information fusion centralized processing platform

Also Published As

Publication number Publication date
CN108171276A (en) 2018-06-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant