CN114297472A - Intelligent industry policy recommendation method and system, electronic device and medium

Info

Publication number: CN114297472A
Application number: CN202111395940.1A
Authority: CN
Inventors: 苗璐; 王志刚; 林文辉
Original assignee: Aisino Corp
Current assignee: Aisino Corp
Priority date: 2021-11-23
Filing date: 2021-11-23
Publication date: 2022-04-08

Abstract

The application discloses an intelligent industry policy recommendation method and system, an electronic device, and a medium. The method may comprise the following steps: performing industry policy feature analysis on policy texts and establishing an industry lexicon; generating a corresponding industry label for each policy text; acquiring a feature word list of a target enterprise, wherein the feature word list comprises main business commodities, the enterprise name, and the business scope; predicting the industry to which the target enterprise belongs through a BERT_TextCNN_BiLSTM industry model according to the feature word list; and pushing the corresponding policy texts to the target enterprise through the industry lexicon and the industry labels according to the industry to which the target enterprise belongs. The method and system recommend policy documents to an enterprise according to the industry feature information in its operation data, provide more convenient and more targeted policy recommendation, benefit enterprise development, and promote the effectiveness of policy implementation.

Description

Intelligent industry policy recommendation method and system, electronic device and medium

Technical Field

The invention relates to the technical field of big data, and in particular to an intelligent industry policy recommendation method and system, an electronic device, and a medium.

Background

Faced with a large volume of policy information, a policy announcement system generally presents all policies in reverse chronological order of release time and provides fixed screening conditions for enterprises to query policies, which reduces the difficulty of acquiring policy information to a certain extent. However, some policies are time-limited and require an application for approval, so an enterprise may lose benefits if it does not obtain the information in time. In addition, failing to learn of laws, regulations, and policies in time can create operational risks and cause unnecessary losses for enterprises.

At present, the demand for automatic policy recommendation is high, but because data and industry definitions lack standardization, research on industry policy recommendation is limited.

Therefore, there is a need to develop an intelligent industry policy recommendation method, system, electronic device and medium.

The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

Disclosure of Invention

The invention provides an intelligent industry policy recommendation method and system, an electronic device, and a medium, which recommend policy documents to enterprises according to the industry feature information in enterprise operation data, provide more convenient, faster, and more targeted policy recommendation, benefit enterprise development, and promote the effectiveness of policy implementation.

In a first aspect, an embodiment of the present disclosure provides an intelligent industry policy recommendation method, including:

performing industry policy feature analysis according to the policy text, and establishing an industry lexicon;

generating a corresponding industry label for the policy text;

acquiring a feature word list of a target enterprise, wherein the content of the feature word list comprises main business commodities, the enterprise name, and the business scope;

predicting the industry to which the target enterprise belongs through a BERT_TextCNN_BiLSTM industry model according to the feature word list;

and pushing the corresponding policy text to the target enterprise through the industry lexicon and the industry label according to the industry to which the target enterprise belongs.

Preferably, the sales amount is calculated separately for each commodity sold within a set time, and the main business commodities are the top 50% of commodities when sorted by amount from largest to smallest.

Preferably, words belonging to the industry lexicon in the historical policy text data of the target enterprise are extracted and added to the feature word list.

Preferably, generating a corresponding industry label for the policy text comprises:

extracting the words in the policy text that belong to the industry lexicon, and encoding the words into a vector through BERT;

calculating the similarity between the vector and the text feature vector of each industry category, and generating the corresponding industry label for the policy text if the similarity exceeds a set threshold; if several similarities exceed the threshold for a given policy, all the corresponding industry category labels are recorded.

Preferably, if a plurality of similarity degrees of the policy document exceed a set threshold value, all industry tags are recorded.

Preferably, if a policy is not assigned any industry category label, the policy is given an all-industry policy label.

Preferably, predicting the industry to which the target enterprise belongs through the BERT_TextCNN_BiLSTM industry model according to the feature word list comprises the following steps:

converting the content of the feature word list into feature word vectors through a BERT pre-training model;

predicting the industry to which the target enterprise belongs through a TextCNN network and a BiLSTM network respectively according to the feature word vectors;

and performing a weighted average of the industry probabilities output by the TextCNN and the BiLSTM, and selecting the industry with the highest probability as the predicted industry.

As a specific implementation of the embodiments of the present disclosure, in a second aspect, an embodiment of the present disclosure further provides an intelligent industry policy recommendation system, including:

the industry lexicon establishing module, which performs industry policy feature analysis according to the policy text and establishes an industry lexicon;

the industry label generation module, which generates a corresponding industry label for the policy text;

the feature word list obtaining module, which acquires a feature word list of a target enterprise, wherein the content of the feature word list comprises main business commodities, the enterprise name, and the business scope;

the prediction module, which predicts the industry to which the target enterprise belongs through a BERT_TextCNN_BiLSTM industry model according to the feature word list;

and the pushing module, which pushes the corresponding policy text to the target enterprise through the industry lexicon and the industry label according to the industry to which the target enterprise belongs.

In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:

a memory storing executable instructions;

a processor that executes the executable instructions in the memory to implement the intelligent industry policy recommendation method.

In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for recommending an intelligent industry policy is implemented.

The beneficial effects are as follows: the invention fully mines the industry features in enterprise operation data, national and local policy documents, and national industry standard text data, generates an industry lexicon, and attaches industry labels to policy documents. It also improves on traditional industry classification methods by designing an industry classification model based on BERT, TextCNN with Chunk-max pooling, and BiLSTM, which overcomes the shortcomings of a single model and better mines the features of enterprise operation data. The industry lexicon, the industry model, and the policy information are updated periodically. The resulting industry policy recommendation method, based on policy industry labels and industry prediction, facilitates the state's standardized management of enterprises in all industries and promotes enterprise development.

The method and system of the present invention have other features and advantages which will be apparent from or are set forth in detail in the accompanying drawings and the following detailed description, which are incorporated herein, and which together serve to explain certain principles of the invention.

Drawings

The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts.

FIG. 1 shows a flowchart of the steps of an intelligent industry policy recommendation method according to one embodiment of the present invention.

FIG. 2 illustrates a block diagram of an intelligent industry policy recommendation system, according to one embodiment of the present invention.

Description of reference numerals:

201. industry lexicon establishing module; 202. industry label generation module; 203. feature word list obtaining module; 204. prediction module; 205. pushing module.

Detailed Description

Preferred embodiments of the present invention will be described in more detail below. While the following describes preferred embodiments of the present invention, it should be understood that the present invention may be embodied in various forms and should not be limited by the embodiments set forth herein.

To facilitate understanding of the scheme of the embodiments of the present invention and its effects, four specific application examples are given below. It will be understood by those skilled in the art that these examples are merely intended to facilitate an understanding of the present invention and that their specific details are not intended to limit the invention in any way.

Example 1

As shown in FIG. 1, the intelligent industry policy recommendation method includes: step 101, performing industry policy feature analysis according to the policy text, and establishing an industry lexicon; step 102, generating a corresponding industry label for the policy text; step 103, acquiring a feature word list of the target enterprise, wherein the content of the feature word list comprises main business commodities, the enterprise name, and the business scope; step 104, predicting the industry to which the target enterprise belongs through a BERT_TextCNN_BiLSTM industry model according to the feature word list; and step 105, pushing the corresponding policy text to the target enterprise through the industry lexicon and the industry label according to the industry to which the target enterprise belongs.

In one example, the sales amount is calculated separately for each commodity sold within a set time, and the main business commodities are the top 50% of commodities when sorted by amount from largest to smallest.

In one example, words belonging to the industry lexicon in the historical policy text data of the target enterprise are extracted and added to the feature word list.

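In one example, generating a corresponding industry label for the policy text includes: extracting the words in the policy text that belong to the industry lexicon and encoding them into a vector through BERT; and calculating the similarity between that vector and the text feature vector of each industry category, and generating the corresponding industry label for the policy text if the similarity exceeds a set threshold.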

In one example, if the policy document has a plurality of similarities that exceed a set threshold, all industry tags are recorded.

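In one example, if a policy is not assigned any industry category label, the policy is given an all-industry policy label.

In one example, predicting the industry to which the target enterprise belongs through the BERT_TextCNN_BiLSTM industry model according to the feature word list includes:

converting the content of the feature word list into feature word vectors through a BERT pre-training model;

predicting the industry to which the target enterprise belongs through a TextCNN network and a BiLSTM network respectively according to the feature word vectors;

and performing a weighted average of the industry probabilities output by the TextCNN and the BiLSTM, and selecting the industry with the highest probability as the predicted industry.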

Specifically, policy texts are extracted from the policy announcement system, all texts undergo word segmentation and stop-word removal, and a set of keywords is selected through TF-IDF. Meanwhile, the description of each industry in the "National Economy Industry Classification" standard is segmented and stripped of stop words, and the words of each industry category are encoded into vectors through BERT to obtain text features. Similarity is then calculated between the BERT-encoded keyword set and the text feature vector of each industry category, and the keywords with high similarity are selected as a candidate industry lexicon.
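The TF-IDF keyword selection described above can be sketched as follows. This is a minimal illustration and not the implementation of the invention: it assumes the policy texts have already been segmented and stripped of stop words, and the function name `tfidf_keywords`, the parameter `top_k`, and the unsmoothed IDF formula are assumptions introduced here.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms of each tokenized document.

    docs is a list of token lists (already segmented, stop words removed).
    The IDF used here is the plain log(N / df) form, an assumption.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


docs = [
    ["tax", "relief", "manufacturing", "tax"],
    ["tax", "agriculture", "subsidy"],
    ["tax", "export", "rebate"],
]
# "tax" occurs in every document, so its IDF is zero and it ranks last.
keywords = tfidf_keywords(docs, top_k=2)
```

Terms that appear in every policy text receive a score of zero, which is why a ubiquitous word such as "tax" in the toy data is not selected as a keyword.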

In actual policy release, policies may be classified by sector into primary-industry, secondary-industry, and tertiary-industry policies. They may also be divided into industry policies for agriculture, forestry, animal husbandry and fishery; energy and transportation; manufacturing; commerce and foreign trade; construction and installation; science, technology and emerging industries; culture, sports and health; civil welfare and social security; finance, insurance and securities; administration, justice and law; public services; and the like. Industry-related words in the candidate industry lexicon are screened out by combining these policy classification methods with expert experience, and the industry lexicon is maintained accordingly.

To identify the industry features of a policy text, a specific industry label is generated for it. The words in the policy text that belong to the industry lexicon are extracted and encoded into a vector through BERT, and the similarity between this vector and the text feature vector of each industry category in the "National Economy Industry Classification" standard is calculated. A policy whose similarity exceeds the threshold is given the industry policy label of that industry category, and a policy not assigned to any specific industry category is given an all-industry applicable policy label. If several similarities exceed the threshold for a given policy, all the corresponding industry category labels are recorded.
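The labeling rule above can be illustrated with a short sketch. The text does not name the similarity measure or the threshold value, so cosine similarity and the default threshold of 0.8 are assumptions, as are the names `label_policy` and `ALL_INDUSTRY_LABEL`. The vectors stand in for BERT encodings, which are not computed here.

```python
import math

ALL_INDUSTRY_LABEL = "all-industry"  # hypothetical name for the all-industry label


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def label_policy(policy_vec, industry_vecs, threshold=0.8):
    """Return every industry whose similarity exceeds the threshold.

    If no industry passes the threshold, the policy is treated as
    applicable to all industries.
    """
    labels = [
        name
        for name, vec in industry_vecs.items()
        if cosine(policy_vec, vec) > threshold
    ]
    return labels or [ALL_INDUSTRY_LABEL]


# Toy 2-dimensional vectors in place of real BERT encodings.
industry_vecs = {"manufacturing": [1.0, 0.0], "agriculture": [0.0, 1.0]}
single = label_policy([1.0, 0.1], industry_vecs)        # one label
multi = label_policy([1.0, 1.0], industry_vecs, 0.7)    # both labels recorded
fallback = label_policy([1.0, 1.0], industry_vecs)      # no label passes 0.8
```

A policy can therefore carry several industry labels at once, and the all-industry label is used only as a fallback.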

The goods sold, sales amounts, enterprise name, business scope, and historical policy text data of the target enterprise are extracted from the database; these data are Chinese text. Duplicate data are deleted, and missing and abnormal data are processed. The sales amount of every commodity sold over one to two years is totaled, and the top 50% of commodities, sorted by amount from largest to smallest, are taken as the main business commodities. The main business commodities, enterprise name, and business scope are segmented into words; a stop-word list is maintained; stop words, punctuation, and special symbols are removed; and only the first N words of the segmented business scope are retained, yielding the feature word list of the target enterprise. Words belonging to the industry lexicon are then extracted from the target enterprise's historical policy text data and added to the feature word list to complete the enterprise's industry feature word list.
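A minimal sketch of the two steps above, under stated assumptions: "top 50%" is read as the top half of distinct commodities by count (rounded up), stop words are removed before the first N business-scope words are kept, and all inputs are already segmented. The function names `main_commodities` and `build_feature_words` and their parameters are hypothetical.

```python
import math


def main_commodities(sales, fraction=0.5):
    """Pick the top fraction of commodities ranked by total sales amount.

    sales is a list of (commodity, amount) rows for the chosen period.
    """
    totals = {}
    for name, amount in sales:
        totals[name] = totals.get(name, 0.0) + amount
    # Largest amount first; ties are broken by name for determinism.
    ranked = sorted(totals, key=lambda n: (-totals[n], n))
    keep = math.ceil(len(ranked) * fraction)
    return ranked[:keep]


def build_feature_words(commodity_tokens, name_tokens, scope_tokens,
                        stop_words, top_n, history_tokens=(), lexicon=frozenset()):
    """Assemble the feature word list of one enterprise from tokenized inputs."""
    def clean(tokens):
        return [t for t in tokens if t not in stop_words]

    # Only the first top_n business-scope words are retained.
    words = clean(commodity_tokens) + clean(name_tokens) + clean(scope_tokens)[:top_n]
    # Historical policy words are added only if they are in the industry lexicon.
    words += [t for t in history_tokens if t in lexicon]
    return words


sales = [("steel pipe", 500.0), ("valve", 300.0), ("steel pipe", 200.0),
         ("gasket", 50.0), ("bolt", 20.0)]
top = main_commodities(sales)  # ['steel pipe', 'valve']
```

Other readings of "top 50%", for example the commodities that together account for half of total revenue, would need a different cutoff rule.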

The industry to which the target enterprise belongs is predicted through the BERT_TextCNN_BiLSTM industry model according to the feature word list, as follows.

First, the content of the feature word list is converted into feature word vectors through a BERT pre-training model. BERT is a pre-trained model proposed by Google that can be used for various tasks in the natural language processing field, such as text classification and machine translation, and can also generate word vectors. The word vectors produced by BERT are dynamic: different vectors can be generated for a word depending on its context. Encoding the enterprise's feature words with the BERT pre-trained model therefore extracts the enterprise's features better.

TextCNN can effectively extract features from industry text information. It consists of an embedding layer, a convolution layer, a pooling layer, and an output layer. In the embedding layer, the processed enterprise feature word list is converted into vectors through BERT encoding. In the convolution layer, convolution kernels whose width equals the vector dimension and whose heights are 2, 3, and 4 are used to extract contextual word features. Because kernels of different heights are used, the feature vectors produced by the convolution layer have different dimensions. To retain the salient features, the pooling layer adopts Chunk-max pooling, which preserves the relative order of several local maximum feature values and concatenates them to obtain the final feature vector of the pooling layer. The output layer contains several fully connected layers, and the last layer uses the softmax activation function to obtain the probability of each industry.
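Chunk-max pooling can be sketched in a few lines. The sketch assumes each convolution kernel has already produced a one-dimensional feature map and that every map is split into the same number of contiguous chunks; the names `chunk_max_pool` and `pooled_features` and the chunk count are illustrative assumptions, not details given in the text.

```python
def chunk_max_pool(feature_map, n_chunks):
    """Split a 1-D feature map into n_chunks contiguous chunks, keep each maximum.

    The maxima stay in their original left-to-right order, so coarse
    positional information survives the pooling step.
    """
    length = len(feature_map)
    if length < n_chunks:
        raise ValueError("feature map is shorter than the number of chunks")
    pooled = []
    for i in range(n_chunks):
        start = i * length // n_chunks
        end = (i + 1) * length // n_chunks
        pooled.append(max(feature_map[start:end]))
    return pooled


def pooled_features(feature_maps, n_chunks):
    """Pool every feature map to n_chunks values and concatenate the results."""
    features = []
    for fmap in feature_maps:
        features.extend(chunk_max_pool(fmap, n_chunks))
    return features


# Six tokens convolved with kernel heights 2, 3 and 4 give maps of length 5, 4 and 3.
maps = [[4, 5, 7, 9, 4], [6, 10, 11, 9], [11, 14, 11]]
vector = pooled_features(maps, n_chunks=2)  # [5, 9, 10, 11, 11, 14]
```

Whatever the input lengths, each map contributes exactly `n_chunks` values, which resolves the dimension mismatch between kernels of different heights.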

The bidirectional long short-term memory network (BiLSTM) uses a bidirectional LSTM network; by taking context in both directions into account, it can effectively extract the sequential information in industry text. An LSTM updates and retains historical information through an input gate, a forget gate, an output gate, and a cell unit: the forget gate determines how much of the previous cell's information is forgotten, the input gate determines which information is added to the cell, and the output gate determines how much of the current information is output. The words in the enterprise's industry feature word list are encoded into vectors through BERT and passed through the bidirectional LSTM network, and the last layer uses the softmax activation function to obtain the probability of each industry.
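For reference, the gate behavior described above corresponds to the commonly used LSTM formulation below. The text gives no equations, so the symbols are assumptions: $x_t$ is the BERT vector of the $t$-th feature word, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)} \\
h_t^{\mathrm{bi}} &= \left[\overrightarrow{h}_t \, ; \overleftarrow{h}_t\right] && \text{(forward and backward states concatenated)} \\
p &= \operatorname{softmax}\left(W_p \, h^{\mathrm{bi}} + b_p\right) && \text{(probability of each industry)}
\end{aligned}
```

In the bidirectional case the same recurrence is run once left to right and once right to left with separate parameters, and the two hidden states are concatenated before the softmax layer.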

A weighted average of the industry probabilities output by the TextCNN and the BiLSTM is then calculated, and the industry with the highest probability is selected as the predicted industry.
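The fusion step can be sketched as below. The weights are not specified in the text, so the equal default weighting, the function name `ensemble_predict`, and the dictionary-based probability format are assumptions for illustration.

```python
def ensemble_predict(p_textcnn, p_bilstm, w_textcnn=0.5):
    """Weighted average of two per-industry probability distributions.

    Both inputs map industry name -> probability over the same industries.
    Returns the predicted industry and the combined distribution.
    """
    w_bilstm = 1.0 - w_textcnn
    combined = {
        industry: w_textcnn * p_textcnn[industry] + w_bilstm * p_bilstm[industry]
        for industry in p_textcnn
    }
    best = max(combined, key=combined.get)
    return best, combined


p_cnn = {"manufacturing": 0.6, "retail": 0.3, "agriculture": 0.1}
p_lstm = {"manufacturing": 0.2, "retail": 0.7, "agriculture": 0.1}
industry, probs = ensemble_predict(p_cnn, p_lstm)  # industry == "retail"
```

Shifting the weight toward one model changes the outcome: with `w_textcnn=0.9` the same inputs yield "manufacturing", so the weights would need to be tuned on validation data.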

And pushing a corresponding policy text to the target enterprise through the industry lexicon and the industry label according to the industry of the target enterprise.

Policy recommendation for enterprises is divided into automatic recommendation and self-service query. In the automatic recommendation module, the recommendation interface is divided into two parts: predicted-industry policy recommendation and all-industry policy recommendation. Predicted-industry policy recommendation automatically recommends to the enterprise, in reverse chronological order, the policies labeled with the enterprise's industry category, where the enterprise's industry category is the output of the industry prediction model. So that enterprises can quickly grasp the industry information of a policy, the policy's industry labels are displayed together with the policy. All-industry policy recommendation shows the policies applicable to all industries, also in reverse chronological order. The self-service query module provides a hierarchical industry catalog comprising industry sections, major categories, medium categories, and minor categories together with their descriptions; selecting an industry category in the catalog returns the policies for that category that are within their validity period, such as preferential industry policies.

During industry determination and recommendation, the industry lexicon needs to be updated and maintained regularly, and the BERT_TextCNN_BiLSTM industry model is updated regularly as enterprises' operating conditions change.
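The two-part automatic recommendation described above can be sketched as a filter followed by a reverse chronological sort. The record layout (`title`, `released`, `labels`), the function name `recommend`, and the label string `"all-industry"` are assumptions made for this example.

```python
from datetime import date


def recommend(policies, predicted_industry, all_industry_label="all-industry"):
    """Split policies into predicted-industry and all-industry lists, newest first.

    Each policy is a dict with "title", "released" (a date) and "labels".
    """
    newest_first = sorted(policies, key=lambda p: p["released"], reverse=True)
    industry = [p for p in newest_first if predicted_industry in p["labels"]]
    general = [p for p in newest_first if all_industry_label in p["labels"]]
    return industry, general


policies = [
    {"title": "A", "released": date(2021, 3, 1), "labels": ["manufacturing"]},
    {"title": "B", "released": date(2021, 9, 1), "labels": ["manufacturing", "retail"]},
    {"title": "C", "released": date(2021, 6, 1), "labels": ["all-industry"]},
    {"title": "D", "released": date(2021, 1, 1), "labels": ["agriculture"]},
]
industry, general = recommend(policies, "manufacturing")
# industry titles: B, A    general titles: C
```

Because each returned record keeps its `labels` field, the interface can display a policy's industry labels next to the policy, as the text requires.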

The method can standardize how industry categories and key industry information are expressed, recommend policy documents to enterprises according to the industry feature information in their operation data, and clarify each enterprise's industry category. This facilitates the state's standardized management of enterprises in all industries, provides enterprises with more convenient and more targeted policy recommendation, benefits enterprise development, and promotes the effectiveness of policy implementation.

Example 2

As shown in fig. 2, the intelligent industry policy recommendation system includes:

an industry lexicon establishing module 201, which performs industry policy feature analysis according to the policy text and establishes an industry lexicon;

an industry label generation module 202, which generates a corresponding industry label for the policy text;

a feature word list obtaining module 203, which acquires a feature word list of a target enterprise, wherein the content of the feature word list comprises main business commodities, the enterprise name, and the business scope;

a prediction module 204, which predicts the industry to which the target enterprise belongs through a BERT_TextCNN_BiLSTM industry model according to the feature word list;

and a pushing module 205, which pushes the corresponding policy text to the target enterprise through the industry lexicon and the industry label according to the industry to which the target enterprise belongs.

The system can standardize how industry categories and key industry information are expressed, recommend policy documents to enterprises according to the industry feature information in their operation data, and clarify each enterprise's industry category. This facilitates the state's standardized management of enterprises in all industries, provides enterprises with more convenient and more targeted policy recommendation, benefits enterprise development, and promotes the effectiveness of policy implementation.

Example 3

The present disclosure provides an electronic device including: a memory storing executable instructions; and a processor that executes the executable instructions in the memory to implement the intelligent industry policy recommendation method.

An electronic device according to an embodiment of the present disclosure includes a memory and a processor.

The memory is used to store non-transitory computer-readable instructions. In particular, the memory may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, or flash memory.

The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions. In one embodiment of the disclosure, the processor is configured to execute the computer readable instructions stored in the memory.

Those skilled in the art should understand that, in order to solve the technical problem of how to obtain a good user experience, the present embodiment may also include well-known structures such as a communication bus, an interface, and the like, and these well-known structures should also be included in the protection scope of the present disclosure.

For the detailed description of the present embodiment, reference may be made to the corresponding descriptions in the foregoing embodiments, which are not repeated herein.

Example 4

The disclosed embodiments provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the intelligent industry policy recommendation method.

A computer-readable storage medium according to an embodiment of the present disclosure has non-transitory computer-readable instructions stored thereon. The non-transitory computer readable instructions, when executed by a processor, perform all or a portion of the steps of the methods of the embodiments of the disclosure previously described.

The computer-readable storage media include, but are not limited to: optical storage media (e.g., CD-ROMs and DVDs), magneto-optical storage media (e.g., MOs), magnetic storage media (e.g., magnetic tapes or removable disks), media with built-in rewritable non-volatile memory (e.g., memory cards), and media with built-in ROMs (e.g., ROM cartridges).

It will be appreciated by persons skilled in the art that the above description of embodiments of the invention is intended only to illustrate the benefits of embodiments of the invention and is not intended to limit embodiments of the invention to any examples given.

Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims

1. An intelligent industry policy recommendation method, comprising:

generating a corresponding industry label for the policy text;

2. The intelligent industry policy recommendation method according to claim 1, wherein the sales amount is calculated separately for each commodity sold within a set time, and the main business commodities are the top 50% of commodities when sorted by amount from largest to smallest.

3. The intelligent industry policy recommendation method according to claim 1, wherein words belonging to the industry lexicon in the historical policy text data of the target enterprise are extracted and added to the feature word list.

4. The intelligent industry policy recommendation method of claim 1 wherein generating a corresponding industry label for a policy text comprises:

5. The intelligent industry policy recommendation method of claim 4 wherein if the policy document has a plurality of similarities exceeding a set threshold, recording all industry tags.

6. The intelligent industry policy recommendation method of claim 4, wherein a policy is given an all-industry policy label if the policy is not assigned any industry category label.

7. The intelligent industry policy recommendation method of claim 1, wherein predicting the industry to which the target enterprise belongs through the BERT_TextCNN_BiLSTM industry model according to the feature word list comprises:

8. An intelligent industry policy recommendation system, comprising:

9. An electronic device, characterized in that the electronic device comprises:

a memory storing executable instructions;

a processor that executes the executable instructions in the memory to implement the intelligent industry policy recommendation method of any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the intelligent industry policy recommendation method of any one of claims 1-7.