CN112835910B - Method and device for processing enterprise information and policy information

Info

Publication number: CN112835910B
Application number: CN202110246720.6A
Authority: CN
Inventors: 李全祚
Original assignee: Tianjiu Sharing Network Technology Group Co ltd
Current assignee: Tianjiu Sharing Network Technology Group Co ltd
Priority date: 2021-03-05
Filing date: 2021-03-05
Publication date: 2023-10-17
Anticipated expiration: 2041-03-05
Also published as: CN112835910A

Abstract

The invention relates to a method and a device for processing enterprise information and policy information. The method comprises: acquiring first enterprise information and first policy information; performing enterprise condition identification processing; if the identification succeeds, performing policy theme division and policy region identification; performing feature recognition processing on the first enterprise information and the first policy information; performing policy classification processing using a policy classification model to generate a first classification label set; counting the enterprise sample data corresponding to the theme to generate a first number; if the first number exceeds a threshold, calculating matching degree data using the linear regression model corresponding to the theme; if the first number does not exceed the threshold and the first classification labels are not all empty, calculating matching degree data using the linear regression model corresponding to the labels; and if the first number does not exceed the threshold and the first classification labels are all empty, calculating matching degree data using the linear regression model corresponding to the region. The method makes the matching degree calculation more flexible and improves matching accuracy and operating efficiency.

Description

Method and device for processing enterprise information and policy information

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for processing enterprise information and policy information.

Background

When declaring a project, an enterprise pays attention to the conditions, themes, fields, categories, issuing departments, regions, subsidies and other information of the relevant policies, and selects from them the policy projects whose requirements it meets and that match it best. In daily practice, policy information is mostly collected and screened by hand, so the result depends on the skill of the staff involved; information is often overlooked, and enterprises miss the most suitable projects to declare.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a method, a device, electronic equipment and a computer-readable storage medium for processing enterprise information and policy information. Policy information is screened using a trained policy classification model, and the matching degree between an enterprise and a policy is estimated using a trained linear regression model. This reduces manual involvement and improves matching accuracy and operating efficiency, and the accuracy of the trained models can keep improving as the volume of training data grows.

To achieve the above object, a first aspect of an embodiment of the present invention provides a method for processing enterprise information and policy information, the method including:

acquiring first enterprise information and first policy information;

according to the first policy information and the first enterprise information, carrying out enterprise condition identification processing;

if the enterprise condition identification processing is successful, performing policy theme division and policy region identification processing according to the first policy information to generate first theme data and first region data;

performing enterprise feature recognition processing on the first enterprise information to generate a first feature data set, and performing policy feature recognition processing on the first policy information to generate a second feature data set;

using a policy classification model to perform policy classification processing on the first policy information to generate a first classification label set; the first classification label set at least comprises a first industry field label, a first policy class label and a first issuing department label;

counting, in a preset subsidized enterprise sample database, the number of first collected enterprise sample data corresponding to the first theme data to generate a first number; the subsidized enterprise sample database includes a plurality of the first collected enterprise sample data;

if the first number exceeds a preset first threshold, performing matching degree calculation processing of enterprise information and policy information by using the linear regression model corresponding to the theme according to the first feature data set, the second feature data set and the first theme data, and generating first matching degree data;

if the first number does not exceed the first threshold value and the first industry field label, the first policy class label and the first issuing department label are not all empty, performing matching degree calculation processing of enterprise information and policy information by using a linear regression model corresponding to the label according to the first feature data set, the second feature data set and the label which is not empty, and generating second matching degree data;

and if the first number does not exceed the first threshold value and the first industry field label, the first policy class label and the first issuing department label are all empty, performing matching degree calculation processing on enterprise information and policy information by using a linear regression model corresponding to a region according to the first feature data set, the second feature data set and the first region data, and generating third matching degree data.
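The three-way branching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `choose_model_family`, its parameter names and the string return values are hypothetical, and an empty string is treated the same as a missing label.

```python
from typing import Optional


def choose_model_family(sample_count: int,
                        threshold: int,
                        industry_label: Optional[str],
                        policy_class_label: Optional[str],
                        department_label: Optional[str]) -> str:
    """Pick which family of linear regression model scores the pair.

    All names here are illustrative; the source only fixes the branching rule.
    """
    # Enough subsidized-enterprise samples exist for this theme.
    if sample_count > threshold:
        return "theme"
    # Too few samples, but at least one classification label is non-empty.
    if any((industry_label, policy_class_label, department_label)):
        return "label"
    # Too few samples and every label is empty: fall back to the region model.
    return "region"


print(choose_model_family(120, 100, None, None, None))
```

Note that a count equal to the threshold does not "exceed" it, so it falls through to the label or region branch.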

Preferably, before the acquiring the first enterprise information and the first policy information, the method further includes:

acquiring the submitted policy information from a target website to generate first acquired policy information, and acquiring submitted enterprise information corresponding to the first acquired policy information and policy subsidy information of the enterprise to generate first acquired enterprise information and first acquired subsidy information;

performing policy theme division processing on the first acquired policy information to generate first acquired policy theme data;

performing policy region identification processing on the first acquired policy information to generate first acquired policy region data;

performing classification label marking processing on the first acquired policy information to generate a plurality of first acquired policy classification labels; the plurality of first acquired policy classification labels include at least the first industry field label, the first policy class label and the first issuing department label;

performing policy feature recognition processing on the first acquired policy information to generate a plurality of first acquired policy feature data;

forming first acquired policy sample data from the first acquired policy information, the first acquired policy theme data, the first acquired policy region data, the plurality of first acquired policy classification labels and the plurality of first acquired policy feature data;

storing the first acquired policy sample data into a preset policy sample database;

carrying out enterprise feature recognition processing on the first acquired enterprise information to generate a plurality of first acquired enterprise feature data;

performing numerical conversion processing on the first acquired subsidy information to generate first acquired subsidy amount data;

forming the first collected enterprise sample data from the plurality of first collected enterprise feature data and the first collected subsidy amount data;

storing the first collected enterprise sample data into the subsidy enterprise sample database;

and establishing a data association relationship between the first acquired enterprise sample data in the subsidy enterprise sample database and the first acquired policy feature data corresponding to the policy sample database.
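One way to picture the two sample databases and the association between them is sketched below. It is only an illustration: the record classes, the `policy_id` key used to express the association, and the helper `count_samples_for_theme` are all hypothetical names that do not appear in the source.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PolicySample:
    policy_id: str              # hypothetical key used for the association
    text: str                   # acquired policy information
    theme: str                  # policy theme data
    region: str                 # policy region data
    labels: Dict[str, str]      # industry field / policy class / issuing department
    features: List[float]       # policy feature data


@dataclass
class EnterpriseSample:
    policy_id: str              # links the enterprise sample to its policy sample
    features: List[float]       # enterprise feature data
    subsidy_amount: float       # subsidy information after numerical conversion


def count_samples_for_theme(theme: str,
                            policies: List[PolicySample],
                            enterprises: List[EnterpriseSample]) -> int:
    """Count enterprise samples whose associated policy has the given theme."""
    policy_ids = {p.policy_id for p in policies if p.theme == theme}
    return sum(1 for e in enterprises if e.policy_id in policy_ids)


policies = [
    PolicySample("p1", "policy text A", "theme-1", "Beijing", {}, [1.0, 0.0]),
    PolicySample("p2", "policy text B", "theme-2", "Beijing", {}, [0.0, 1.0]),
]
enterprises = [
    EnterpriseSample("p1", [10.0, 60.0], 500000.0),
    EnterpriseSample("p1", [4.0, 55.0], 200000.0),
    EnterpriseSample("p2", [7.0, 80.0], 900000.0),
]
print(count_samples_for_theme("theme-1", policies, enterprises))
```

A count of this kind is what the method later compares against the first threshold.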

Preferably, the method further comprises:

extracting the first collected policy sample data from the policy sample database as training data for the policy classification model prior to use of the policy classification model; extracting the first acquired policy feature data from the first acquired policy sample data as model input data, extracting the first acquired policy classification labels as model output data, and training the policy classification model according to a density clustering method;

The linear regression model corresponding to the theme is $F_1=\sum_{i=0}^{n} a_i x_i$, wherein i ranges from 0 to n, n is an integer, $x_i$ is the model input data, $F_1$ is the model output data, and $a_i$ is the first model weight parameter; before the linear regression model corresponding to the theme is used, the first collected policy sample data corresponding to the theme is obtained from the policy sample database, the corresponding first collected enterprise sample data is obtained from the subsidized enterprise sample database, and first enterprise matching degree data corresponding to the first collected enterprise sample data is calculated; specified feature data are extracted from the obtained first collected policy sample data and first collected enterprise sample data respectively to form a first training feature data sequence $(x_0, x_1, x_2, \ldots, x_i, \ldots, x_n)$; the first training feature data sequence is then used as the model input data $x_i$ and the corresponding first enterprise matching degree data as the model output data $F_1$, and the linear regression model corresponding to the theme is trained by a gradient descent method to obtain the first model weight parameters $a_i$;

The linear regression model corresponding to the label is $F_2=\sum_{j=0}^{m} b_j y_j$, wherein j ranges from 0 to m, m is an integer, $y_j$ is the model input data, $F_2$ is the model output data, and $b_j$ is the second model weight parameter; before the linear regression model corresponding to the label is used, the first collected policy sample data corresponding to the label is obtained from the policy sample database, the corresponding first collected enterprise sample data is obtained from the subsidized enterprise sample database, and first enterprise matching degree data corresponding to the first collected enterprise sample data is calculated; specified feature data are extracted from the obtained first collected policy sample data and first collected enterprise sample data respectively to form a second training feature data sequence $(y_0, y_1, y_2, \ldots, y_j, \ldots, y_m)$; the second training feature data sequence is then used as the model input data $y_j$ and the corresponding first enterprise matching degree data as the model output data $F_2$, and the linear regression model corresponding to the label is trained by a gradient descent method to obtain the second model weight parameters $b_j$;

The linear regression model corresponding to the region is $F_3=\sum_{h=0}^{k} c_h z_h$, wherein h ranges from 0 to k, k is an integer, $z_h$ is the model input data, $F_3$ is the model output data, and $c_h$ is the third model weight parameter; before the linear regression model corresponding to the region is used, the first collected policy sample data corresponding to the region is obtained from the policy sample database, the corresponding first collected enterprise sample data is obtained from the subsidized enterprise sample database, and first enterprise matching degree data corresponding to the first collected enterprise sample data is calculated; specified feature data are extracted from the obtained first collected policy sample data and first collected enterprise sample data respectively to form a third training feature data sequence $(z_0, z_1, z_2, \ldots, z_h, \ldots, z_k)$; the third training feature data sequence is then used as the model input data $z_h$ and the corresponding first enterprise matching degree data as the model output data $F_3$, and the linear regression model corresponding to the region is trained by a gradient descent method to obtain the third model weight parameters $c_h$.
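Training a weighted-sum model of this kind by gradient descent can be sketched as below. This is a minimal pure-Python illustration, assuming batch gradient descent on the mean squared error; the learning rate, the epoch count, the function names and the toy data are all assumptions, since the source names only the gradient descent method.

```python
from typing import List, Sequence


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Weighted sum of the inputs: F = sum(a_i * x_i)."""
    return sum(a * xi for a, xi in zip(weights, x))


def train_linear_regression(xs: Sequence[Sequence[float]],
                            ys: Sequence[float],
                            learning_rate: float = 0.05,
                            epochs: int = 5000) -> List[float]:
    """Fit the weights by batch gradient descent on the mean squared error."""
    n_features = len(xs[0])
    n_samples = len(xs)
    weights = [0.0] * n_features
    for _ in range(epochs):
        gradients = [0.0] * n_features
        for x, y in zip(xs, ys):
            error = predict(weights, x) - y
            for i, xi in enumerate(x):
                gradients[i] += error * xi
        # Step against the averaged gradient.
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, gradients)]
    return weights


# Toy data generated from F = 2 * x_0 + 3 * x_1, with x_0 fixed at 1.
training_inputs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
training_targets = [2.0, 5.0, 8.0, 11.0]
learned = train_linear_regression(training_inputs, training_targets)
print(learned)
```

In the method, each input row would be a training feature data sequence and each target the corresponding first enterprise matching degree data; fixing the first input at 1 is one common way to give the model a constant term, which the source does not spell out.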

The calculating the first enterprise matching degree data corresponding to the first collected enterprise sample data specifically includes:

calculating the first enterprise matching degree data from the preset subsidy amount minimum and maximum, the preset enterprise matching degree minimum and maximum, and the first collected subsidy amount data of the first collected enterprise sample data, wherein first enterprise matching degree data = enterprise matching degree minimum + (enterprise matching degree maximum - enterprise matching degree minimum) × normalized data.
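The training target above can be sketched as follows. The source does not state how the normalized data is obtained, so this sketch assumes ordinary min-max normalization of the subsidy amount and clips the result to the range 0 to 1; both choices, like the function and parameter names, are assumptions.

```python
def enterprise_matching_degree(subsidy_amount: float,
                               subsidy_min: float,
                               subsidy_max: float,
                               match_min: float,
                               match_max: float) -> float:
    """Map a subsidy amount onto the preset matching degree range."""
    if subsidy_max <= subsidy_min:
        raise ValueError("subsidy_max must be greater than subsidy_min")
    # ASSUMPTION: min-max normalization; not given in the source.
    normalized = (subsidy_amount - subsidy_min) / (subsidy_max - subsidy_min)
    # ASSUMPTION: amounts outside the preset range are clipped.
    normalized = max(0.0, min(1.0, normalized))
    return match_min + (match_max - match_min) * normalized


print(enterprise_matching_degree(500000.0, 0.0, 1000000.0, 0.0, 100.0))
```

Under these assumptions, an enterprise that received a larger subsidy gets a higher matching degree label for that policy.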

Preferably, the performing the identifying process of the enterprise condition according to the first policy information and the first enterprise information specifically includes:

in the first policy information, identifying and processing related content defining enterprise conditions to generate a plurality of first information data sets; the first information data group comprises first information name data and first information content data;

in the first enterprise information, identifying and processing enterprise information content corresponding to each piece of first information name data to generate a corresponding second information data set; the second information data group comprises second information name data and second information content data;

and if each piece of second information content data can be matched with the corresponding piece of first information content data, the enterprise condition identification processing is successful.

Preferably, the performing matching degree calculation processing of enterprise information and policy information by using the linear regression model corresponding to the theme according to the first feature data set, the second feature data set and the first theme data, and generating the first matching degree data, specifically includes:

extracting specified data from the first feature data set and the second feature data set respectively to form a first data sequence; selecting the linear regression model corresponding to the first theme data as a first calculation model;

taking the first data sequence as model input data of the first calculation model, and substituting the pre-trained first model weight parameters into the first calculation model for calculation to generate first temporary matching degree data;

calculating the first matching degree data from the first temporary matching degree data and the enterprise matching degree maximum, wherein first matching degree data = max(0, min(first temporary matching degree data, enterprise matching degree maximum)); min() is the function taking the minimum value, and max() is the function taking the maximum value.
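The scoring and clamping steps above amount to a weighted sum followed by a clip to the range from 0 to the enterprise matching degree maximum. A minimal sketch, with hypothetical function names and made-up weights:

```python
from typing import Sequence


def clamp_matching_degree(temporary: float, match_max: float) -> float:
    """max(0, min(temporary, match_max)), as stated in the text."""
    return max(0.0, min(temporary, match_max))


def first_matching_degree(weights: Sequence[float],
                          data_sequence: Sequence[float],
                          match_max: float) -> float:
    """Evaluate the trained linear model, then clamp the result."""
    temporary = sum(a * x for a, x in zip(weights, data_sequence))
    return clamp_matching_degree(temporary, match_max)


print(first_matching_degree([0.5, 0.5], [10.0, 20.0], 100.0))
```

The clamp matters because a linear model can extrapolate below 0 or above the maximum for inputs unlike those seen in training.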

Preferably, the calculating the matching degree of the enterprise information and the policy information according to the first feature data set, the second feature data set and the tag that is not empty, using a linear regression model corresponding to the tag, and generating the second matching degree data specifically includes:

if the first industry field label is not empty, extracting specified data from the first feature data set and the second feature data set to form a second-first data sequence; selecting the linear regression model corresponding to the first industry field label as a second-first calculation model; taking the second-first data sequence as model input data of the second-first calculation model, and substituting the pre-trained second-first model weight parameters into the second-first calculation model for calculation to generate second-first matching degree data;

if the first policy class label is not empty, extracting specified data from the first feature data set and the second feature data set to form a second-second data sequence; selecting the linear regression model corresponding to the first policy class label as a second-second calculation model; taking the second-second data sequence as model input data of the second-second calculation model, and substituting the pre-trained second-second model weight parameters into the second-second calculation model for calculation to generate second-second matching degree data;

if the first issuing department label is not empty, extracting specified data from the first feature data set and the second feature data set to form a second-third data sequence; selecting the linear regression model corresponding to the first issuing department label as a second-third calculation model; taking the second-third data sequence as model input data of the second-third calculation model, and substituting the pre-trained second-third model weight parameters into the second-third calculation model for calculation to generate second-third matching degree data;

then calculating second temporary matching degree data from the second-first matching degree data, the second-second matching degree data and the second-third matching degree data, wherein second temporary matching degree data = $W_1$ × second-first matching degree data + $W_2$ × second-second matching degree data + $W_3$ × second-third matching degree data; $W_1$, $W_2$ and $W_3$ are weighting parameters, and $W_1 + W_2 + W_3 = 1$;

and calculating the second matching degree data from the second temporary matching degree data and the enterprise matching degree maximum, wherein second matching degree data = max(0, min(second temporary matching degree data, enterprise matching degree maximum)).
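The weighted combination of the per-label scores can be sketched as below. The source does not say what happens to the weight of a label that is empty, so this sketch assumes the remaining weights are renormalized to sum to 1; that rule, the use of `None` for an empty label's score, and all names are assumptions.

```python
from typing import Optional, Sequence


def second_matching_degree(scores: Sequence[Optional[float]],
                           weights: Sequence[float],
                           match_max: float) -> float:
    """Combine the industry, policy class and department scores.

    scores[i] is None when the corresponding label is empty.
    """
    present = [(s, w) for s, w in zip(scores, weights) if s is not None]
    if not present:
        raise ValueError("at least one label score is required")
    total_weight = sum(w for _, w in present)
    if total_weight <= 0:
        raise ValueError("weights of the present labels must be positive")
    # ASSUMPTION: renormalize so the weights in use still sum to 1.
    temporary = sum(s * w for s, w in present) / total_weight
    return max(0.0, min(temporary, match_max))


print(second_matching_degree([80.0, 60.0, 40.0], [0.5, 0.25, 0.25], 100.0))
```

When all three labels are present and the weights already sum to 1, this reduces to the formula given in the text.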

Preferably, the performing matching degree calculation processing of enterprise information and policy information by using the linear regression model corresponding to the region according to the first feature data set, the second feature data set and the first region data, and generating the third matching degree data, specifically includes:

extracting specified data from the first feature data set and the second feature data set respectively to form a third data sequence; selecting the linear regression model corresponding to the first region data as a third calculation model;

taking the third data sequence as model input data of the third calculation model, and substituting the pre-trained third model weight parameters into the third calculation model for calculation to generate third temporary matching degree data;

and calculating the third matching degree data from the third temporary matching degree data and the enterprise matching degree maximum, wherein third matching degree data = max(0, min(third temporary matching degree data, enterprise matching degree maximum)).

A second aspect of an embodiment of the present invention provides a processing apparatus for enterprise information and policy information, including:

the acquisition module is used for acquiring first enterprise information and first policy information;

the condition recognition module is used for carrying out enterprise condition recognition processing according to the first policy information and the first enterprise information;

the data processing module is used for carrying out policy theme division and policy region identification processing according to the first policy information when the enterprise condition identification processing is successful, so as to generate first theme data and first region data; and carrying out enterprise feature recognition processing on the first enterprise information to generate a first feature data set; performing policy feature recognition processing on the first policy information to generate a second feature data set;

the policy classification module is used for performing policy classification processing on the first policy information by using a policy classification model to generate a first classification label set; the first classification label set at least comprises a first industry field label, a first policy class label and a first issuing department label;

The matching degree calculation module is used for counting the number of first acquired enterprise sample data corresponding to the first theme data in a preset subsidy enterprise sample database to generate a first number; the subsidized enterprise sample database includes a plurality of the first collected enterprise sample data;

the matching degree calculating module is further configured to perform matching degree calculation processing of enterprise information and policy information by using the linear regression model corresponding to the theme according to the first feature data set, the second feature data set and the first theme data when the first number exceeds a preset first threshold, so as to generate first matching degree data;

the matching degree calculating module is further configured to perform matching degree calculating processing on enterprise information and policy information by using a linear regression model corresponding to a tag according to the first feature data set, the second feature data set and the tag that is not empty when the first number does not exceed the first threshold and the first industry field tag, the first policy class tag and the first issuing department tag are not all empty, and generate second matching degree data;

and the matching degree calculating module is further configured to perform matching degree calculating processing on enterprise information and policy information by using a linear regression model corresponding to a region according to the first feature data set, the second feature data set and the first region data when the first number does not exceed the first threshold and the first industry field label, the first policy class label and the first distribution department label are all empty, so as to generate third matching degree data.

A third aspect of an embodiment of the present invention provides an electronic device, including: memory, processor, and transceiver;

the processor is configured to couple to the memory, and read and execute the instructions in the memory, so as to implement the method steps described in the first aspect;

the transceiver is coupled to the processor and is controlled by the processor to transmit and receive messages.

A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the method of the first aspect.

The embodiments of the present invention provide a method, a device, electronic equipment and a computer-readable storage medium for processing enterprise information and policy information. Policy information is screened using a trained policy classification model, and the matching degree between an enterprise and a policy is estimated using a trained linear regression model, which reduces manual involvement, improves matching accuracy and operating efficiency, and allows the accuracy of the trained models to keep improving as the volume of training data grows.

Drawings

FIG. 1 is a schematic diagram of a method for processing enterprise information and policy information according to a first embodiment of the present invention;

FIG. 2 is a block diagram of a device for processing enterprise information and policy information according to a second embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Fig. 1 is a schematic diagram of a method for processing enterprise information and policy information according to a first embodiment of the present invention, where, as shown in fig. 1, the method mainly includes the following steps:

Step 1, acquiring first enterprise information and first policy information.

Here, the first enterprise information is enterprise information of a public declaration item acquired from a target website, and may also be enterprise information for the declaration item acquired from an enterprise side, where the first enterprise information includes, but is not limited to, contents such as business operation data, production, research and development, manufacturing, management data, and the like of an enterprise; the first policy information is public project declaration policy information acquired from the target website, and may also be project declaration policy information acquired from a policy issuer, where the first policy information includes, but is not limited to, conditions, subjects, fields, categories, issuing departments, time, regions, subsidy amounts, and the like related to the declaration project.

Step 2, carrying out enterprise condition identification processing according to the first policy information and the first enterprise information;

Here, a policy generally defines the conditions that a declaring enterprise must meet. In the embodiment of the present invention, operations such as missing value processing, numerical conversion, keyword extraction and dictionary query are performed on the text information, so that keywords are extracted from the first policy information and the first enterprise information as information name data, and the content corresponding to each keyword is extracted as information content data. Part-of-speech judgment and text recognition are then performed on the connective between each policy keyword and its corresponding content to obtain the judgment logic for condition matching, and this logic is used to compare the information content data extracted from the first policy information with that extracted from the first enterprise information. If the comparison results of all information content data meet the requirements, the enterprise condition identification processing succeeds; otherwise it fails. If it succeeds, the subsequent steps continue to be executed; if it fails, the embodiment of the present invention may terminate the subsequent steps and return, as the matching result, prompt information indicating that the enterprise information does not meet the policy requirements.

This step specifically includes:

step 21, in the first policy information, performing identification processing on the related content defining the enterprise condition to generate a plurality of first information data sets; the first information data group includes first information name data and first information content data;

here, the content defining enterprise conditions includes, but is not limited to, operating years, registered capital, financial status, enterprise scale, operating status, business field, enterprise qualifications, enterprise credit, innovation achievements, and the like;

for example, if the first policy information explicitly requires that "the enterprise's operating years exceed three years" and "the enterprise's registered capital exceeds 50 million", two first information data groups are obtained after the identification processing of the content defining enterprise conditions: the 1st first information data group (the 1st first information name data is operating years, the 1st first information content data is 3 years) and the 2nd first information data group (the 2nd first information name data is registered capital, the 2nd first information content data is 50 million);

step 22, in the first enterprise information, identifying and processing the enterprise information content corresponding to each piece of first information name data to generate a corresponding second information data set; the second information data group includes second information name data and second information content data;

for example, when the current year is 2020 and the first enterprise information states that "the enterprise was registered in 2010" and "the enterprise's registered capital is 60 million", two corresponding second information data groups are obtained after the identification processing: the 1st second information data group (the 1st second information name data is the same as the 1st first information name data, namely operating years, and the 1st second information content data is 2020 - 2010 = 10 years) and the 2nd second information data group (the 2nd second information name data is the same as the 2nd first information name data, namely registered capital, and the 2nd second information content data is 60 million);

step 23, if each second information content data can be matched with the corresponding first information content data, the enterprise condition identification process is successful.

For example, in the requirements "the enterprise's operating years exceed three years" and "the enterprise's registered capital exceeds 50 million" of the first policy information, the connective "exceed" is recognized as a greater-than relationship, so the judgment logic for condition matching requires the enterprise data to be greater than the corresponding policy data. Since the 1st second information content data, 10 years, is greater than the corresponding 1st first information content data, 3 years, and the 2nd second information content data, 60 million, is greater than the corresponding 2nd first information content data, 50 million, the recognition result is that the enterprise meets the policy conditions, that is, the enterprise condition identification processing is successful.
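The comparison in this worked example can be sketched as follows, assuming the text extraction has already produced name/content pairs. The connective-to-operator table, the dictionary keys and the function name are hypothetical; only the "exceeds" to greater-than mapping comes from the example.

```python
import operator
from typing import Callable, Dict, Tuple

# Hypothetical mapping from recognized connectives to comparison operators.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    "exceeds": operator.gt,
    "at least": operator.ge,
    "below": operator.lt,
}


def conditions_met(policy_conditions: Dict[str, Tuple[str, float]],
                   enterprise_info: Dict[str, float]) -> bool:
    """Succeed only if every policy condition is satisfied by the enterprise."""
    for name, (connective, required) in policy_conditions.items():
        actual = enterprise_info.get(name)
        if actual is None:
            return False  # the enterprise provides no value for this condition
        if not COMPARATORS[connective](actual, required):
            return False
    return True


policy_conditions = {
    "operating_years": ("exceeds", 3),
    "registered_capital": ("exceeds", 50_000_000),  # 5000 x 10,000
}
enterprise_info = {
    "operating_years": 2020 - 2010,                 # registered in 2010
    "registered_capital": 60_000_000,
}
print(conditions_met(policy_conditions, enterprise_info))
```

Treating a missing enterprise value as a failed condition is a design choice of this sketch; the source does not address that case.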

And step 3, if the enterprise condition identification processing is successful, performing policy theme division and policy region identification processing according to the first policy information to generate first theme data and first region data.

Here, when performing the policy topic division processing, in the embodiment of the present invention, operations such as missing value processing, numerical conversion, keyword extraction, dictionary querying, etc. are performed on text information, so that content related to a policy topic is extracted from first policy information, and first topic data is set according to preset topic division logic, where the first topic data may be topic text information or identification data after number classification;

for example, if the title of the first policy information is "annual high-tech enterprise subsidy application item" and the topic division logic is text division logic, the first theme data is "annual high-tech enterprise subsidy application item"; if the topic division logic is numeric classification identification logic, a preset correspondence table between topic text information and topic classification identifiers is queried, the topic classification identifier corresponding to "annual high-tech enterprise subsidy application item" is found to be 1, and the first theme data is 1;

Here, when the policy region identification processing is performed, in the embodiment of the present invention, operations such as missing value processing, numerical conversion, keyword extraction, dictionary querying, etc. are performed on text information, content related to the policy region is extracted from first policy information, and first region data is set according to a preset region identification logic, where the first region data may be region text information or identification data after being categorized by numerals;

for example, if the content of the first policy information includes "annual high-tech enterprise subsidy application item for the Beijing area" and the region identification logic is text identification logic, the first region data is "Beijing"; if the region identification logic is numeric classification identification logic, a preset correspondence table between region text information and region classification identifiers is queried, the region classification identifier corresponding to Beijing is found to be 2, and the first region data is 2.
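The two identification logics described above (text output versus numeric classification identifier) can be sketched as a dictionary lookup. The table contents and function names below are hypothetical; only the pair "Beijing" and 2 follows the example in the text. Topic data can be produced with the same pattern and a topic table.

```python
# Hypothetical correspondence table between region text and region
# classification identifier. "Beijing" -> 2 follows the example in the text;
# the second entry is made up for illustration.
REGION_IDS = {"Beijing": 2, "Tianjin": 3}


def extract_region(policy_text, table=REGION_IDS):
    """Return the first known region name found in the policy text, or None."""
    for name in table:
        if name in policy_text:
            return name
    return None


def region_data(policy_text, logic, table=REGION_IDS):
    """Produce first region data as text or as a numeric identifier."""
    name = extract_region(policy_text, table)
    if name is None:
        return None
    if logic == "text":
        return name
    if logic == "numeric":
        return table[name]
    raise ValueError(f"unknown identification logic: {logic}")


text = "subsidy application item for the Beijing area"
print(region_data(text, "text"))     # Beijing
print(region_data(text, "numeric"))  # 2
```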

Step 4, carrying out enterprise feature recognition processing on the first enterprise information to generate a first feature data set; and carrying out policy feature recognition processing on the first policy information to generate a second feature data set.

When the enterprise feature recognition processing is performed, a series of feature data are extracted from the first enterprise information to form a first feature data set mainly according to a preset enterprise feature data recognition rule by performing operations such as missing value processing, numerical conversion, keyword and content extraction corresponding to the keyword on the text information;

For example, the first feature data set may include: enterprise basic feature data (such as registered address, registered capital, date of establishment, etc.), enterprise manpower feature data (such as the number of employees, etc.), enterprise financial feature data (such as whether there is a loan, last-year business income, last-year net profit, last-year total tax amount, last-year research and development expense, last-year net assets, last-year total liabilities, etc.), enterprise intellectual property feature data (such as the number of intellectual property items, etc.), enterprise operation feature data (such as operating status, administrative permits, last-year tax rating, qualification certificates, etc.), enterprise risk feature data (such as operating anomalies, administrative penalties, serious violations, environmental protection penalties, legal litigation, trustworthiness records, etc.), innovation carrier feature data (such as laboratories, research centers, etc.), technological reward feature data (such as scientific rewards, patent rewards, technological progress rewards, key projects, etc.);

here, when the policy feature recognition processing is performed, operations such as missing value processing, numerical conversion, and extraction of keywords and their corresponding content are performed on the text information, mainly according to a preset policy feature data recognition rule, and a series of feature data are extracted from the first policy information to form the second feature data set;

For example, the second feature data set may include: policy basic feature data (such as acceptance department, release date, start date, expiration date, minister's type, industry field, highest subsidy amount, etc.), feature data of the policy's basic requirements on enterprises, feature data of the policy's manpower requirements on enterprises, feature data of the policy's financial requirements on enterprises, feature data of the policy's intellectual property requirements on enterprises, feature data of the policy's operation requirements on enterprises, feature data of the policy's risk requirements on enterprises, feature data of the policy's innovation carrier requirements on enterprises, feature data of the policy's technological reward requirements on enterprises, etc.

Step 5, using a policy classification model to perform policy classification processing on the first policy information to generate a first classification label set;

the first classification label set at least comprises a first industry field label, a first policy class label and a first issuing department label.

Here, the policy classification model is a classification model trained on large-scale data; in the embodiment of the invention, operations such as missing value processing, numerical conversion, and extraction of keywords and their corresponding content are first performed on the text information, a series of feature data are extracted from the first policy information to form a feature data sequence, and the feature data sequence is input into the policy classification model for calculation, and the tag sequence output by the model is used as the first classification label set;

The tag sequence contains a plurality of preset tag data, and each tag data corresponds to an actual tag type, such as an industry field label, a policy class label, or an issuing department label; each tag type can also be refined by adjusting model parameters: the industry field label can be refined into a plurality of specific industry fields, such as the industry fields published under national high-tech and similar standards; the policy class label can be refined into a plurality of specific policy classes, such as intellectual property, innovation carrier, qualification, talent reward, tax deduction, and anti-epidemic subsidy classes; and the issuing department label can be refined into a plurality of specific issuing departments, such as a science and technology innovation bureau or an intellectual property bureau;

after the policy classification model finishes its computation, each preset tag data in the tag sequence is assigned a value according to the calculation mode preset for the policy classification model: if the calculation mode is valid/invalid tag marking logic, the policy classification model marks the tag data corresponding to each identified tag type as valid and marks all other tag data as invalid; if the calculation mode is classification scoring logic, the policy classification model scores the identification result of the tag type corresponding to each tag data in the tag sequence and writes the score into the tag data; if the calculation mode is classification probability logic, the policy classification model calculates a probability for the identification result of the tag type corresponding to each tag data in the tag sequence and writes the probability value into the tag data;

When the obtained tag sequence is used as a first classification tag set, if the calculation mode set by the policy classification model is valid/invalid tag marking logic, the classification tag marked as invalid in the calculation mode is set as null; if the set calculation mode is classification scoring logic, setting the classification label with the score lower than the effective label scoring threshold value as null; if the set calculation mode is a classification probability logic, the classification label with probability lower than the valid label probability threshold can be set to be empty.
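The conversion from a tag sequence to the first classification label set under the three calculation modes can be sketched as follows. The data layout (a dictionary of `(label, value)` pairs), the mode names and the default threshold are assumptions of this sketch; an empty label is represented by `None`.

```python
def to_label_set(tag_sequence, mode, threshold=0.5):
    """Turn a model tag sequence into a label set; rejected labels become None.

    tag_sequence maps a tag type to (label, value), where value is a
    valid/invalid flag, a score, or a probability depending on the mode.
    """
    label_set = {}
    for tag_type, (label, value) in tag_sequence.items():
        if mode == "flag":
            keep = bool(value)           # valid/invalid tag marking logic
        elif mode in ("score", "probability"):
            keep = value >= threshold    # below the threshold means empty
        else:
            raise ValueError(f"unknown calculation mode: {mode}")
        label_set[tag_type] = label if keep else None
    return label_set


# Made-up model output under classification probability logic.
tags = {
    "industry_field": ("software", 0.91),
    "policy_class": ("tax deduction", 0.35),
    "issuing_department": ("intellectual property bureau", 0.72),
}
labels = to_label_set(tags, "probability", threshold=0.5)
print(labels["policy_class"])  # None
```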

After the first classification label set is obtained, the embodiment of the invention selects the corresponding linear regression model according to the enterprise characteristic data, namely the first characteristic data set, and the policy characteristic data, namely the second characteristic data set through the subsequent steps, and performs matching degree calculation processing on enterprise information and policy information according to the actual condition of the policy, so as to obtain corresponding matching degree data as a matching result, and returns a processing result;

for the actual situation of the policy, the embodiment of the present invention distinguishes three cases: case 1, historical subsidy enterprise information corresponding to subsidy policies with the same theme as the current policy has been collected locally in advance, and the creation and training of the linear regression model corresponding to the theme have been completed; case 2, historical subsidy enterprise information corresponding to similar policies with the same industry field label and/or the same policy class label and/or the same issuing department label as the current policy has been collected locally in advance, and the creation and training of the linear regression model corresponding to the label have been completed; case 3, historical subsidy enterprise information corresponding to other subsidy policies in the same region as the current policy has been collected locally in advance, and the creation and training of the linear regression model corresponding to the region have been completed;

The three cases are different in the calculation results of the linear regression model, in principle, the reliability of the matching degree calculated by using the linear regression model corresponding to the subject is the highest, and the reliability of the matching degree calculated by using the linear regression model corresponding to the region is the lowest.

If the historical subsidy enterprise information corresponding to the policy theme is collected in advance locally, performing matching degree calculation by using a linear regression model corresponding to the theme;

if historical subsidy enterprise information corresponding to the policy theme has not been collected locally in advance, but historical subsidy enterprise information corresponding to similar policies with the same industry field label, the same policy class label or the same issuing department label has been collected in advance, the matching degree calculation is performed by using the linear regression model corresponding to the label;

if neither historical subsidy enterprise information corresponding to the policy theme nor historical subsidy enterprise information corresponding to similar policies with the same industry field label, the same policy class label or the same issuing department label has been collected locally in advance, but historical subsidy enterprise information corresponding to other subsidy policies in the same region as the policy has been collected in advance, the matching degree calculation is performed by using the linear regression model corresponding to the region.

Step 6, counting the number of first collected enterprise sample data corresponding to the first theme data in a preset subsidy enterprise sample database to generate a first number;

wherein the subsidized enterprise sample database includes a plurality of first collected enterprise sample data.

The subsidy enterprise sample database is a preset database in which a plurality of first collected enterprise sample data are stored; each first collected enterprise sample data is sample data obtained by feature extraction and numerical conversion of enterprise data collected from a target website or a target information source; the database gathers information on enterprises that received the relevant policy subsidies in past years and provides training data for the linear regression models used to calculate the matching degree between enterprise information and policy information.

Here, the first number is used by the subsequent steps to determine whether historical subsidy enterprise information corresponding to the policy theme has been collected locally in advance; if the first number exceeds the preset first threshold, such information has been collected locally in advance.

Step 7, if the first number exceeds a preset first threshold, performing matching degree calculation processing of enterprise information and policy information by using a linear regression model corresponding to the theme according to the first characteristic data set, the second characteristic data set and the first theme data, and generating first matching degree data;

Here, the first threshold is a preset value; typically, the first threshold is set to a relatively large number that is neither 0 nor null, so that exceeding it indicates that historical subsidy enterprise information corresponding to the policy theme has been collected locally in advance in a quantity sufficient to complete the training of the linear regression model corresponding to the theme; the first number exceeding the first threshold therefore indicates that this linear regression model is available, and, according to the foregoing, the embodiment of the invention preferentially uses the linear regression model corresponding to the theme for the matching degree calculation; in special cases the first threshold may be 0 or null, in which case exceeding it only indicates that some historical subsidy enterprise information corresponding to the policy theme has been collected locally in advance;

the method specifically comprises the following steps: step 71, extracting specified data from the first and second feature data sets respectively to form a first data sequence; selecting a linear regression model corresponding to the first subject data as a first calculation model;

here, the linear regression model corresponding to the subject for calculation is specifically a linear regression model corresponding to the first subject data, that is, a first calculation model; the first data sequence is a data sequence composed of characteristic data extracted from the first and second characteristic data sets according to the input data requirement of the first calculation model;

Step 72, taking the first data sequence as model input data of a first calculation model, and taking the first model weight parameters obtained by training in advance into the first calculation model to calculate, so as to generate first temporary matching degree data;

here, the first calculation model is specifically $F_1 = \sum_{i=0}^{n} a_i x_i$, where $i$ ranges from 0 to $n$, $n$ is an integer, $x_i$ is the model input data, $F_1$ is the model output data, and $a_i$ are the first model weight parameters obtained by training in advance; the feature data of the first data sequence are decomposed into the $x_i$ and substituted into the first calculation model, each $x_i$ is multiplied by the corresponding first model weight parameter $a_i$, and all the products are accumulated; the accumulated sum $F_1$ is the calculation result of the first calculation model, that is, the first temporary matching degree data;

step 73, calculating and generating first matching degree data according to the first temporary matching degree data and the maximum value of the enterprise matching degree, wherein the first matching degree data=max (0, min (the first temporary matching degree data, the maximum value of the enterprise matching degree));

the maximum value of the enterprise matching degree is a preset value, min () is a function taking the minimum value, and max () is a function taking the maximum value.

Here, this step performs abnormal data checking on the calculated first temporary matching degree data, treating any value outside the range from 0 to the maximum value of the enterprise matching degree as abnormal data: if the first temporary matching degree data is smaller than 0, the first matching degree data is forced to 0; if the first temporary matching degree data is larger than the maximum value of the enterprise matching degree, the first matching degree data is forced to the maximum value of the enterprise matching degree; and if the first temporary matching degree data lies within the range from 0 to the maximum value of the enterprise matching degree, the first matching degree data is the first temporary matching degree data.
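Steps 72 and 73 amount to a weighted sum followed by clamping. The sketch below illustrates this; the function names, the weights, the inputs and the maximum value of 100 are all made up for illustration and are not taken from the source.

```python
def linear_score(weights, inputs):
    """Weighted sum of inputs: the sum over i of a_i * x_i."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(a * x for a, x in zip(weights, inputs))


def clamp_matching_degree(temporary, maximum):
    """Force the temporary matching degree into the range [0, maximum]."""
    return max(0, min(temporary, maximum))


# Made-up weight parameters a_i and input data x_i.
weights = [10.0, 0.5, 2.0]
inputs = [1.0, 10.0, 6.0]

temporary = linear_score(weights, inputs)     # 10.0 + 5.0 + 12.0
print(clamp_matching_degree(temporary, 100))  # 27.0
print(clamp_matching_degree(-3.2, 100))       # 0
print(clamp_matching_degree(130.0, 100))      # 100
```

The same two functions apply unchanged to the region model of steps 92 and 93, with the third model weight parameters and the third data sequence as arguments.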

Step 8, if the first number does not exceed the first threshold value and the first industry field label, the first policy class label and the first issuing department label are not all empty, performing matching degree calculation processing of enterprise information and policy information by using a linear regression model corresponding to the label according to the first feature data set, the second feature data set and the label which is not empty, and generating second matching degree data;

here, the first number not exceeding the first threshold indicates that historical subsidy enterprise information corresponding to the policy theme has not been collected locally in advance; the first industry field label, the first policy class label and the first issuing department label not all being empty indicates that historical subsidy enterprise information corresponding to similar policies with the same industry field label and/or the same policy class label and/or the same issuing department label has been collected in advance; according to the foregoing, the embodiment of the invention then preferentially uses the linear regression models corresponding to the labels for the matching degree calculation; in this step, the linear regression models corresponding to the several labels are each used for calculation, and the resulting matching degree results are weighted to obtain the final matching degree data;

The method specifically comprises the following steps: step 81, if the first industry field label is not empty, extracting specified data from the first and second feature data sets to form a second-first data sequence; selecting the linear regression model corresponding to the first industry field label as a second-first calculation model; taking the second-first data sequence as model input data of the second-first calculation model, and substituting the second-first model weight parameters obtained by training in advance into the second-first calculation model for calculation, so as to generate second-first matching degree data;

here, the linear regression model corresponding to the label used for calculation is specifically the linear regression model corresponding to the first industry field label, that is, the second-first calculation model; the second-first data sequence is a data sequence composed of feature data extracted from the first and second feature data sets according to the input data requirement of the second-first calculation model;

here, the second-first calculation model is specifically $F_{21} = \sum_{j_1=0}^{m_1} b_{j_1} y_{j_1}$, where $j_1$ ranges from 0 to $m_1$, $m_1$ is an integer, $y_{j_1}$ is the model input data, $F_{21}$ is the model output data, and $b_{j_1}$ are the second-first model weight parameters obtained by training in advance; the feature data of the second-first data sequence are decomposed into the $y_{j_1}$ and substituted into the second-first calculation model, each $y_{j_1}$ is multiplied by the corresponding second-first model weight parameter $b_{j_1}$, and all the products are accumulated; the accumulated sum $F_{21}$ is the calculation result of the second-first calculation model, that is, the second-first matching degree data;

step 82, if the first policy class label is not empty, extracting specified data from the first and second feature data sets to form a second-second data sequence; selecting the linear regression model corresponding to the first policy class label as a second-second calculation model; taking the second-second data sequence as model input data of the second-second calculation model, and substituting the second-second model weight parameters obtained by training in advance into the second-second calculation model for calculation, so as to generate second-second matching degree data;

here, the linear regression model corresponding to the label used for calculation is specifically the linear regression model corresponding to the first policy class label, that is, the second-second calculation model; the second-second data sequence is a data sequence composed of feature data extracted from the first and second feature data sets according to the input data requirement of the second-second calculation model;

here, the second-second calculation model is specifically $F_{22} = \sum_{j_2=0}^{m_2} b_{j_2} y_{j_2}$, where $j_2$ ranges from 0 to $m_2$, $m_2$ is an integer, $y_{j_2}$ is the model input data, $F_{22}$ is the model output data, and $b_{j_2}$ are the second-second model weight parameters obtained by training in advance; the feature data of the second-second data sequence are decomposed into the $y_{j_2}$ and substituted into the second-second calculation model, each $y_{j_2}$ is multiplied by the corresponding second-second model weight parameter $b_{j_2}$, and all the products are accumulated; the accumulated sum $F_{22}$ is the calculation result of the second-second calculation model, that is, the second-second matching degree data;

step 83, if the first issuing department label is not empty, extracting specified data from the first and second feature data sets to form a second-third data sequence; selecting the linear regression model corresponding to the first issuing department label as a second-third calculation model; taking the second-third data sequence as model input data of the second-third calculation model, and substituting the second-third model weight parameters obtained by training in advance into the second-third calculation model for calculation, so as to generate second-third matching degree data;

here, the linear regression model corresponding to the label used for calculation is specifically the linear regression model corresponding to the first issuing department label, that is, the second-third calculation model; the second-third data sequence is a data sequence composed of feature data extracted from the first and second feature data sets according to the input data requirement of the second-third calculation model;

here, the second-third calculation model is specifically $F_{23} = \sum_{j_3=0}^{m_3} b_{j_3} y_{j_3}$, where $j_3$ ranges from 0 to $m_3$, $m_3$ is an integer, $y_{j_3}$ is the model input data, $F_{23}$ is the model output data, and $b_{j_3}$ are the second-third model weight parameters obtained by training in advance; the feature data of the second-third data sequence are decomposed into the $y_{j_3}$ and substituted into the second-third calculation model, each $y_{j_3}$ is multiplied by the corresponding second-third model weight parameter $b_{j_3}$, and all the products are accumulated; the accumulated sum $F_{23}$ is the calculation result of the second-third calculation model, that is, the second-third matching degree data;

step 84, calculating and generating second temporary matching degree data according to the second-first matching degree data, the second-second matching degree data and the second-third matching degree data, wherein second temporary matching degree data = $W_1$ × second-first matching degree data + $W_2$ × second-second matching degree data + $W_3$ × second-third matching degree data; $W_1$, $W_2$ and $W_3$ are weighting parameters, and $W_1 + W_2 + W_3 = 1$;

The matching obtained by the plurality of labels is weighted to obtain second temporary matching degree data, so that the calculation result is converged through weighted calculation, and the reliability of the final estimation result is improved;

here, the 3 weighting parameters $W_1$, $W_2$ and $W_3$ in the embodiment of the present invention may be empirically set values or dynamically calculated values; if the weighting parameters are dynamically calculated, the numbers of samples used to train the linear regression models corresponding to the three labels are counted as $r_1$, $r_2$ and $r_3$ respectively, where $r_1$, $r_2$ and $r_3$ are integers, and $W_1$, $W_2$ and $W_3$ are then calculated from $r_1$, $r_2$ and $r_3$.

Step 85, calculating and generating second matching degree data according to the second temporary matching degree data and the enterprise matching degree maximum value, wherein the second matching degree data=max (0, min (second temporary matching degree data, enterprise matching degree maximum value)).

Here, abnormal data checking is performed on the calculated second temporary matching degree data, and any value outside the range from 0 to the maximum value of the enterprise matching degree is treated as abnormal data: if the second temporary matching degree data is smaller than 0, the second matching degree data is forced to 0; if the second temporary matching degree data is larger than the maximum value of the enterprise matching degree, the second matching degree data is forced to the maximum value of the enterprise matching degree; and if the second temporary matching degree data lies within the range from 0 to the maximum value of the enterprise matching degree, the second matching degree data is the second temporary matching degree data.
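Steps 84 and 85 can be sketched as a weighted combination followed by clamping. The text does not give the formula for dynamically calculated weights, so the sketch below assumes, purely for illustration, that each weight is proportional to the number of training samples of its model; this satisfies the stated constraint that the three weights sum to 1 but is not confirmed by the source. All function names and numbers are hypothetical.

```python
def sample_count_weights(r1, r2, r3):
    """ASSUMPTION: weights proportional to training sample counts.

    The source only states that the weights are derived from r1, r2, r3 and
    that they sum to 1; proportional normalization is one way to achieve that.
    """
    total = r1 + r2 + r3
    if total <= 0:
        raise ValueError("at least one model must have training samples")
    return r1 / total, r2 / total, r3 / total


def combined_matching_degree(scores, weights, maximum):
    """Weighted sum of the three per-label scores, clamped to [0, maximum]."""
    temporary = sum(w * s for w, s in zip(weights, scores))
    return max(0, min(temporary, maximum))


# Made-up sample counts and per-label matching degree results.
w = sample_count_weights(600, 300, 100)
result = combined_matching_degree((80.0, 60.0, 40.0), w, maximum=100)
print(w, round(result, 6))
```

Empirically set weights can be passed to `combined_matching_degree` directly in place of the computed tuple.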

Step 9, if the first number does not exceed the first threshold value and the first industry field label, the first policy class label and the first issuing department label are all empty, performing matching degree calculation processing of enterprise information and policy information by using a linear regression model corresponding to the region according to the first feature data set, the second feature data set and the first region data, and generating third matching degree data;

Here, the first number not exceeding the first threshold indicates that historical subsidy enterprise information corresponding to the policy theme has not been collected locally in advance; the first industry field label, the first policy class label and the first issuing department label all being empty indicates that historical subsidy enterprise information corresponding to similar policies with the same industry field label, the same policy class label or the same issuing department label has not been collected either; in this case, the linear regression model corresponding to the region is used for the matching degree calculation;

the method specifically comprises the following steps: step 91, extracting specified data from the first and second feature data sets respectively to form a third data sequence; selecting a linear regression model corresponding to the first region data as a third calculation model;

here, the linear regression model corresponding to the region for calculation is specifically a linear regression model corresponding to the first region data, that is, a third calculation model; the third data sequence is a data sequence composed of the feature data extracted from the first and second feature data sets according to the input data requirement of the third calculation model;

Step 92, taking the third data sequence as model input data of a third calculation model, and taking the third model weight parameters obtained by training in advance into the third calculation model to calculate, so as to generate third temporary matching degree data;

here, the third calculation model is specifically $F_3 = \sum_{h=0}^{k} c_h z_h$, where $h$ ranges from 0 to $k$, $k$ is an integer, $z_h$ is the model input data, $F_3$ is the model output data, and $c_h$ are the third model weight parameters obtained by training in advance; the feature data of the third data sequence are decomposed into the $z_h$ and substituted into the third calculation model, each $z_h$ is multiplied by the corresponding third model weight parameter $c_h$, and all the products are accumulated; the accumulated sum $F_3$ is the calculation result of the third calculation model, that is, the third temporary matching degree data;

step 93, calculating and generating third matching degree data according to the third temporary matching degree data and the enterprise matching degree maximum value, wherein the third matching degree data=max (0, min (third temporary matching degree data, enterprise matching degree maximum value)).

Here, abnormal data checking is performed on the calculated third temporary matching degree data, and any value outside the range from 0 to the maximum value of the enterprise matching degree is treated as abnormal data: if the third temporary matching degree data is smaller than 0, the third matching degree data is forced to 0; if the third temporary matching degree data is larger than the maximum value of the enterprise matching degree, the third matching degree data is forced to the maximum value of the enterprise matching degree; and if the third temporary matching degree data lies within the range from 0 to the maximum value of the enterprise matching degree, the third matching degree data is the third temporary matching degree data.

Finally, after the first, second or third matching degree data is obtained, it is returned as the processing result for the current enterprise information and policy information.
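The choice among steps 7, 8 and 9 can be summarized as a three-way dispatch on the first number and the first classification label set. The sketch below is illustrative only: the function name, the returned strings and the use of `None` for an empty label are assumptions.

```python
def select_model(first_number, first_threshold, label_set):
    """Pick which kind of linear regression model to use (steps 7 to 9)."""
    if first_number > first_threshold:
        return "topic"   # step 7: enough samples for this policy theme
    if any(label is not None for label in label_set.values()):
        return "label"   # step 8: at least one classification label is set
    return "region"      # step 9: fall back to the region model


# Made-up inputs: None stands for an empty label.
some_labels = {"industry_field": "software", "policy_class": None,
               "issuing_department": None}
no_labels = {"industry_field": None, "policy_class": None,
             "issuing_department": None}

print(select_model(1200, 500, some_labels))  # topic
print(select_model(40, 500, some_labels))    # label
print(select_model(40, 500, no_labels))      # region
```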

In the above description of the method of the present invention, four types of models (policy classification model, linear regression model corresponding to the subject, linear regression model corresponding to the label, linear regression model corresponding to the region) and training of the four types of models are mentioned, and it should be noted that the method of the present invention supports not only application processing of the four types of models but also training processing of the four types of models, and the method of the present invention will be described below for training processing of the four types of models.

Before performing model training description, firstly, the acquisition and storage process of training data required for model training is described. The training sample data used in the method of the invention are mainly from two databases: a policy sample database and a subsidized enterprise sample database; the sample data collection and processing steps of the policy sample database and the subsidy enterprise sample database are as follows:

step 101, acquiring the submitted policy information from a target website to generate first acquired policy information, and acquiring submitted enterprise information corresponding to the first acquired policy information and policy subsidy information of the enterprise to generate first acquired enterprise information and first acquired subsidy information;

Step 102, processing sample data of the policy sample database,

firstly, carrying out policy theme division processing on the first acquired policy information to generate first acquired policy theme data; performing policy region identification processing on the first acquired policy information to generate first acquired policy region data; performing classification label marking processing on the first acquired policy information to generate a plurality of first acquired policy classification labels, the plurality of first acquired policy classification labels at least comprising a first industry domain label, a first policy class label and a first issuing department label; performing policy feature recognition processing on the first acquired policy information to generate a plurality of first acquired policy feature data;

the first acquired policy sample data is composed of the first acquired policy information, the first acquired policy theme data, the first acquired policy region data, the plurality of first acquired policy classification labels and the plurality of first acquired policy feature data;

Step 103, processing the sample data of the subsidy enterprise sample database,

firstly, carrying out enterprise feature recognition processing on first acquired enterprise information to generate a plurality of first acquired enterprise feature data; performing numerical conversion processing on the first acquired subsidy information to generate first acquired subsidy amount data;

then forming first collected enterprise sample data from the plurality of first acquired enterprise feature data and the first acquired subsidy amount data;

storing the first collected enterprise sample data into a subsidy enterprise sample database;

Step 104, establishing a data association relationship between the first collected enterprise sample data in the subsidy enterprise sample database and the corresponding first acquired policy feature data in the policy sample database.
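Steps 101 to 104 can be sketched as two record types linked by a key. The following Python sketch is only an illustration under stated assumptions: the class names, the `policy_id` association key, the in-memory `SampleStore`, and the representation of features as lists of floats are all hypothetical and are not specified by the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PolicySample:
    """One record of the policy sample database (step 102)."""
    policy_id: str                      # hypothetical association key
    policy_text: str                    # first acquired policy information
    theme: str                          # first acquired policy theme data
    region: str                         # first acquired policy region data
    labels: Dict[str, Optional[str]]    # industry domain / policy class / issuing department
    features: List[float]               # first acquired policy feature data


@dataclass
class EnterpriseSample:
    """One record of the subsidy enterprise sample database (step 103)."""
    features: List[float]               # first acquired enterprise feature data
    subsidy_amount: float               # first acquired subsidy amount data
    policy_id: str                      # link to the policy sample (step 104)


class SampleStore:
    """In-memory stand-in for the two sample databases (an assumption)."""

    def __init__(self) -> None:
        self.policies: Dict[str, PolicySample] = {}
        self.enterprises: List[EnterpriseSample] = []

    def add_policy(self, sample: PolicySample) -> None:
        self.policies[sample.policy_id] = sample

    def add_enterprise(self, sample: EnterpriseSample) -> None:
        # Step 104: an enterprise sample must point at a stored policy sample.
        if sample.policy_id not in self.policies:
            raise KeyError(f"unknown policy_id: {sample.policy_id}")
        self.enterprises.append(sample)

    def enterprises_for_theme(self, theme: str) -> List[EnterpriseSample]:
        # Follow the association to find enterprise samples under a theme.
        return [e for e in self.enterprises
                if self.policies[e.policy_id].theme == theme]
```

Keeping the association as a key on the enterprise record makes it straightforward to count or fetch enterprise samples by theme, label or region later on.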

After the policy sample database and the subsidy enterprise sample database have collected enough sample data for model training, the method can extract sample data from them to perform model training processing. The training processing steps for the four types of models are as follows:

First type of model: training of the policy classification model

Before the policy classification model is used, first acquired policy sample data is extracted from the policy sample database as training data for the policy classification model; a plurality of first acquired policy feature data are extracted from the first acquired policy sample data as model input data, a plurality of first acquired policy classification labels are extracted as model output data, and the policy classification model is trained according to a density clustering method.

Second type of model: training of the linear regression model corresponding to the theme

The linear regression model corresponding to the theme is $F_1=\sum_{i=0}^{n} a_i x_i$, wherein i ranges from 0 to n, n is an integer, $x_i$ is the model input data, $F_1$ is the model output data, and $a_i$ is the first model weight parameter. Before the linear regression model corresponding to the theme is used, the first acquired policy sample data corresponding to the theme is obtained from the policy sample database, the corresponding first collected enterprise sample data is obtained from the subsidy enterprise sample database, and the first enterprise matching degree data corresponding to the first collected enterprise sample data is calculated; specified feature data are extracted from the obtained first acquired policy sample data and first collected enterprise sample data respectively to form a first training feature data sequence $(x_0, x_1, x_2, \ldots, x_i, \ldots, x_n)$; the first training feature data sequence is then used as the model input data $x_i$ and the corresponding first enterprise matching degree data as the model output data $F_1$, and the linear regression model corresponding to the theme is trained according to a gradient descent method to obtain the first model weight parameter $a_i$.
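As an illustration of training a weighted-sum model by gradient descent, the sketch below fits weights $a_i$ so that $\sum_i a_i x_i$ approximates the target matching degree. It is not the patent's implementation: the mean squared error loss, the batch update, the learning rate, the epoch count, and the function names are all assumptions chosen for the example.

```python
from typing import List, Sequence


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Weighted sum of the features: sum over i of a_i * x_i."""
    return sum(a * xi for a, xi in zip(weights, x))


def train_linear_regression(samples: Sequence[Sequence[float]],
                            targets: Sequence[float],
                            learning_rate: float = 0.01,
                            epochs: int = 1000) -> List[float]:
    """Batch gradient descent on the mean squared error (assumed loss)."""
    n_features = len(samples[0])
    m = len(samples)
    weights = [0.0] * n_features
    for _ in range(epochs):
        grads = [0.0] * n_features
        for x, f in zip(samples, targets):
            error = predict(weights, x) - f
            for i, xi in enumerate(x):
                # d/da_i of (1/m) * sum(error^2) is (2/m) * error * x_i
                grads[i] += 2.0 * error * xi / m
        weights = [a - learning_rate * g for a, g in zip(weights, grads)]
    return weights
```

If a constant offset is wanted, one option is to fix one feature (for example $x_0$) to 1 so that its weight acts as an intercept; the source does not state whether this is done.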

The method for calculating the first enterprise matching degree data corresponding to the first collected enterprise sample data specifically comprises the following steps:

calculating and generating the first enterprise matching degree data according to the preset minimum and maximum values of the subsidy amount, the preset minimum and maximum values of the enterprise matching degree, and the first acquired subsidy amount data of the first collected enterprise sample data, wherein first enterprise matching degree data = minimum value of the enterprise matching degree + (maximum value of the enterprise matching degree - minimum value of the enterprise matching degree) × normalization data.
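The rescaling formula above can be sketched in Python as follows. The passage does not define the normalization data here, so the sketch assumes min-max normalization of the subsidy amount against the preset subsidy minimum and maximum, and additionally assumes that out-of-range amounts are clipped to [0, 1]; the function and parameter names are hypothetical.

```python
def enterprise_matching_degree(subsidy_amount: float,
                               subsidy_min: float,
                               subsidy_max: float,
                               degree_min: float,
                               degree_max: float) -> float:
    """Map a subsidy amount onto the enterprise matching degree range."""
    if subsidy_max <= subsidy_min:
        raise ValueError("subsidy_max must be greater than subsidy_min")
    # Assumption: min-max normalization of the subsidy amount.
    normalized = (subsidy_amount - subsidy_min) / (subsidy_max - subsidy_min)
    # Assumption: amounts outside the preset range are clipped to [0, 1].
    normalized = max(0.0, min(1.0, normalized))
    # Formula from the text: min + (max - min) * normalization data.
    return degree_min + (degree_max - degree_min) * normalized
```

Under these assumptions, a subsidy halfway between the preset minimum and maximum maps to the midpoint of the matching degree range.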

Third type of model: training of the linear regression model corresponding to the label

The linear regression model corresponding to the label is $F_2=\sum_{j=0}^{m} b_j y_j$, wherein j ranges from 0 to m, m is an integer, $y_j$ is the model input data, $F_2$ is the model output data, and $b_j$ is the second model weight parameter. Before the linear regression model corresponding to the label is used, the first acquired policy sample data corresponding to the label is obtained from the policy sample database, the corresponding first collected enterprise sample data is obtained from the subsidy enterprise sample database, and the first enterprise matching degree data corresponding to the first collected enterprise sample data is calculated; specified feature data are extracted from the obtained first acquired policy sample data and first collected enterprise sample data respectively to form a second training feature data sequence $(y_0, y_1, y_2, \ldots, y_j, \ldots, y_m)$; the second training feature data sequence is then used as the model input data $y_j$ and the corresponding first enterprise matching degree data as the model output data $F_2$, and the linear regression model corresponding to the label is trained according to a gradient descent method to obtain the second model weight parameter $b_j$.

The computing mode of the first enterprise matching degree data is consistent with the computing mode in training of the linear regression model corresponding to the theme.

Fourth type of model: training of the linear regression model corresponding to the region

The linear regression model corresponding to the region is $F_3=\sum_{h=0}^{k} c_h z_h$, wherein h ranges from 0 to k, k is an integer, $z_h$ is the model input data, $F_3$ is the model output data, and $c_h$ is the third model weight parameter. Before the linear regression model corresponding to the region is used, the first acquired policy sample data corresponding to the region is obtained from the policy sample database, the corresponding first collected enterprise sample data is obtained from the subsidy enterprise sample database, and the first enterprise matching degree data corresponding to the first collected enterprise sample data is calculated; specified feature data are extracted from the obtained first acquired policy sample data and first collected enterprise sample data respectively to form a third training feature data sequence $(z_0, z_1, z_2, \ldots, z_h, \ldots, z_k)$; the third training feature data sequence is then used as the model input data $z_h$ and the corresponding first enterprise matching degree data as the model output data $F_3$, and the linear regression model corresponding to the region is trained according to a gradient descent method to obtain the third model weight parameter $c_h$.

Fig. 2 is a block diagram of an apparatus for processing enterprise information and policy information according to a second embodiment of the present invention. The apparatus may be a terminal device or a server that implements the method of the embodiments of the present invention, or an apparatus connected to such a terminal device or server that implements the method; for example, it may be a device or a chip system of the terminal device or the server. As shown in Fig. 2, the apparatus includes: an acquisition module 201, a condition identification module 202, a data processing module 203, a policy classification module 204, and a matching degree calculation module 205.

The acquisition module 201 is configured to acquire first enterprise information and first policy information.

The condition recognition module 202 is configured to perform an enterprise condition recognition process according to the first policy information and the first enterprise information.

The data processing module 203 is configured to perform policy topic segmentation and policy region identification processing according to the first policy information when the enterprise condition identification processing is successful, so as to generate first topic data and first region data; carrying out enterprise feature recognition processing on the first enterprise information to generate a first feature data set; and carrying out policy feature recognition processing on the first policy information to generate a second feature data set.

The policy classification module 204 is configured to perform policy classification processing on the first policy information by using a policy classification model, and generate a first classification label set; the first set of classification tags includes at least a first industry domain tag, a first policy class tag, and a first issuing department tag.

The matching degree calculating module 205 is configured to count, in a preset subsidy enterprise sample database, the number of first collected enterprise sample data corresponding to the first subject data, and generate a first number; the subsidized enterprise sample database includes a plurality of first collected enterprise sample data.

The matching degree calculating module 205 is further configured to perform matching degree calculating processing of enterprise information and policy information according to the first feature data set, the second feature data set, and the first topic data by using a linear regression model corresponding to the topic when the first number exceeds a preset first threshold, so as to generate first matching degree data.

The matching degree calculating module 205 is further configured to, when the first number does not exceed the first threshold and the first industry domain label, the first policy class label, and the first issuing department label are not all empty, perform matching degree calculation processing of the enterprise information and the policy information by using the linear regression model corresponding to the label according to the first feature data set, the second feature data set, and the labels that are not empty, so as to generate second matching degree data.

The matching degree calculating module 205 is further configured to, when the first number does not exceed the first threshold and the first industry domain label, the first policy class label, and the first issuing department label are all empty, perform matching degree calculation processing of the enterprise information and the policy information by using the linear regression model corresponding to the region according to the first feature data set, the second feature data set, and the first region data, so as to generate third matching degree data.
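The three branches handled by the matching degree calculating module 205 can be summarized as a small selection function. The Python sketch below is illustrative only: the function name, the string return values, and the dictionary representation of the three classification labels are assumptions, and "exceeds" is read as strictly greater than the threshold.

```python
from typing import Dict, Optional


def select_model_kind(first_number: int,
                      first_threshold: int,
                      labels: Dict[str, Optional[str]]) -> str:
    """Choose which family of linear regression models to apply.

    labels holds the industry domain, policy class and issuing department
    labels; None or an empty string stands for an empty label.
    """
    if first_number > first_threshold:
        # Enough enterprise samples under this theme: use the theme model.
        return "theme"
    if any(value for value in labels.values()):
        # At least one classification label is non-empty: use label models.
        return "label"
    # Too few samples and no labels: fall back to the region model.
    return "region"
```

The ordering reflects the fallback described above: the theme model is preferred when it has enough supporting samples, the label models come next, and the region model is the last resort.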

The processing device for enterprise information and policy information provided by the embodiment of the present invention may execute the method steps in the above method embodiment, and its implementation principle and technical effects are similar, and will not be described herein again.

It should be understood that the division of the above apparatus into modules is merely a division of logical functions; the modules may be fully or partially integrated into one physical entity or may be physically separate. The modules may all be implemented as software called by a processing element, or all in hardware; alternatively, some modules may be implemented as software called by a processing element and others in hardware. For example, the acquisition module may be a separately arranged processing element, may be integrated in a chip of the above apparatus, or may be stored in a memory of the above apparatus in the form of program code that a processing element of the above apparatus calls to execute the functions of the acquisition module. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each of the above modules may be carried out by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field programmable gate arrays (FPGA). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU), or another processor that can invoke the program code. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, Bluetooth, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.

Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. The electronic device may be the aforementioned terminal device or server, or may be a terminal device or server connected to the aforementioned terminal device or server for implementing the method of the embodiment of the present invention. As shown in fig. 3, the electronic device may include: a processor 31 (e.g., CPU), a memory 32, a transceiver 33; the transceiver 33 is coupled to the processor 31, and the processor 31 controls the transceiving operation of the transceiver 33. The memory 32 may store various instructions for performing various processing functions and implementing the methods and processes provided in the above-described embodiments of the present invention. Preferably, the electronic device according to the embodiment of the present invention further includes: a power supply 34, a system bus 35, and a communication port 36. The system bus 35 is used to enable communication connections between the elements. The communication port 36 is used for connection communication between the electronic device and other peripheral devices.

The system bus referred to in Fig. 3 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used to enable communication between the database access apparatus and other devices (e.g., clients, read-write libraries, and read-only libraries). The memory may comprise random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a network processor (Network Processor, NP), etc.; but may also be a digital signal processor DSP, an application specific integrated circuit ASIC, a field programmable gate array FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component.

It should be noted that the embodiments of the present invention also provide a computer readable storage medium having instructions stored therein, which when executed on a computer, cause the computer to perform the methods and processes provided in the above embodiments.

The embodiment of the invention also provides a chip for running the instructions, which is used for executing the method and the processing procedure provided in the embodiment.

Those skilled in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps have been described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The foregoing description of the embodiments has been provided for the purpose of illustrating the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. A method for processing enterprise information and policy information, the method comprising:

acquiring first enterprise information and first policy information;

if the first number does not exceed the first threshold value and the first industry field label, the first policy class label and the first issuing department label are all empty, performing matching degree calculation processing on enterprise information and policy information by using a linear regression model corresponding to a region according to the first feature data set, the second feature data set and the first region data, and generating third matching degree data;

wherein, prior to the acquiring of the first enterprise information and the first policy information, the method further comprises:

establishing a data association relationship between the first acquired enterprise sample data in the subsidy enterprise sample database and the corresponding first acquired policy feature data in the policy sample database;

the method further comprises the steps of:

the linear regression model corresponding to the theme is $F_1=\sum_{i=0}^{n} a_i x_i$, wherein i ranges from 0 to n, n is an integer, said $x_i$ is model input data, said $F_1$ is model output data, and said $a_i$ is a first model weight parameter; before the linear regression model corresponding to the theme is used, obtaining the first acquired policy sample data corresponding to the theme from the policy sample database and the corresponding first acquired enterprise sample data from the subsidy enterprise sample database, and calculating first enterprise matching degree data corresponding to the first acquired enterprise sample data; extracting specified feature data from the obtained first acquired policy sample data and first acquired enterprise sample data respectively to form a first training feature data sequence $(x_0, x_1, x_2, \ldots, x_i, \ldots, x_n)$; then taking the first training feature data sequence as the model input data $x_i$ and the corresponding first enterprise matching degree data as the model output data $F_1$, and performing model training on the linear regression model corresponding to the theme according to a gradient descent method to obtain the first model weight parameter $a_i$;

the linear regression model corresponding to the label is $F_2=\sum_{j=0}^{m} b_j y_j$, wherein j ranges from 0 to m, m is an integer, said $y_j$ is model input data, said $F_2$ is model output data, and said $b_j$ is a second model weight parameter; before the linear regression model corresponding to the label is used, obtaining the first acquired policy sample data corresponding to the label from the policy sample database and the corresponding first acquired enterprise sample data from the subsidy enterprise sample database, and calculating first enterprise matching degree data corresponding to the first acquired enterprise sample data; extracting specified feature data from the obtained first acquired policy sample data and first acquired enterprise sample data respectively to form a second training feature data sequence $(y_0, y_1, y_2, \ldots, y_j, \ldots, y_m)$; then taking the second training feature data sequence as the model input data $y_j$ and the corresponding first enterprise matching degree data as the model output data $F_2$, and performing model training on the linear regression model corresponding to the label according to a gradient descent method to obtain the second model weight parameter $b_j$;

And performing matching degree calculation processing of enterprise information and policy information by using a linear regression model corresponding to a theme according to the first characteristic data set, the second characteristic data set and the first theme data to generate first matching degree data, wherein the matching degree calculation processing specifically comprises the following steps:

calculating and generating the first matching degree data according to the first temporary matching degree data and the enterprise matching degree maximum value, wherein the first matching degree data=max (0, min (first temporary matching degree data, enterprise matching degree maximum value)); the min () is a function taking the minimum value, and the max () is a function taking the maximum value;

and performing matching degree calculation processing of enterprise information and policy information by using a linear regression model corresponding to the label according to the first characteristic data set, the second characteristic data set and the label which is not empty, and generating second matching degree data, wherein the matching degree calculation processing specifically comprises the following steps:

if the first issuing department label is not empty, extracting specified data from the first feature data set and the second feature data set to form a second third data sequence; selecting the linear regression model corresponding to the first issuing department label as a second third calculation model; taking the second third data sequence as model input data of the second third calculation model, and substituting the pre-trained second third model weight parameters into the second third calculation model for calculation, so as to generate second third matching degree data;

calculating and generating second temporary matching degree data according to the second first matching degree data, the second second matching degree data and the second third matching degree data, wherein second temporary matching degree data = $W_1$ × second first matching degree data + $W_2$ × second second matching degree data + $W_3$ × second third matching degree data; said $W_1$, $W_2$ and $W_3$ are weighting parameters, and $W_1 + W_2 + W_3 = 1$;

2. The method for processing enterprise information and policy information according to claim 1, wherein said performing the enterprise condition identification process according to the first policy information and the first enterprise information specifically comprises:

3. The method for processing enterprise information and policy information according to claim 1, wherein said generating third matching degree data according to the first feature data set, the second feature data set and the first region data, using a linear regression model corresponding to a region, performs matching degree calculation processing of enterprise information and policy information, and specifically comprises:

4. An apparatus for implementing the method for processing enterprise information and policy information as claimed in any one of claims 1-3, comprising:

5. An electronic device, comprising: memory, processor, and transceiver;

the processor being adapted to couple with the memory, read and execute instructions in the memory to implement the method of any one of claims 1-3;

6. A computer readable storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1-3.