CN113591979A - Industry category identification method, equipment, medium and computer program product - Google Patents

Info

Publication number
CN113591979A
Authority
CN
China
Prior art keywords
category
classification model
name
result
operation range
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110868628.3A
Other languages
Chinese (zh)
Inventor
张鹏
陈婷
吴三平
庄伟亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202110868628.3A
Publication of CN113591979A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an industry category identification method, equipment, a medium and a computer program product. The method comprises the following steps: obtaining the operation range and the enterprise name of an enterprise to be classified; inputting the operation range and the enterprise name into the first-layer category classification model of their respective hierarchical classification models to obtain a classification prediction result, wherein each hierarchical classification model comprises a category classification model for each layer; combining the classification prediction result with the operation range and with the enterprise name, respectively, as the input of the corresponding next-layer category classification model to obtain the next classification prediction result, and repeating until the last-layer category classification model is reached, to obtain a target subclass identification result; and determining the industry category of the enterprise to be classified based on the target subclass identification result and a preset category system mapping relation. The application addresses the technical problem of low industry classification efficiency.

Description

Industry category identification method, equipment, medium and computer program product
Technical Field
The present application relates to the field of machine learning techniques for financial technology (Fintech), and in particular, to a method, apparatus, medium, and computer program product for industry category identification.
Background
With the continuous development of financial technology, and of Internet technology in particular, more and more technologies (such as distributed technology and artificial intelligence) are being applied in the financial field. The financial industry in turn places higher requirements on these technologies; for example, higher requirements are placed on the distribution of backlog items in the financial industry.
With the development of computer technology, machine learning is applied more and more widely. The national economy industry classification standard divides the national economy into four levels of categories: door category, major category, middle category and minor category. When an enterprise registers with the industrial and commercial administration, it must fill in its industry category. At present, when an enterprise's national economy industry category is missing or wrong, the corresponding category path is usually looked up manually in the national industry classification system based on the text of the enterprise name and the operation range. When many enterprises lack an industry category, this manual approach involves a heavy workload and takes a long time, so industry classification efficiency is low.
Disclosure of Invention
The present application mainly aims to provide an industry category identification method, device, medium, and computer program product, and aims to solve the technical problem of low industry classification efficiency in the prior art.
In order to achieve the above object, the present application provides an industry category identification method, including:
acquiring the operation range and the enterprise name of an enterprise to be classified;
inputting the operation range and the enterprise name into the first-layer category classification model of their respective hierarchical classification models to obtain a classification prediction result, wherein each hierarchical classification model comprises a category classification model for each layer;
combining the classification prediction result with the operation range and with the enterprise name, respectively, as the input of the corresponding next-layer category classification model to obtain the next classification prediction result, and repeating until the last-layer category classification model is reached, to obtain a target subclass identification result;
and determining the industry category of the enterprise to be classified based on the target subclass identification result and a preset category system mapping relation.
The application also provides an industry category identification device, which is a virtual device and includes:
an acquisition module, used for acquiring the operation range and the enterprise name of the enterprise to be classified;
a first classification module, used for inputting the operation range and the enterprise name into the first-layer category classification model of their respective hierarchical classification models to obtain a classification prediction result, wherein each hierarchical classification model comprises a category classification model for each layer;
a second classification module, used for combining the classification prediction result with the operation range and with the enterprise name, respectively, as the input of the corresponding next-layer category classification model to obtain the next classification prediction result, and repeating until the last-layer category classification model is reached, to obtain a target subclass identification result;
and a determining module, used for determining the industry category of the enterprise to be classified based on the target subclass identification result and a preset category system mapping relation.
The application further provides an industry category identification device, which is a physical device and includes: a memory, a processor, and an industry category identification program stored on the memory, wherein the industry category identification program, when executed by the processor, implements the steps of the industry category identification method described above.
The application also provides a medium which is a readable storage medium, wherein an industry category identification program is stored on the readable storage medium, and the industry category identification program is executed by a processor to realize the steps of the industry category identification method.
The present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the industry category identification method as described above.
The application provides an industry category identification method, equipment, medium and computer program product. In the prior art, the industry category is looked up manually in the national industry classification system from the text of the enterprise name and the operation range. By contrast, the application first obtains the operation range and the enterprise name of the enterprise to be classified, and inputs them into the first-layer category classification model of their respective hierarchical classification models to obtain a classification prediction result, wherein each hierarchical classification model comprises a category classification model for each layer. The classification prediction result is then combined with the operation range and with the enterprise name, respectively, as the input of the corresponding next-layer category classification model to obtain the next classification prediction result, and this is repeated until the last-layer category classification model is reached, yielding the target subclass identification result. Because the industry category system has four levels (door category, major category, middle category and minor category), and each layer of the hierarchical classification model determines its output from the classification prediction result of the layer above, this hierarchical prediction method concentrates the prediction on a narrower target range, reduces the search space of industry categories, and improves the speed and accuracy of classification prediction. Finally, the industry category of the enterprise to be classified is determined based on the target subclass identification result and the preset category system mapping relation. The industry category is thus obtained automatically from the operation range and the enterprise name, with no manual lookup or classification. This overcomes the defect of the prior art that, when many enterprises lack an industry category, manual judgment involves a heavy workload and a long time and therefore low industry classification efficiency, and so improves industry classification efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. It is obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flow chart diagram illustrating a first embodiment of the industry category identification method of the present application;
FIG. 2 is a schematic flow chart diagram illustrating a second embodiment of the industry category identification method of the present application;
FIG. 3 is a schematic flow chart diagram illustrating a third embodiment of the industry category identification method of the present application;
FIG. 4 is a schematic structural diagram of the industry category identification device in the hardware operating environment involved in the industry category identification method of the embodiments of the present application.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In a first embodiment of the industry category identification method of the present application, referring to FIG. 1, the industry category identification method includes:
Step S10, acquiring the operation range and the enterprise name of the enterprise to be classified;
In this embodiment, it should be noted that the national economy industry classification standard specifies the classification and codes of social and economic activities. The standard adopts a linear classification method and a hierarchical coding method, and divides the national economy into a four-level category system of door category, major category, middle category and minor category. When an enterprise registers with the industrial and commercial administration, it must fill in the corresponding category. The operation range (business scope) is the range of production, operation and service items in which the enterprise to be classified may engage.
Acquiring the operation range and the enterprise name of the enterprise to be classified specifically means acquiring the text information corresponding to the operation range of the enterprise to be classified and the text information corresponding to its enterprise name.
Step S20, inputting the operation range and the enterprise name into the first-layer category classification model of their respective hierarchical classification models to obtain a classification prediction result, wherein each hierarchical classification model comprises a category classification model for each layer;
In this embodiment, it should be noted that the hierarchical classification model includes an operation range hierarchical classification model and a name hierarchical classification model. Both include multiple layers of category classification models, and the category classification model of each layer is a text multi-classification model used to predict the industry category information of the corresponding level.
Specifically, the hierarchical classification models are first trained. The first-layer category classification model of the operation range hierarchical classification model to be trained is iteratively trained and optimized on pre-collected training operation ranges, which yields the first-layer operation range category classification model and a training operation range prediction result. The next-layer category classification model is then iteratively trained and optimized on the training operation ranges together with the prediction result output by the layer above, and so on until the last-layer category classification model has been trained, which yields the operation range hierarchical classification model. The name hierarchical classification model is trained in the same way.
At prediction time, the operation range is input into the first-layer category classification model of the operation range hierarchical classification model, which outputs an operation range door-category probability result, and the enterprise name is input into the first-layer category classification model of the name hierarchical classification model, which outputs a name door-category probability result. The two probability results are each sorted to obtain an operation range door-category probability ranking result and a name door-category probability ranking result, and the two ranking results are added to obtain a door-category merging result. A category is then selected from the door-category merging result or from the operation range door-category probability result according to a preset selection rule to obtain the classification prediction result. The preset selection rule selects the category with the smallest rank sum in the door-category merging result, or the category with the highest probability in the operation range door-category probability result.
The hierarchical classification model comprises an operation range hierarchical classification model and a name hierarchical classification model,
and the step of inputting the operation range and the enterprise name into the first-layer category classification model of their respective hierarchical classification models to obtain a classification prediction result comprises the following steps:
Step S21, inputting the operation range into the first-layer category classification model of the operation range hierarchical classification model, and outputting an operation range door-category probability result;
In this embodiment, the operation range door-category probability result gives the probability that the operation range belongs to each door category.
Specifically, the text information corresponding to the operation range is converted into an operation range text vector based on a term frequency-inverse document frequency (TF-IDF) model or a word2vec model, and the operation range text vector is classified by the first-layer category classification model of the operation range hierarchical classification model to obtain the operation range door-category probability result corresponding to that vector.
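The vectorization step can be illustrated with a minimal pure-Python sketch of TF-IDF weighting. The function name `tfidf_vectors`, the smoothed IDF variant and the pre-tokenised toy documents are assumptions made for illustration; the application does not specify a tokenizer, an IDF formula or a library.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenised documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        # Term frequency times inverse document frequency, one weight per token.
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors

# Hypothetical, already tokenised operation range texts.
docs = [
    ["software", "development", "technical", "consulting"],
    ["food", "retail", "catering", "service"],
    ["software", "sales", "technical", "service"],
]
vocab, vectors = tfidf_vectors(docs)
```

Each row of `vectors` could then be passed to whatever multi-class text classifier plays the role of the first-layer category classification model.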
Step S22, inputting the enterprise name into the first-layer category classification model of the name hierarchical classification model, and outputting a name door-category probability result;
In this embodiment, specifically, the text information corresponding to the enterprise name is converted into an enterprise name text vector based on a term frequency-inverse document frequency (TF-IDF) model or a word2vec model, and the enterprise name text vector is classified by the first-layer category classification model of the name hierarchical classification model to obtain the name door-category probability result corresponding to that vector. For the specific implementation of step S22, refer to step S21; details are not repeated here.
Step S23, merging the operation range door-category probability result and the name door-category probability result to obtain a door-category merging result;
In this embodiment, specifically, the classification probabilities in the operation range door-category probability result are ranked to obtain a first sorting result, the classification probabilities in the name door-category probability result are ranked to obtain a second sorting result, and the first sorting result and the second sorting result are added to obtain the door-category merging result. For example, if the probabilities of categories A, B and C in the operation range door-category probability result are (0.15, 0.8, 0.05), the ranks are (2, 1, 3); if the probabilities of categories A, B and C in the name door-category probability result are (0.1, 0.4, 0.5), the ranks are (3, 2, 1); adding the ranks gives (5, 3, 4), which is the door-category merging result.
The step of merging the operation range door-category probability result and the name door-category probability result to obtain the category merging result, here the door-category merging result, comprises the following steps:
Step S231, sorting the operation range door-category probability result to obtain a first sorting result;
In this embodiment, specifically, the probabilities corresponding to the categories in the operation range door-category probability result are ranked from largest to smallest to obtain the first sorting result. For example, if the operation range door-category probability result is (0.15, 0.8, 0.05), the sorting result is (2, 1, 3).
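As a minimal sketch of this ranking step, the function below assigns rank 1 to the largest probability. The name `rank_descending` is hypothetical, and the handling of equal probabilities (the earlier category gets the better rank) is an assumption, since the application does not say how ties are ranked.

```python
def rank_descending(probs):
    """Return the 1-based rank of each probability, largest first."""
    # Indices ordered from the highest probability to the lowest.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranks = [0] * len(probs)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

print(rank_descending([0.15, 0.8, 0.05]))  # [2, 1, 3]
```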
Step S232, sorting the name door-category probability result to obtain a second sorting result;
In this embodiment, specifically, the probabilities corresponding to the categories in the name door-category probability result are ranked from largest to smallest to obtain the second sorting result.
Step S233, adding the first sorting result and the second sorting result to obtain the category merging result.
In this embodiment, specifically, for each category, its rank in the operation range door-category probability result and its rank in the name door-category probability result are added to obtain the category merging result.
Step S24, selecting a category from the category merging result or from the operation range door-category probability result according to a preset selection rule to obtain the classification prediction result.
In this embodiment, specifically, the category with the smallest rank sum in the category merging result is selected as the classification prediction result; if several categories share the smallest rank sum, the category with the highest probability in the operation range door-category probability result is selected as the classification prediction result. For example, if the operation range door-category probability result is (0.15, 0.8, 0.05) with ranks (2, 1, 3), and the name door-category probability result is (0.1, 0.4, 0.5) with ranks (3, 2, 1), the rank sums are (5, 3, 4), and the category whose rank sum is 3 is selected.
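Steps S23 and S24 can be sketched together as follows. The function names are hypothetical, and the sketch assumes that the tie-break compares only the categories that share the smallest rank sum; the text could also be read as taking the overall maximum of the operation range probabilities.

```python
def rank_descending(probs):
    """Return the 1-based rank of each probability, largest first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranks = [0] * len(probs)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def select_category(labels, range_probs, name_probs):
    """Merge the two rankings by addition and pick the smallest rank sum."""
    range_ranks = rank_descending(range_probs)
    name_ranks = rank_descending(name_probs)
    merged = [r + n for r, n in zip(range_ranks, name_ranks)]
    best = min(merged)
    tied = [i for i, total in enumerate(merged) if total == best]
    # Assumed tie-break: highest operation range probability among the tied.
    winner = max(tied, key=lambda i: range_probs[i])
    return labels[winner], merged

labels = ["A", "B", "C"]
print(select_category(labels, [0.15, 0.8, 0.05], [0.1, 0.4, 0.5]))
# ('B', [5, 3, 4])
print(select_category(labels, [0.6, 0.3, 0.1], [0.3, 0.6, 0.1]))
# ('A', [3, 3, 6])  A and B tie; A has the higher operation range probability
```

Summing ranks rather than raw probabilities means neither model can dominate merely by producing more extreme scores, which appears to be the motivation for this merging rule.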
Step S30, combining the classification prediction result with the operation range and with the enterprise name, respectively, as the input of the corresponding next-layer category classification model to obtain the next classification prediction result, and repeating until the last-layer category classification model is reached, to obtain the target subclass identification result;
In this embodiment, specifically, the classification prediction result obtained from the previous-layer category classification models is input, together with the operation range, into the next-layer category classification model of the operation range hierarchical classification model, and, together with the enterprise name, into the next-layer category classification model of the name hierarchical classification model, to obtain the next-layer prediction result. That prediction result in turn serves as input to the layer after it, and the process repeats until the last-layer category classification model is reached, yielding the target subclass identification result.
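The layer-by-layer loop can be sketched as below. This is an illustrative reading, not the application's implementation: each layer model is assumed to be a callable taking the text and the previous prediction and returning a probability per category, the rank-sum merge is assumed to be reused at every layer, and the lookup-table stubs and category labels ("X", "X2") are hypothetical. Only two layers are shown for brevity, whereas the application uses four.

```python
from typing import Callable, Dict, List

# A layer model maps (text, previous prediction) to {category: probability}.
LayerModel = Callable[[str, str], Dict[str, float]]

def rank_descending(probs: Dict[str, float]) -> Dict[str, int]:
    ordered = sorted(probs, key=probs.get, reverse=True)
    return {label: pos for pos, label in enumerate(ordered, start=1)}

def merge_and_select(range_probs: Dict[str, float],
                     name_probs: Dict[str, float]) -> str:
    range_ranks = rank_descending(range_probs)
    name_ranks = rank_descending(name_probs)
    merged = {c: range_ranks[c] + name_ranks[c] for c in range_probs}
    best = min(merged.values())
    tied = [c for c, total in merged.items() if total == best]
    # Assumed tie-break: highest operation range probability among the tied.
    return max(tied, key=lambda c: range_probs[c])

def classify(range_text: str, name_text: str,
             range_layers: List[LayerModel],
             name_layers: List[LayerModel]) -> List[str]:
    prediction = ""  # no previous prediction before the first layer
    path = []
    for range_model, name_model in zip(range_layers, name_layers):
        range_probs = range_model(range_text, prediction)
        name_probs = name_model(name_text, prediction)
        prediction = merge_and_select(range_probs, name_probs)
        path.append(prediction)
    return path  # the last element is the finest-level prediction

def table_model(table: Dict[str, Dict[str, float]]) -> LayerModel:
    """Stub standing in for a trained classifier (hypothetical)."""
    return lambda text, previous: table[previous]

range_layers = [table_model({"": {"X": 0.7, "Y": 0.3}}),
                table_model({"X": {"X1": 0.4, "X2": 0.6}})]
name_layers = [table_model({"": {"X": 0.6, "Y": 0.4}}),
               table_model({"X": {"X1": 0.2, "X2": 0.8}})]
path = classify("range text", "name text", range_layers, name_layers)
print(path)  # ['X', 'X2']
```

Because each layer only scores the children of the category chosen above it, the candidate set shrinks at every step, which is the reduction in search space that the description refers to.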
Step S40, determining the industry category of the enterprise to be classified based on the target subclass identification result and the preset category system mapping relation.
In this embodiment, it should be noted that, in the national economy industry classification standard, the door category is the first level, and each door category is subdivided into major categories. The major category is the second level and is a subdivision of the door category; each major category is subdivided into middle categories. The middle category is the third level and is a subdivision of the major category; a middle category code has three digits, of which the first two are the code of the parent major category and the third is the sequential code of the middle category. Each middle category is subdivided into minor categories (subclasses). The minor category is the fourth and last level and is a subdivision of the middle category; a minor category code has four digits, of which the first three are the code of the parent middle category and the fourth is the sequential code of the minor category. The preset category system mapping relation is the mapping between the levels of the industry category system, including the mapping between minor and middle categories, between middle and major categories, and between major and door categories.
Specifically, the industry category of the enterprise to be classified is determined based on the target subclass identification result together with the mapping between minor and middle categories, the mapping between middle and major categories, and the mapping between major and door categories.
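Given the code structure described above, the full category path can be recovered from a four-digit minor category code by taking prefixes and then looking up the door category. The sketch below is illustrative only: `category_path` is a hypothetical helper, and the table `MAJOR_TO_DOOR` contains made-up codes rather than entries from the actual standard.

```python
def category_path(minor_code, major_to_door):
    """Derive the four-level path from a four-digit minor category code."""
    if len(minor_code) != 4 or not minor_code.isdigit():
        raise ValueError("minor category code must be four digits")
    middle_code = minor_code[:3]  # first three digits: parent middle category
    major_code = minor_code[:2]   # first two digits: parent major category
    door_code = major_to_door[major_code]  # lookup in the preset mapping
    return {
        "door": door_code,
        "major": major_code,
        "middle": middle_code,
        "minor": minor_code,
    }

# Hypothetical major-to-door mapping, not taken from the standard.
MAJOR_TO_DOOR = {"12": "X", "34": "Y"}

print(category_path("1234", MAJOR_TO_DOOR))
# {'door': 'X', 'major': '12', 'middle': '123', 'minor': '1234'}
```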
In the prior art, the industry category is looked up manually in the national industry classification system from the text of the enterprise name and the operation range. By contrast, this embodiment first acquires the operation range and the enterprise name of the enterprise to be classified, and inputs them into the first-layer category classification model of their respective hierarchical classification models to obtain a classification prediction result, wherein each hierarchical classification model comprises a category classification model for each layer. The classification prediction result is then combined with the operation range and with the enterprise name, respectively, as the input of the corresponding next-layer category classification model to obtain the next classification prediction result, and this is repeated until the last-layer category classification model is reached, yielding the target subclass identification result. Because the industry category system has four levels (door category, major category, middle category and minor category), and each layer of the hierarchical classification model determines its output from the classification prediction result of the layer above, this hierarchical prediction method concentrates the prediction on a narrower target range, reduces the search space of industry categories, and improves the speed and accuracy of classification prediction. Finally, the industry category of the enterprise to be classified is determined based on the target subclass identification result and the preset category system mapping relation. The industry category is thus obtained automatically from the operation range and the enterprise name, with no manual lookup or classification. This overcomes the defect of the prior art that, when many enterprises lack an industry category, manual judgment involves a heavy workload and a long time and therefore low industry classification efficiency, and so improves industry classification efficiency.
Further, referring to FIG. 2, in another embodiment of the present application, based on the first embodiment of the present application, the hierarchical classification model includes an operation range hierarchical classification model and a name hierarchical classification model,
and the step of combining the classification prediction result with the operation range and the enterprise name respectively as the input of the next-layer category classification model corresponding to each, obtaining the next classification prediction result, and repeating until the last-layer category classification model is reached to obtain the target subclass identification result comprises the following steps:
Step A10, inputting the operation range and the classification prediction result into a second-layer category classification model in the operation range hierarchical classification model, and outputting an operation range large-category probability result;
In this embodiment, the classification prediction result obtained from the first-layer category classification models and the operation range are together used as the input of the second-layer category classification model in the operation range hierarchical classification model, which outputs the operation range large-category probability result.
Step A20, inputting the enterprise name and the classification prediction result into a second-layer category classification model in the name hierarchical classification model, and outputting a name large-category probability result;
In this embodiment, the classification prediction result obtained from the first-layer category classification models and the enterprise name are together used as the input of the second-layer category classification model in the name hierarchical classification model, which outputs the name large-category probability result.
Step A30, merging the operation range large-class probability result and the name large-class probability result to obtain a large-class merging result;
In this embodiment, the operation range large-category probability result and the name large-category probability result are merged to obtain the large-category merging result. Specifically, the operation range large-category probability result is sorted to obtain a third sorting result, the name large-category probability result is sorted to obtain a fourth sorting result, and the third sorting result and the fourth sorting result are added to obtain the large-category merging result. For the specific implementation of Step A30, refer to step S23, which is not repeated here.
Step A40, performing category selection in the large-category merging result or the operation range large-category probability result according to a preset selection rule to obtain the next classification prediction result, and repeating until the last-layer category classification model is reached to obtain the target small-category identification result.
In this embodiment, specifically, the category with the smallest summed rank value in the large-category merging result is selected as the classification prediction result of the corresponding level; if several categories share that minimum value, the category with the maximum probability in the operation range large-category probability result is selected instead. The classification prediction result of that level is then input, together with the operation range and the enterprise name respectively, into the next-layer category classification model of the corresponding hierarchical classification model, until the target small-category identification result is output.
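The merging and selection of Steps A30 and A40 can be illustrated with a short sketch. It assumes that "sorting" means ranking categories by descending probability (rank 1 is the most probable), that both models score the same set of categories, and that on a tie the operation range probability is compared only among the tied categories; the text does not state these details, and the function names and sample probabilities are hypothetical.

```python
from typing import Dict


def rank_by_probability(probs: Dict[str, float]) -> Dict[str, int]:
    """Return the rank of each category; rank 1 is the most probable."""
    ordered = sorted(probs, key=lambda category: probs[category], reverse=True)
    return {category: rank for rank, category in enumerate(ordered, start=1)}


def select_category(scope_probs: Dict[str, float],
                    name_probs: Dict[str, float]) -> str:
    """Add the two rankings and pick the category with the smallest sum.

    Assumption: both dictionaries have exactly the same keys.
    """
    scope_rank = rank_by_probability(scope_probs)
    name_rank = rank_by_probability(name_probs)
    merged = {c: scope_rank[c] + name_rank[c] for c in scope_probs}

    best = min(merged.values())
    tied = [c for c, total in merged.items() if total == best]
    if len(tied) == 1:
        return tied[0]
    # Several categories share the minimum rank sum: fall back to the
    # operation range probability (assumed here to be taken among the ties).
    return max(tied, key=lambda c: scope_probs[c])


# Clear winner: B has rank sum 2 + 1 = 3, A has 1 + 3 = 4, C has 3 + 2 = 5.
clear = select_category({"A": 0.5, "B": 0.3, "C": 0.2},
                        {"A": 0.2, "B": 0.5, "C": 0.3})

# Tie: A has 1 + 2 = 3 and B has 2 + 1 = 3, so the operation range decides.
tie = select_category({"A": 0.6, "B": 0.4}, {"A": 0.3, "B": 0.7})
```

Adding ranks rather than raw probabilities keeps the two models on a common scale even when their probability outputs are calibrated differently.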
The embodiment of the application provides an industry category identification method in which the operation range and the classification prediction result are input into the second-layer category classification model in the operation range hierarchical classification model to output an operation range large-category probability result, and the enterprise name and the classification prediction result are input into the second-layer category classification model in the name hierarchical classification model to output a name large-category probability result. The two probability results are merged to obtain a large-category merging result, and category selection is performed in the large-category merging result or the operation range large-category probability result according to a preset selection rule to obtain the next classification prediction result; this is repeated until the last-layer category classification model is reached and the target small-category identification result is obtained. Through the hierarchical classification model, the enterprise to be classified is predicted in the order of gate category, large category, middle category and small category, and each layer of category classification model determines its output according to the classification prediction result of the previous layer, so that the target small-category identification result is finally obtained. The prediction target range is thereby concentrated and the efficiency of prediction and classification is higher, which lays a foundation for overcoming the technical defect of the prior art that, when many enterprises lack an industry category, the traditional manual judgment method involves a heavy workload and takes a long time, resulting in low industry classification efficiency.
Further, referring to FIG. 3, in another embodiment of the present application, based on the first embodiment of the present application, before the step of inputting the operation range and the enterprise name respectively into the first-layer category classification model of the corresponding hierarchical classification model to obtain the classification prediction result, the industry category identification method further includes:
Step B10, acquiring sample enterprise names corresponding to the categories of each level in the industry classification and sample operation ranges corresponding to the categories of each level;
In this embodiment, the sample enterprise names and the sample operation ranges corresponding to the categories of each level in the industry classification are obtained. Specifically, the latest industry classification information, namely the sample enterprise names and sample operation ranges corresponding to the gate categories, large categories, middle categories and small categories, is obtained according to the latest version of the national industry classification standard.
Step B20, carrying out data cleaning on the sample enterprise name and the sample operation range to obtain a training enterprise name and a training operation range;
In this embodiment, data cleaning is performed on the sample enterprise name and the sample operation range to obtain the training enterprise name and the training operation range. Specifically, information such as place names and fixed suffixes is removed from the sample enterprise name to prevent the model from over-fitting during training, and text in the sample operation range that does not describe the operation range, such as descriptions of national regulations, is removed.
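A minimal sketch of this cleaning step is given below. The word lists, the regular expression and the function names are hypothetical and deliberately tiny, and English strings stand in for the original-language data; the text only says that place names, fixed suffixes and regulatory remarks are removed, not how they are detected.

```python
import re

# Hypothetical vocabularies; a real system would need far larger lists.
PLACE_NAMES = ["Shenzhen", "Beijing", "Shanghai"]
NAME_SUFFIXES = ["Co., Ltd.", "Limited", "Inc."]

# Hypothetical pattern for a parenthesised regulatory remark.
REGULATORY_NOTE = re.compile(
    r"\([^()]*(?:approval|approved|permit|license)[^()]*\)", re.IGNORECASE
)


def clean_enterprise_name(name: str) -> str:
    """Drop place names and fixed suffixes, then normalise whitespace."""
    for token in PLACE_NAMES + NAME_SUFFIXES:
        name = name.replace(token, " ")
    return " ".join(name.split())


def clean_operation_range(text: str) -> str:
    """Drop regulatory remarks, then normalise whitespace and punctuation."""
    text = REGULATORY_NOTE.sub(" ", text)
    return " ".join(text.split()).strip(" ;,.")


training_name = clean_enterprise_name("Shenzhen Acme Software Co., Ltd.")
training_range = clean_operation_range(
    "Software development; computer sales "
    "(items subject to approval require a permit before operation)."
)
```

Removing tokens that appear in almost every name, regardless of industry, keeps the classifier from memorising them as if they were informative features.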
Step B30, acquiring a to-be-trained operation range level model and a to-be-trained name level model;
Step B40, performing iterative training optimization on the to-be-trained operation range hierarchical model through the training operation range to obtain a hierarchical classification model corresponding to the training operation range;
In this embodiment, the to-be-trained operation range hierarchical model is iteratively trained and optimized with the training operation range to obtain the hierarchical classification model corresponding to the training operation range. Specifically, the first-layer category classification model in the to-be-trained operation range hierarchical model is iteratively trained with the training operation range to obtain the first-layer category classification model and output the training prediction result of the first layer; the training operation range and the training prediction result of the first layer are then used as the input of the next-layer category classification model in the to-be-trained operation range hierarchical model, which is iteratively trained to obtain the category classification model of the corresponding level, until the last-layer category classification model is trained.
The step of performing iterative training optimization on the hierarchical model of the to-be-trained operation range through the training operation range to obtain the hierarchical classification model corresponding to the training operation range comprises the following steps of:
Step B41, performing iterative training on a first-layer operation category classification model in the to-be-trained operation range hierarchical model through the training operation range to obtain a first-layer operation category classification model, and outputting a training operation range prediction result;
In this embodiment, specifically, the training operation range is input into the first-layer operation category classification model in the to-be-trained operation range hierarchical model to optimize that model, and it is then determined whether the first-layer operation category classification model satisfies a preset training end condition, where the preset training end condition includes conditions such as convergence of the loss function and reaching a maximum iteration number threshold. If the condition is satisfied, the first-layer operation category classification model is obtained and the training operation range prediction result is output; if not, the procedure returns to the step of iteratively training the first-layer operation category classification model in the to-be-trained operation range hierarchical model through the training operation range.
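The preset training end condition can be sketched as a loop that stops when the loss stops changing or when an iteration cap is hit. The function `train_until_done`, its `tolerance` and `max_iterations` parameters, and the toy `step` function are assumptions made for illustration; the text names the two kinds of condition but gives no thresholds or convergence test.

```python
from typing import Callable, List, Tuple


def train_until_done(step: Callable[[], float], max_iterations: int = 100,
                     tolerance: float = 1e-4) -> Tuple[int, List[float]]:
    """Call `step` (one optimisation pass returning the current loss) until
    the change in loss falls below `tolerance` (treated here as convergence)
    or `max_iterations` passes have been run."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    losses: List[float] = []
    for _ in range(max_iterations):
        losses.append(step())
        if len(losses) > 1 and abs(losses[-2] - losses[-1]) < tolerance:
            break
    return len(losses), losses


def make_toy_step(initial_loss: float = 1.0) -> Callable[[], float]:
    """Toy optimiser whose loss halves on every pass."""
    state = {"loss": initial_loss}

    def step() -> float:
        state["loss"] *= 0.5
        return state["loss"]

    return step


# Stops by convergence: the 10th loss (1/1024) differs from the 9th by < 1e-3.
converged_after, _ = train_until_done(make_toy_step(), max_iterations=50,
                                      tolerance=1e-3)

# Stops by the iteration cap before the loss has converged.
capped_after, _ = train_until_done(make_toy_step(), max_iterations=3,
                                   tolerance=1e-3)
```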
Step B42, performing iterative training optimization on the next-layer operation category classification model based on the training operation range prediction result and the training operation range to obtain an operation category classification model of the corresponding level, outputting the next training operation range prediction result, and repeating until the last-layer operation category classification model is trained to obtain the hierarchical classification model corresponding to the training operation range.
In this embodiment, specifically, the next-layer operation category classification model in the to-be-trained operation range hierarchical model is iteratively trained with the training operation range and the training operation range prediction result output by the first-layer operation category classification model, so as to optimize the next-layer operation category classification model until the preset training end condition is met. The operation category classification model of the corresponding level is thereby obtained and the next training operation range prediction result is output. This is repeated until the last-layer operation category classification model of the to-be-trained operation range hierarchical model is trained, thereby obtaining the hierarchical classification model corresponding to the training operation range.
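The layer-by-layer training of Steps B41 and B42 can be sketched as below. This section does not specify the model type, so `TokenVoteClassifier` is a toy stand-in invented for the example, and `train_hierarchy`, the `prev=` input encoding, the sample texts and the labels are likewise hypothetical. What the sketch keeps from the text is the data flow: each layer is trained on the training operation range together with the previous layer's training prediction.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence


class TokenVoteClassifier:
    """Toy stand-in for one category classification model: each token votes
    for the labels it co-occurred with during training."""

    def __init__(self) -> None:
        self.votes: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> None:
        for text, label in zip(texts, labels):
            for token in text.split():
                self.votes[token][label] += 1

    def predict(self, text: str) -> str:
        tally: Counter = Counter()
        for token in text.split():
            tally.update(self.votes.get(token, {}))
        if not tally:
            raise ValueError("no known token in input")
        return tally.most_common(1)[0][0]


def train_hierarchy(texts: Sequence[str],
                    labels_per_layer: Sequence[Sequence[str]]
                    ) -> List[TokenVoteClassifier]:
    """Train one model per layer; from the second layer on, the previous
    layer's training prediction is appended to the input text."""
    models: List[TokenVoteClassifier] = []
    previous: Optional[List[str]] = None
    for labels in labels_per_layer:
        if previous is None:
            inputs = list(texts)
        else:
            inputs = [f"{t} prev={p}" for t, p in zip(texts, previous)]
        model = TokenVoteClassifier()
        model.fit(inputs, labels)
        previous = [model.predict(x) for x in inputs]
        models.append(model)
    return models


# Two layers only, with placeholder labels, to keep the example short.
models = train_hierarchy(
    ["software development", "grain planting"],
    [["X", "Y"], ["X01", "Y01"]],
)
```

The name hierarchical model would be trained in the same way, with training enterprise names in place of training operation ranges.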
Step B50, performing iterative training optimization on the to-be-trained name hierarchical model through the training enterprise name to obtain a hierarchical classification model corresponding to the training enterprise name.
In this embodiment, the to-be-trained name hierarchical model is iteratively trained and optimized with the training enterprise name to obtain the hierarchical classification model corresponding to the training enterprise name. Specifically, the first-layer name category classification model in the to-be-trained name hierarchical model is iteratively trained with the training enterprise name to obtain the first-layer name category classification model and output a training name prediction result; the training enterprise name and the training name prediction result of the first layer are then used as the input of the next-layer name category classification model in the to-be-trained name hierarchical model, which is iteratively trained to obtain the name category classification model of the corresponding level, until the last-layer name category classification model is trained.
The step of performing iterative training optimization on the hierarchical model of the name to be trained through the training enterprise name to obtain the hierarchical classification model corresponding to the training enterprise name comprises the following steps:
Step B51, performing iterative training on a first-layer name category classification model in the to-be-trained name hierarchical model through the training enterprise name to obtain a first-layer name category classification model, and outputting a training name prediction result;
In this embodiment, specifically, the training enterprise name is input into the first-layer name category classification model in the to-be-trained name hierarchical model to optimize that model, and it is then determined whether the first-layer name category classification model meets the preset training end condition. If so, the first-layer name category classification model is obtained and the training name prediction result is output; if not, the procedure returns to the step of iteratively training the first-layer name category classification model in the to-be-trained name hierarchical model through the training enterprise name. For the specific implementation of Step B51, refer to Step B41, which is not repeated here.
Step B52, performing iterative training optimization on the next-layer name category classification model based on the training name prediction result and the training enterprise name to obtain a name category classification model of the corresponding level, outputting the next training name prediction result, and repeating until the last-layer name category classification model is trained to obtain the hierarchical classification model corresponding to the training enterprise name.
In this embodiment, specifically, the next-layer name category classification model in the to-be-trained name hierarchical model is iteratively trained with the training enterprise name and the training name prediction result output by the first-layer name category classification model, so as to optimize the next-layer name category classification model until the preset training end condition is met and the name category classification model of the corresponding level is obtained. This is repeated until the last-layer name category classification model is trained, thereby obtaining the hierarchical classification model corresponding to the training enterprise name. For the specific implementation of Step B52, refer to Step B42, which is not repeated here.
The embodiment of the application provides an industry category identification method in which the sample enterprise names and sample operation ranges corresponding to the categories of each level in the industry classification are obtained, and data cleaning is performed on the sample enterprise name and the sample operation range to obtain a training enterprise name and a training operation range. A to-be-trained operation range hierarchical model and a to-be-trained name hierarchical model are then obtained; the to-be-trained operation range hierarchical model is iteratively trained and optimized with the training operation range to obtain the hierarchical classification model corresponding to the training operation range, and the to-be-trained name hierarchical model is iteratively trained and optimized with the training enterprise name to obtain the hierarchical classification model corresponding to the training enterprise name. A hierarchical training method is adopted in the training process, in which each level only determines the category of the next level below it. As a result, the search space of categories is reduced when the enterprise to be classified is predicted through the hierarchical classification model, and the industry classification efficiency is improved, which lays a foundation for overcoming the technical defect of the prior art that, when many enterprises lack an industry category, the traditional manual judgment method involves a heavy workload and takes a long time, resulting in low industry classification efficiency.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an industry category identification device of a hardware operating environment according to an embodiment of the present application.
As shown in fig. 4, the industry category identification device may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used for realizing connection communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a magnetic disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001 described above.
Optionally, the industry category identification device may also include a user interface, a network interface, a camera, RF (Radio Frequency) circuitry, sensors, audio circuitry, a WiFi module, and so forth. The user interface may comprise a display screen (Display) and an input sub-module such as a keyboard (Keyboard), and optionally may also comprise a standard wired interface and a wireless interface. The network interface may optionally include a standard wired interface and a wireless interface (e.g., a WiFi interface).
Those skilled in the art will appreciate that the industry category identification device configuration shown in fig. 4 does not constitute a limitation of the industry category identification device and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 4, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, and an industry category identification program. The operating system is a program for managing and controlling hardware and software resources of the industry category identification device, and supports the operation of the industry category identification program and other software and/or programs. The network communication module is used to enable communication between the various components within the memory 1005, as well as with other hardware and software in the industry category identification system.
In the industry category identification device shown in fig. 4, the processor 1001 is configured to execute an industry category identification program stored in the memory 1005, and implement the steps of the industry category identification method described in any one of the above.
The specific implementation of the industry category identification device in the present application is substantially the same as that of each embodiment of the industry category identification method, and is not described herein again.
The application also provides an industry category identification device, which includes:
the acquisition module is used for acquiring the operation range and the enterprise name of the enterprise to be classified;
the first classification module is used for inputting the operation range and the enterprise name into a first-layer category classification model of a corresponding hierarchical classification model respectively to obtain a classification prediction result, wherein the hierarchical classification model comprises each layer of category classification model;
the second classification module is used for combining the classification prediction result with the operation range and the enterprise name respectively as the input of the next-layer category classification model corresponding to each, obtaining the next classification prediction result, and repeating until the last-layer category classification model is reached to obtain the target subclass identification result;
and the determining module is used for determining the industry categories corresponding to the enterprises to be classified based on the target subclass identification result and a preset category system mapping relation.
Optionally, the first classification module is further configured to:
inputting the operation range into a first-layer category classification model in the operation range hierarchical classification model, and outputting an operation range gate-category probability result;
inputting the enterprise name into a first-layer category classification model in the name hierarchical classification model, and outputting a name gate-category probability result;
merging the operation range gate-category probability result and the name gate-category probability result to obtain a gate-category merging result;
and performing category selection in the gate-category merging result or the operation range gate-category probability result according to a preset selection rule to obtain the classification prediction result.
Optionally, the first classification module is further configured to:
sorting the operation range gate-category probability result to obtain a first sorting result;
sorting the name gate-category probability result to obtain a second sorting result;
and adding the first sorting result and the second sorting result to obtain the gate-category merging result.
Optionally, the second classification module is further configured to:
inputting the operation range and the classification prediction result into a second-layer category classification model in the operation range classification model, and outputting a large-category probability result of the operation range;
inputting the enterprise name and the classification prediction result into a second-layer category classification model in the name classification model, and outputting a name large-category probability result;
merging the operation range large-class probability result and the name large-class probability result to obtain a large-class merging result;
and performing category selection according to a preset selection rule in the large category combination result or the large category probability result of the operation range to obtain the next category prediction result, and circulating until the last layer of category classification model is reached to obtain a target small category identification result.
Optionally, the industry category identifying device is further configured to:
acquiring sample enterprise names corresponding to all levels of categories in the industry classification and sample operation ranges corresponding to all levels of categories;
carrying out data cleaning on the sample enterprise name and the sample operation range to obtain a training enterprise name and a training operation range;
acquiring a to-be-trained operation range level model and a to-be-trained name level model;
performing iterative training optimization on the hierarchical model of the operating range to be trained through the training operating range to obtain a hierarchical classification model corresponding to the training operating range;
and carrying out iterative training optimization on the hierarchical model of the name to be trained through the name of the training enterprise to obtain a hierarchical classification model corresponding to the name of the training enterprise.
Optionally, the industry category identifying device is further configured to:
performing iterative training on a first-layer category classification model in the to-be-trained operation range level model through the training operation range to obtain a first-layer operation category classification model, and outputting a training operation range prediction result;
and performing iterative training optimization on the next-layer operation category classification model based on the training operation range prediction result and the training operation range to obtain an operation category classification model of a corresponding level, outputting a next-training operation range prediction result, and circulating until the last-layer operation category classification model is trained to obtain a level classification model corresponding to the training operation range.
Optionally, the industry category identifying device is further configured to:
performing iterative training on a first-layer category classification model in the name hierarchy model to be trained through the training enterprise names to obtain a first-layer name category classification model, and outputting a training name prediction result;
and performing iterative training optimization on the next layer of name category classification model based on the training name prediction result and the training enterprise name to obtain a corresponding layer of name category classification model, outputting the next training name prediction result, and circulating until the last layer of name category classification model is trained to obtain a layer classification model corresponding to the training enterprise name.
The specific implementation of the industry category identification device of the present application is substantially the same as that of each embodiment of the industry category identification method, and is not described herein again.
The present application provides a medium, which is a readable storage medium, and the readable storage medium stores one or more programs, which can be further executed by one or more processors to implement the steps of any one of the industry category identification methods described above.
The specific implementation manner of the readable storage medium of the present application is substantially the same as that of each embodiment of the industry category identification method, and is not described herein again.
The present application provides a computer program product, and the computer program product includes one or more computer programs, which can also be executed by one or more processors for implementing the steps of any one of the industry category identification methods described above.
The specific implementation of the computer program product of the present application is substantially the same as that of each embodiment of the industry category identification method, and is not described herein again.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (10)

1. An industry category identification method, characterized in that the industry category identification method comprises:
acquiring the operation range and the enterprise name of an enterprise to be classified;
inputting the operation range and the enterprise name into a first-layer category classification model of a corresponding hierarchical classification model respectively to obtain a classification prediction result, wherein the hierarchical classification model comprises each layer of category classification model;
combining the classification prediction results with the operation range and the enterprise name respectively to serve as the input of a next-layer category classification model corresponding to the operation range and the enterprise name respectively, obtaining a next classification prediction result, and circulating until the last-layer category classification model is reached to obtain a target subclass identification result;
and determining the industry categories corresponding to the enterprises to be classified based on the target subclass identification result and a preset category system mapping relation.
2. The industry category identification method of claim 1, wherein the hierarchical classification model includes an extent of business hierarchical classification model and a name hierarchical classification model,
the step of inputting the operation range and the enterprise name into a first-level category classification model of a corresponding level classification model respectively to obtain a classification prediction result comprises the following steps:
inputting the operation range into a first-layer category classification model in the operation range hierarchical classification model, and outputting an operation range gate probability result;
inputting the enterprise name into a first-layer category classification model in the name-level classification model, and outputting a name gate probability result;
merging the operation range gate type probability result and the name gate type probability result to obtain a gate type merging result;
and performing category selection in the category merging result or the operation range gate probability result according to a preset selection rule to obtain a classification prediction result.
3. The industry category identification method of claim 2, wherein the step of merging the operation range gate-class probability result and the name gate-class probability result to obtain the gate-class merging result comprises:
sorting the operation range gate-class probability result to obtain a first sorting result;
sorting the name gate-class probability result to obtain a second sorting result;
and adding the first sorting result and the second sorting result to obtain the gate-class merging result.
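One way to read the merging step of claim 3 is as a rank-sum fusion: each model's probability result is converted into a ranking, and the two rankings are added per category. The Python sketch below assumes this reading. The function names `rank_scores` and `merge_by_rank`, the convention that the most probable category receives the highest rank score, and the example categories are illustrative assumptions, not details given in the claims.

```python
from typing import Dict


def rank_scores(probs: Dict[str, float]) -> Dict[str, int]:
    """Convert probabilities into rank scores (assumed convention: the least
    probable category gets 1, the most probable gets len(probs))."""
    ordered = sorted(probs, key=probs.get)  # ascending by probability
    return {category: position + 1 for position, category in enumerate(ordered)}


def merge_by_rank(scope_probs: Dict[str, float],
                  name_probs: Dict[str, float]) -> Dict[str, int]:
    """Add the two rankings per category; a category missing from one
    model's output contributes 0 from that model."""
    first = rank_scores(scope_probs)   # first sorting result
    second = rank_scores(name_probs)   # second sorting result
    categories = set(first) | set(second)
    return {c: first.get(c, 0) + second.get(c, 0) for c in categories}


# Hypothetical top-level probabilities from the two first-layer models.
scope_probs = {"manufacturing": 0.6, "retail": 0.3, "finance": 0.1}
name_probs = {"manufacturing": 0.5, "retail": 0.4, "finance": 0.1}
merged = merge_by_rank(scope_probs, name_probs)
```

Adding ranks rather than raw probabilities keeps the two models on a common scale even when their probability outputs are calibrated differently.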
4. The industry category identification method of claim 1, wherein the hierarchical classification model includes an operation range hierarchical classification model and a name hierarchical classification model,
and the step of combining the classification prediction result with the operation range and with the enterprise name, respectively, as the inputs of the next-layer category classification models corresponding to the operation range and the enterprise name, obtaining a next classification prediction result, and repeating until the last-layer category classification model is reached to obtain the target subclass identification result comprises:
inputting the operation range and the classification prediction result into the second-layer category classification model of the operation range hierarchical classification model, and outputting an operation range large-class probability result;
inputting the enterprise name and the classification prediction result into the second-layer category classification model of the name hierarchical classification model, and outputting a name large-class probability result;
merging the operation range large-class probability result and the name large-class probability result to obtain a large-class merging result;
and performing category selection in the large-class merging result or the operation range large-class probability result according to a preset selection rule to obtain the next classification prediction result, and repeating until the last-layer category classification model is reached to obtain the target subclass identification result.
5. The industry category identification method of claim 1, wherein before the step of inputting the operation range and the enterprise name into the first-layer category classification model of the corresponding hierarchical classification model to obtain the classification prediction result, the industry category identification method further comprises:
acquiring sample enterprise names corresponding to all levels of categories in the industry classification and sample operation ranges corresponding to all levels of categories;
carrying out data cleaning on the sample enterprise name and the sample operation range to obtain a training enterprise name and a training operation range;
acquiring a to-be-trained operation range level model and a to-be-trained name level model;
performing iterative training optimization on the to-be-trained operation range level model using the training operation range to obtain a hierarchical classification model corresponding to the training operation range;
and performing iterative training optimization on the to-be-trained name level model using the training enterprise name to obtain a hierarchical classification model corresponding to the training enterprise name.
6. The industry category identification method of claim 5, wherein the step of performing iterative training optimization on the to-be-trained operation range level model using the training operation range to obtain the hierarchical classification model corresponding to the training operation range comprises:
performing iterative training on the first-layer operation category classification model of the to-be-trained operation range level model using the training operation range to obtain a trained first-layer operation category classification model, and outputting a training operation range prediction result;
and performing iterative training optimization on the next-layer operation category classification model based on the training operation range prediction result and the training operation range to obtain a trained operation category classification model of the corresponding layer, outputting a next training operation range prediction result, and repeating this step until the last-layer operation category classification model has been trained, to obtain the hierarchical classification model corresponding to the training operation range.
7. The industry category identification method of claim 5, wherein the step of performing iterative training optimization on the to-be-trained name level model using the training enterprise name to obtain the hierarchical classification model corresponding to the training enterprise name comprises:
performing iterative training on the first-layer name category classification model of the to-be-trained name level model using the training enterprise name to obtain a trained first-layer name category classification model, and outputting a training name prediction result;
and performing iterative training optimization on the next-layer name category classification model based on the training name prediction result and the training enterprise name to obtain a trained name category classification model of the corresponding layer, outputting a next training name prediction result, and repeating this step until the last-layer name category classification model has been trained, to obtain the hierarchical classification model corresponding to the training enterprise name.
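Claims 6 and 7 describe the same layer-by-layer training procedure for the operation range model and the name model: train one layer, then feed its predictions together with the original text into the next layer. The Python sketch below illustrates that procedure under explicit assumptions. `TokenVoteClassifier` is a toy stand-in for a real text classifier, and the `<prev=...>` token, the function names, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


class TokenVoteClassifier:
    """Toy stand-in for a real text classifier: each token votes for the
    labels it co-occurred with during training."""

    def __init__(self) -> None:
        self.votes: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, texts: List[str], labels: List[str]) -> "TokenVoteClassifier":
        for text, label in zip(texts, labels):
            for token in text.split():
                self.votes[token][label] += 1
        return self

    def predict(self, text: str) -> str:
        tally: Counter = Counter()
        for token in text.split():
            tally.update(self.votes.get(token, {}))
        if not tally:
            raise ValueError("no known tokens in input")
        return tally.most_common(1)[0][0]


def with_previous(text: str, prediction: Optional[str]) -> str:
    """Append the previous layer's prediction to the text as an extra token."""
    return text if prediction is None else f"{text} <prev={prediction}>"


def train_hierarchy(texts: List[str],
                    labels_per_layer: List[List[str]]) -> List[TokenVoteClassifier]:
    """Train one classifier per layer; each later layer sees the text plus
    the previous layer's prediction on that text."""
    models: List[TokenVoteClassifier] = []
    inputs = list(texts)
    for layer_labels in labels_per_layer:
        model = TokenVoteClassifier().fit(inputs, layer_labels)
        models.append(model)
        predictions = [model.predict(x) for x in inputs]
        inputs = [with_previous(t, p) for t, p in zip(texts, predictions)]
    return models


def predict_hierarchy(models: List[TokenVoteClassifier], text: str) -> str:
    """Run the trained layers in order and return the last layer's prediction."""
    prediction: Optional[str] = None
    for model in models:
        prediction = model.predict(with_previous(text, prediction))
    if prediction is None:
        raise ValueError("at least one trained layer is required")
    return prediction


# Hypothetical cleaned training samples with labels for two layers.
texts = ["sell fresh fruit", "sell mobile phones", "repair mobile phones"]
layer_labels = [
    ["retail", "retail", "services"],
    ["retail/food", "retail/electronics", "services/repair"],
]
models = train_hierarchy(texts, layer_labels)
result = predict_hierarchy(models, "repair mobile phones")
```

Running the same `train_hierarchy` function once on operation range texts and once on enterprise names would yield the two hierarchical models the claims refer to.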
8. An industry category identification device, characterized in that the industry category identification device comprises: a memory, a processor, and an industry category identification program stored on the memory,
wherein the industry category identification program, when executed by the processor, implements the industry category identification method as claimed in any one of claims 1 to 7.
9. A medium, which is a readable storage medium, characterized in that an industry category identification program is stored on the readable storage medium, and the industry category identification program is executed by a processor to implement the industry category identification method according to any one of claims 1 to 7.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the industry category identification method according to any one of claims 1 to 7 when being executed by a processor.
CN202110868628.3A 2021-07-30 2021-07-30 Industry category identification method, equipment, medium and computer program product Pending CN113591979A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110868628.3A CN113591979A (en) 2021-07-30 2021-07-30 Industry category identification method, equipment, medium and computer program product

Publications (1)

Publication Number Publication Date
CN113591979A true CN113591979A (en) 2021-11-02

Family

ID=78252207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110868628.3A Pending CN113591979A (en) 2021-07-30 2021-07-30 Industry category identification method, equipment, medium and computer program product

Country Status (1)

Country Link
CN (1) CN113591979A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860892A (en) * 2022-07-06 2022-08-05 腾讯科技(深圳)有限公司 Hierarchical category prediction method, device, equipment and medium
CN114860892B (en) * 2022-07-06 2022-09-06 腾讯科技(深圳)有限公司 Hierarchical category prediction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination