CN112163153B - Industry label determining method, device, equipment and storage medium - Google Patents

Publication number
CN112163153B
Authority
CN
China
Prior art keywords
enterprise
target
subclass
word
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011060599.XA
Other languages
Chinese (zh)
Other versions
CN112163153A (en
Inventor
唐圳
刘博�
郑文琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202011060599.XA
Publication of CN112163153A
Priority to PCT/CN2021/103262 (WO2022068297A1)
Application granted
Publication of CN112163153B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services

Abstract

The invention discloses a method, an apparatus, a device and a storage medium for determining an industry label. The method comprises: acquiring the business scope of a target enterprise of a stock user, wherein the industry label of the target enterprise is of an unknown label type; for each subclass under a target class, acquiring the business scope of each enterprise of the stock user corresponding to the subclass, and generating the subclass business content of the subclass according to the business scopes of those enterprises, wherein the target class is the category or major class to which the industry label of the target enterprise belongs, and the industry label of each subclass is of a known label type; determining, according to the business scope of the target enterprise and each subclass business content, the subclass business content that matches the business scope of the target enterprise as the matched business content of the target enterprise; and determining the industry label of the subclass corresponding to the matched business content as the industry label of the target enterprise. A specific industry label is thereby determined automatically for the enterprise according to its business scope, and the determination is highly accurate.

Description

Industry label determining method, device, equipment and storage medium
Technical Field
The present invention relates to the field of text recognition technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining an industry label.
Background
As enterprises diversify, more and more of them operate across several industries, and a growing number of enterprise industry classification labels are unspecific, that is, of an unknown label type, such as "other unlisted wholesale (5199)" and "other agriculture (0190)". Such labels cannot clearly describe what an enterprise actually does.
When the industry label of an enterprise is of the unknown label type, the enterprise portrait cannot be determined accurately, so high-quality service cannot be provided to the enterprise.
Disclosure of Invention
The main object of the invention is to provide a method, an apparatus, a device and a storage medium for determining an industry label which, for an enterprise whose industry label is unspecific, automatically match a specific industry label to the enterprise according to its business content. The determination is highly accurate and better reflects how the enterprise actually operates, which provides a sound basis for subsequently determining the enterprise portrait.
In order to achieve the above object, in a first aspect, an embodiment of the present invention provides a method for determining an industry label, where the method for determining an industry label includes:
Acquiring the business scope of a target enterprise of a stock user, wherein the industry label of the target enterprise is of an unknown label type; for each subclass under a target class, acquiring the business scope of each enterprise of the stock user corresponding to the subclass, and generating the subclass business content of the subclass according to the business scopes of those enterprises, wherein the target class is the category or major class to which the industry label of the target enterprise belongs, and the industry label of each subclass is of a known label type; determining, according to the business scope of the target enterprise and each subclass business content, the subclass business content that matches the business scope of the target enterprise as the matched business content of the target enterprise; and determining the industry label of the subclass corresponding to the matched business content as the industry label of the target enterprise.
Optionally, generating the subclass business content of the subclass according to the business scope of each enterprise includes:
Performing word segmentation on the business scope of each enterprise of each subclass to obtain the enterprise business-scope words of that enterprise; and, for each subclass, performing deduplication and stop-word removal on the enterprise business-scope words of the enterprises of the subclass to obtain the subclass business content of the subclass.
Optionally, after acquiring the business scope of the target enterprise of the stock user, the method further comprises:
performing word segmentation on the business scope of the target enterprise to obtain the target business-scope words of the target enterprise.
Correspondingly, determining, according to the business scope of the target enterprise and each subclass business content, the subclass business content that matches the business scope of the target enterprise as the matched business content of the target enterprise includes:
for each subclass business content, calculating the matching degree between the subclass business content and the business scope of the target enterprise according to the enterprise business-scope words of the subclass business content and the target business-scope words; and determining the subclass business content with the highest matching degree as the matched business content of the target enterprise.
Optionally, for each subclass business content, calculating the matching degree between the subclass business content and the business scope of the target enterprise according to the enterprise business-scope words of the subclass business content and the target business-scope words includes:
Determining the total business content of the target class according to the enterprise business-scope words of each subclass business content; for each subclass business content, calculating, based on the term frequency-inverse document frequency (TF-IDF) technique and with the total business content as the document set, a first score of each enterprise business-scope word of the subclass business content in the total business content; and, for each subclass business content, determining the matching degree between the subclass business content and the business scope of the target enterprise according to the target business-scope words and the first scores of the enterprise business-scope words of the subclass business content.
Optionally, determining the matching degree between the subclass business content and the business scope of the target enterprise according to the target business-scope words and the first scores of the enterprise business-scope words of the subclass business content includes:
For each target business-scope word, when the target business-scope word matches a current enterprise business-scope word of the subclass business content, determining the first score of that current enterprise business-scope word as the target score of the target business-scope word; and determining the matching degree between the subclass business content and the business scope of the target enterprise according to the target scores of the target business-scope words.
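The TF-IDF scoring and matching described above can be sketched as follows. This is only a minimal illustration, not the claimed implementation: the smoothed IDF formula, the summation of target scores, the function names and the sample words are all assumptions introduced here, and each subclass business content is treated as one document of the document set.

```python
import math
from collections import Counter

def tfidf_scores(subclass_words):
    """Score every word of every subclass; each subclass is one document."""
    n_docs = len(subclass_words)
    doc_freq = Counter()
    for words in subclass_words.values():
        doc_freq.update(set(words))  # count each word once per subclass
    scores = {}
    for code, words in subclass_words.items():
        counts = Counter(words)
        scores[code] = {
            # Smoothed IDF (an assumption) keeps words shared by all subclasses above zero.
            w: (c / len(words)) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w, c in counts.items()
        }
    return scores

def matching_degree(target_words, word_scores):
    """Sum the first scores of the subclass words hit by the target words."""
    return sum(word_scores.get(w, 0.0) for w in set(target_words))

# Hypothetical subclass business contents, keyed by subclass code.
subclass_words = {
    "5165": ["wholesale", "steel", "wood"],
    "5131": ["wholesale", "textile", "knitwear"],
}
scores = tfidf_scores(subclass_words)
target_words = ["wholesale", "steel", "clothing"]
degrees = {code: matching_degree(target_words, s) for code, s in scores.items()}
best = max(degrees, key=degrees.get)  # subclass with the highest matching degree
```

In this toy data "wholesale" appears in both subclasses and therefore contributes less than "steel", which appears in only one, so the target enterprise is matched to subclass 5165.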
Optionally, the method further comprises:
For each enterprise of each subclass, calculating the word vector of each enterprise business-scope word of the enterprise, and determining the enterprise business-scope sentence vector of the enterprise according to those word vectors; for each subclass, determining the business-scope center vector of the subclass according to the enterprise business-scope sentence vectors of the enterprises of the subclass; calculating the word vector of each target business-scope word of the target enterprise, and determining the target business-scope sentence vector of the target enterprise according to those word vectors; and calculating the vector distance between the target business-scope sentence vector and the business-scope center vector of each subclass.
Correspondingly, determining the matching degree between the subclass business content and the business scope of the target enterprise according to the target business-scope words and the first scores of the enterprise business-scope words of the subclass business content includes:
Determining, according to the target business-scope words and the first scores of the enterprise business-scope words of the subclass business content, a second score of the business scope of the target enterprise with respect to the subclass business content; and determining the matching degree between the subclass business content of the subclass and the business scope of the target enterprise according to the second score and the vector distance.
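The vector part of this optional step can be sketched as follows. The text above does not fix how a sentence vector is derived from word vectors, how the center vector is formed, or which distance is used, so the averaging and the Euclidean distance below are assumptions, as are the two-dimensional toy word vectors and all function names. How the second score and the vector distance are combined is not stated above and is not shown.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equally sized vectors (the list must be non-empty)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def sentence_vector(words, word_vectors):
    """Average the vectors of the words found in the dictionary (assumed rule)."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    return mean_vector(known)  # assumes at least one word is in the dictionary

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical word vector dictionary and segmented business scopes.
word_vectors = {
    "steel": [1.0, 0.0], "wood": [0.8, 0.2],
    "textile": [0.0, 1.0], "knitwear": [0.2, 0.8], "clothing": [0.1, 0.9],
}
subclass_enterprises = {
    "5165": [["steel", "wood"], ["steel"]],
    "5131": [["textile"], ["textile", "knitwear"]],
}
# Center vector of a subclass: mean of its enterprises' sentence vectors.
centers = {
    code: mean_vector([sentence_vector(ws, word_vectors) for ws in enterprises])
    for code, enterprises in subclass_enterprises.items()
}
target = sentence_vector(["clothing", "textile"], word_vectors)
distances = {code: euclidean(target, center) for code, center in centers.items()}
nearest = min(distances, key=distances.get)
```

A smaller distance indicates that the target business scope lies closer to the typical business scope of that subclass.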
Optionally, calculating the word vector of each enterprise business-scope word of the enterprise includes:
calculating the word vector of each enterprise business-scope word of the enterprise based on a text vectorization model and a preset Chinese word vector dictionary.
Correspondingly, calculating the word vector of each target business-scope word of the target enterprise includes:
calculating the word vector of each target business-scope word of the target enterprise based on the text vectorization model and the preset Chinese word vector dictionary.
In a second aspect, an embodiment of the present invention further provides an apparatus for determining an industry label, including:
a data acquisition module, configured to acquire the business scope of a target enterprise of a stock user, wherein the industry label of the target enterprise is of an unknown label type;
a subclass business content determining module, configured to acquire, for each subclass under a target class, the business scope of each enterprise of the stock user corresponding to the subclass, and to generate the subclass business content of the subclass according to the business scopes of those enterprises, wherein the target class is the category or major class to which the industry label of the target enterprise belongs, and the industry label of each subclass is of a known label type;
a content matching module, configured to determine, according to the business scope of the target enterprise and each subclass business content, the subclass business content that matches the business scope of the target enterprise as the matched business content of the target enterprise; and
an industry label determining module, configured to determine the industry label of the subclass corresponding to the matched business content as the industry label of the target enterprise.
In a third aspect, an embodiment of the present invention further provides industry label determining equipment, which includes a memory, a processor, and a program for determining an industry label that is stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the method for determining an industry label according to any embodiment corresponding to the first aspect of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where a program for determining an industry label is stored, where the program for determining an industry label, when executed by a processor, implements the steps of the method for determining an industry label according to any embodiment corresponding to the first aspect of the present invention.
According to the method, apparatus, device and storage medium for determining an industry label provided by the invention, for a target enterprise of a stock user whose industry label is unspecific, the business scope of the target enterprise is compared with the business content of each subclass of a known label type under the category or major class to which the target enterprise belongs, where the business content of a subclass is derived from the business scopes of the enterprises belonging to that subclass. The subclass business content that matches the business scope of the target enterprise is determined, and the industry label of the corresponding subclass is determined as the industry label of the target enterprise. An enterprise with an unspecific industry label is thus automatically matched with a specific industry label with high accuracy, which provides a sound basis for determining the enterprise portrait, makes it easier to offer high-quality services that fit the enterprise's actual operations, and improves the user experience.
Drawings
FIG. 1 is an application scenario diagram of a method for determining an industry label according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for determining an industry label provided by an embodiment of the present invention;
FIG. 3 is a flow chart of a method of determining an industry label according to another embodiment of the present invention;
FIG. 4 is a flow chart of step S306 in the embodiment of FIG. 3 according to the present invention;
FIG. 5 is a flow chart of a method of determining an industry label according to another embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus for determining an industry label according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of industry label determining equipment according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The following explains the application scenario of the embodiment of the present invention:
Fig. 1 is an application scenario diagram of a method for determining an industry label according to an embodiment of the present invention. As shown in Fig. 1, to improve the quality of telemarketing services, it is often necessary to formulate telemarketing service policies according to the industry labels of different enterprises. An industry label can be determined for each enterprise according to the national economic industry classification, in which industry label codes are divided, from coarse to fine, into categories, major classes, middle classes and subclasses. The service enterprise 110 needs to determine the enterprise portrait of the target enterprise 120 that it serves according to the industry label 121 of the target enterprise 120, so as to provide high-quality telemarketing services to the target enterprise 120 according to that portrait.
When the industry label 121 of the target enterprise 120 is a subclass label of the unknown label type, such as "other unlisted wholesale" with subclass code 5199, the enterprise portrait of the target enterprise 120 is too coarse to describe the needs of the target enterprise 120 accurately, so a service policy that meets those needs cannot be provided to the target enterprise 120.
To sharpen the enterprise portrait of an enterprise whose industry label is unspecific, the embodiment of the invention provides a method for automatically determining a specific industry label for such an enterprise. The main idea of the method is as follows: according to the business scope of the target enterprise and the business scopes of the enterprises corresponding to each specific subclass under the same major class or category as the target enterprise, the subclass whose business content matches the business scope of the target enterprise is determined, and the industry label of that subclass is determined as the industry label of the target enterprise. The target enterprise is thereby matched with a suitable specific industry label. Based on this label, a clear enterprise portrait of the target enterprise can be generated, and the needs of the target enterprise can be described accurately from that portrait, so that high-quality service can be provided to the target enterprise.
Fig. 2 is a flowchart of a method for determining an industry label according to an embodiment of the present invention, where, as shown in fig. 2, the method for determining an industry label includes the following steps:
Step S201, acquiring the business scope of the target enterprise of the stock user.
The industry label of the target enterprise is of an unknown label type. An industry label generally refers to the name of a subclass in the national economic industry classification. An industry label of the unknown label type is one whose subclass name contains "other" or a similarly unspecific expression, such as "other agriculture", "other animal husbandry", "other unlisted wholesale", "other unlisted manufacturing", and the like. A stock user refers to a user who already uses the provided service, that is, an existing customer. The business scope is data describing the range of an enterprise's business activities and may be expressed as keywords or sentences.
For example, for a target enterprise whose industry label is "other unlisted wholesale", the business scope may be: wholesale and retail of steel and clothing.
Specifically, there may be one target enterprise or several.
Further, after acquiring the business scope of the target enterprise, the method further includes:
Clearing the industry label of the target enterprise; converting the business scope into a business scope in a preset format; and performing word segmentation on the business scope in the preset format to obtain the target business-scope words corresponding to the business scope of the target enterprise.
Specifically, to assign a new industry label to a target enterprise whose industry label is unspecific, its existing industry label must first be cleared.
Specifically, the business scope of the target enterprise is usually entered or filled in manually, so its format is not uniform. To facilitate data processing, the business scope of the target enterprise is converted into a preset format.
For example, assume that the business scope of target enterprise C1 is "home improvement: wholesale and retail of various stationery, jewelry, beverages and tobacco". After conversion to the preset format, the business scope of target enterprise C1 is: wholesale and retail of stationery, jewelry, beverages and tobacco.
Step S202, for each subclass under the target class, acquiring the business scope of each enterprise of the stock user corresponding to the subclass, and generating the subclass business content of the subclass according to the business scopes of those enterprises.
The target class is the category or major class to which the industry label of the target enterprise belongs, and the industry label of each subclass is of a known label type. The known label type is the opposite of the unknown label type described above: it denotes an industry label that states the business clearly, for example one that does not contain the keyword "unlisted", such as "fruit and vegetable wholesale (5123)" or "clothing wholesale (5132)".
Specifically, the category or major class of the industry label of the target enterprise is obtained, and the business scope of each enterprise of the stock user under each subclass of that category or major class is acquired, that is, the business scope of each enterprise corresponding to each subclass under the target class. The business scopes of the enterprises of a subclass are then integrated to obtain the subclass business content of that subclass.
Further, for each business scope, the content in brackets can be removed, and business scopes with abnormal values, for example a null value, can be discarded.
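The cleaning described above can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, only ASCII and full-width round brackets are treated as brackets, nested brackets are not handled, and only `None` and empty strings are treated as abnormal values.

```python
import re

# Matches one non-nested bracketed span in ASCII or full-width round brackets.
BRACKETED = re.compile(r"\([^()]*\)|（[^（）]*）")

def clean_scope(scope):
    """Remove bracketed content; return None for a missing or empty scope."""
    if scope is None:
        return None
    cleaned = BRACKETED.sub("", scope).strip()
    return cleaned or None

def clean_scopes(scopes):
    """Clean every scope and drop the ones that end up empty."""
    cleaned = (clean_scope(s) for s in scopes)
    return [c for c in cleaned if c is not None]

# Hypothetical raw business scopes, including two abnormal values.
raw = ["wholesale of steel (excluding imports)", None, "", "零售服装（不含进口）"]
usable = clean_scopes(raw)
```

Here the two bracketed remarks are stripped and the two empty records are discarded, leaving two usable business scopes.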
Further, after the business scope of each enterprise of the subclass is acquired, keywords can be extracted from the business scope of each enterprise, and the keywords of all these enterprises together form the subclass business content of the subclass.
For example, assume that the major class of the target enterprise is "wholesale" with major class code 51, and that the stock user has enterprise customers in two subclasses of known label types under wholesale: building material wholesale (subclass code 5165) and textile, knitwear and raw material wholesale (subclass code 5131). Enterprises C2 and C3 belong to the building material wholesale subclass, and enterprises C4, C5 and C6 belong to the textile, knitwear and raw material wholesale subclass. The business scopes of C2 and C3 are then integrated to obtain the subclass business content of building material wholesale, and the business scopes of C4, C5 and C6 are integrated to obtain the subclass business content of textile, knitwear and raw material wholesale.
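The grouping in this example can be sketched as follows. The record layout, the function name and the business scope strings are hypothetical; only the enterprise identifiers and subclass codes come from the example above.

```python
from collections import defaultdict

def group_scopes_by_subclass(records):
    """Collect the business scopes of the enterprises under each subclass code."""
    grouped = defaultdict(list)
    for _enterprise, subclass_code, scope in records:
        grouped[subclass_code].append(scope)
    return dict(grouped)

# (enterprise, subclass code, business scope); the scopes are made up.
records = [
    ("C2", "5165", "wholesale of steel"),
    ("C3", "5165", "wholesale of wood"),
    ("C4", "5131", "wholesale of textiles"),
    ("C5", "5131", "wholesale of knitwear"),
    ("C6", "5131", "wholesale of yarn"),
]
grouped = group_scopes_by_subclass(records)
```

Each list in `grouped` is the raw material from which the subclass business content of that subclass is then generated.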
Step S203, determining, according to the business scope of the target enterprise and each subclass business content, the subclass business content that matches the business scope of the target enterprise as the matched business content of the target enterprise.
Specifically, the keywords of the business scope of the target enterprise can be matched against the keywords of each subclass business content to obtain the matching degree of the target enterprise for each subclass, and the subclass business content of the subclass with the highest matching degree is determined as the matched business content of the target enterprise.
Further, a weight value can be set in advance for each keyword of the subclass business content. When a keyword of the business scope of the target enterprise is identical to or matches a keyword of the subclass business content, the weight value of the matched keyword is obtained, and the weight values of all matched keywords are summed to obtain the matching degree for that subclass.
Specifically, the weight value of a keyword of the subclass business content may be determined based on how frequently the keyword occurs.
For example, assume that the keywords and weight values of the subclass business content of a subclass are "wholesale 0.1, retail 0.1, steel 0.4, wood 0.4", and that the keywords of the business scope of the target enterprise are "wholesale, steel, clothing". The matched keywords are "wholesale" and "steel", so the matching degree of the target enterprise for that subclass is 0.1 + 0.4 = 0.5.
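The weighted matching in this example can be sketched as follows. The function name is hypothetical, and exact string equality is assumed as the matching rule.

```python
def matching_degree(target_keywords, subclass_weights):
    """Sum the weights of the subclass keywords matched by the target keywords."""
    matched = set(target_keywords) & set(subclass_weights)
    return sum(subclass_weights[k] for k in matched)

# Keyword weights of one subclass business content, from the example above.
subclass_weights = {"wholesale": 0.1, "retail": 0.1, "steel": 0.4, "wood": 0.4}
target_keywords = ["wholesale", "steel", "clothing"]
degree = matching_degree(target_keywords, subclass_weights)
```

"clothing" has no counterpart in this subclass and contributes nothing, so `degree` equals the 0.5 given in the example, up to floating-point rounding.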
Step S204, determining the industry label of the subclass corresponding to the matched business content as the industry label of the target enterprise.
Specifically, after the matched business content that best matches the business scope of the target enterprise is determined from the subclass business contents of the subclasses, the specific industry label of the subclass corresponding to the matched business content is obtained and set as the industry label of the target enterprise, so that a specific industry label is set for the target enterprise automatically.
In this embodiment, for a target enterprise of a stock user whose industry label is unspecific, the business scope of the target enterprise is compared with the business content of each subclass of a known label type under the category or major class to which the target enterprise belongs, where the business content of a subclass is derived from the business scopes of the enterprises belonging to that subclass. The subclass business content that matches the business scope of the target enterprise is determined, and the industry label of the corresponding subclass is determined as the industry label of the target enterprise. An enterprise with an unspecific industry label is thus automatically matched with a specific industry label with high accuracy, which provides a sound basis for determining the enterprise portrait, makes it easier to offer high-quality services that fit the enterprise's actual operations, and improves the user experience.
Fig. 3 is a flowchart of a method for determining an industry label according to another embodiment of the present invention. On the basis of the embodiment shown in Fig. 2, this embodiment further refines steps S202 and S203 and adds, after step S201, a step of performing word segmentation on the business scope of the target enterprise. As shown in Fig. 3, the method for determining an industry label according to this embodiment includes the following steps:
Step S301, acquiring the business scope of the target enterprise of the stock user.
The industry label of the target enterprise is of an unknown label type.
Step S302, performing word segmentation on the business scope of the target enterprise to obtain the target business-scope words of the business scope of the target enterprise.
Specifically, word segmentation refers to the process of recombining a continuous character sequence into a word sequence according to a certain specification. The business scope of an enterprise in the present invention may be described in Chinese or in English. The word segmentation algorithm may be one based on string matching, one based on a hidden Markov model (HMM), one based on a conditional random field, or another word segmentation algorithm.
Further, the business scope of the target enterprise, and subsequently the business scope of each enterprise of each sub-class, may be segmented with jieba, a Chinese word segmentation component for Python.
For example, assume that the business scope of the target enterprise is "the business content of this enterprise is: wholesale and retail of grain and oil, food, beverage and tobacco products". The content before the colon is removed first, then the stop words (such as "and") and the punctuation marks in the business scope are removed, and word segmentation is then performed to obtain the target business scope segmented words: grain and oil, beverages, tobacco products, wholesale and retail.
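The preprocessing and segmentation described above can be sketched with a string-matching segmenter (forward maximum matching), one of the algorithm families the text mentions. This is a minimal sketch under stated assumptions: the Chinese input string is a hypothetical rendering of the example, and `DICTIONARY`, `STOP_WORDS`, `preprocess` and `segment` are illustrative names, not part of the patent or of jieba.

```python
import re

# Hypothetical toy dictionary and stop-word set; a real system would load far larger ones.
DICTIONARY = {"粮油", "食品", "饮料", "烟草制品", "批发", "零售"}
STOP_WORDS = {"的", "和", "及"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def preprocess(scope: str) -> str:
    """Drop the text before the first colon, then strip punctuation and whitespace."""
    # Keep only what follows the first full-width or ASCII colon, if there is one.
    scope = re.split(r"[：:]", scope, maxsplit=1)[-1]
    # Remove every character that is not a word character.
    return re.sub(r"[^\w]", "", scope)


def segment(text: str) -> list:
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if candidate in DICTIONARY or size == 1:
                tokens.append(candidate)
                i += size
                break
    # Stop words are removed after segmentation.
    return [token for token in tokens if token not in STOP_WORDS]


# Hypothetical business scope: "business content of this enterprise: wholesale and
# retail of grain and oil, food, beverages and tobacco products".
scope = "本企业经营内容为：粮油、食品、饮料及烟草制品的批发和零售"
tokens = segment(preprocess(scope))
```

In practice a library segmenter such as jieba would replace `segment`; the toy dictionary only keeps the sketch self-contained.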
Step S303, for each subclass under the target class, acquiring the operating range of each enterprise of the stock users corresponding to the subclass.
Step S304, for each enterprise of each subclass, word segmentation processing is carried out on the operation scope of the enterprise so as to obtain word segmentation of each enterprise operation scope of the enterprise.
Specifically, for each enterprise with definite industry labels, word segmentation is performed on the operation scope of the enterprise, and the specific word segmentation processing algorithm is similar to that in step S302 and is not described herein again, so that the enterprise operation scope word segmentation of each enterprise of each subclass is obtained.
Step S305, for each sub-class, performing deduplication and stop-word removal on the enterprise business scope segmented words of each enterprise of the sub-class to obtain the sub-class business content of the sub-class.
Specifically, a stop-word set consisting of individual stop words may be determined in advance. Stop-word removal can then be performed, based on this set, on the enterprise business scope segmented words of each enterprise of the sub-class. The sub-class business content consists of the enterprise business scope segmented words of each enterprise after deduplication and stop-word removal.
Step S306, for each sub-class business content, calculating the matching degree between the sub-class business content and the business scope of the target enterprise according to the enterprise business scope segmented words of the sub-class business content and the target business scope segmented words of the target enterprise.
Specifically, the weight of each enterprise business scope segmented word may be determined according to its frequency of occurrence in the sub-class business content. When a target business scope segmented word matches an enterprise business scope segmented word, the weight of the latter is taken as the score of the target segmented word. Summing these scores over all target segmented words of the target enterprise yields the matching degree between the business scope of the target enterprise and the sub-class business content.
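The deduplication and stop-word removal of step S305 can be sketched as follows. The function name `build_subclass_content` and the sample tokens are hypothetical, chosen only for illustration.

```python
def build_subclass_content(enterprise_tokens, stop_words):
    """Merge the token lists of all enterprises in one sub-class,
    dropping stop words and duplicates while keeping first-seen order."""
    seen = set()
    content = []
    for tokens in enterprise_tokens:
        for token in tokens:
            if token in stop_words or token in seen:
                continue
            seen.add(token)
            content.append(token)
    return content


# Hypothetical segmented business scopes of two enterprises in one sub-class.
stop_words = {"and", "of"}
dairy_enterprises = [
    ["milk", "and", "yogurt", "production"],
    ["milk", "powder", "production", "of", "cheese"],
]
content = build_subclass_content(dairy_enterprises, stop_words)
```

Keeping first-seen order is a design choice of this sketch; the text only requires that duplicates and stop words be removed.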
Optionally, fig. 4 is a flowchart of step S306 in the embodiment of fig. 3, and as shown in fig. 4, step S306 includes the following steps:
Step S3061, determining the total business content of the target category according to the enterprise business scope segmented words of each sub-class business content.
Specifically, the total business content corresponding to the target category can be obtained by integrating the enterprise business scope segmented words of each enterprise of each sub-class business content. The target category is the category or major class to which the industry label of the target enterprise belongs.
Step S3062, calculating first scores of business scope segmentation words of each sub-class business content in the total business content by taking the total business content as a document set based on word frequency-inverse document frequency technology aiming at each sub-class business content.
Here, the term frequency-inverse document frequency (TF-IDF) technique evaluates the importance of a word to a given document in a document set or corpus. The weight of a word is determined mainly by how frequently it occurs. The first score is the TF-IDF value of an enterprise business scope segmented word in the document set, i.e., the total business content.
Specifically, term frequency (TF) refers to how often a given word occurs in a document, and is expressed as:
$$Tf_{term} = \frac{T_{term}}{N_T}$$
where $Tf_{term}$ represents the term frequency of a given word $term$; $T_{term}$ represents the number of times the given word appears in a given document or article; and $N_T$ represents the total number of words in that document or article.
Specifically, the inverse document frequency (IDF) is a parameter describing the general importance of a given word. It is inversely related to how common the word is and, in its standard form, is expressed as:
$$Idf_{term} = \log\frac{N_D}{D_{term}}$$
where $Idf_{term}$ represents the inverse document frequency of a given word $term$; $D_{term}$ denotes the number of documents containing the given word; and $N_D$ represents the total number of documents in the corpus.
Further, for each given word, the TF-IDF value, i.e., the first score, is obtained by multiplying the word frequency by the inverse document frequency.
Specifically, the total management content is taken as a document set, and the word frequency and the inverse document frequency of the word segmentation of each enterprise management range in the sub-class management content are calculated based on the word frequency-inverse document frequency technology, so that the TF-IDF value of the word segmentation of each enterprise management range, namely the first score, can be obtained.
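The first-score calculation can be sketched as below, treating each sub-class business content as one document of the document set. This sketch assumes the unsmoothed natural-logarithm form of IDF, which the text does not confirm, and it keeps repeated tokens so that term frequency is informative. The name `first_scores` and the sample data are hypothetical.

```python
import math
from collections import Counter


def first_scores(subclass_contents):
    """Return {sub-class: {term: tf * idf}}, one document per sub-class."""
    n_docs = len(subclass_contents)
    # Number of documents (sub-classes) in which each term appears.
    doc_freq = Counter()
    for tokens in subclass_contents.values():
        doc_freq.update(set(tokens))
    scores = {}
    for label, tokens in subclass_contents.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[label] = {
            # tf = count / total words; idf = log(N_D / D_term) (assumed form).
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical sub-class business contents.
contents = {
    "candy and chocolate manufacturing": ["candy", "chocolate", "production", "sales"],
    "dairy product manufacturing": ["milk", "yogurt", "production", "sales"],
}
scores = first_scores(contents)
```

With this form of IDF, a word that appears in every sub-class (here "production" and "sales") scores zero, so only distinguishing words contribute to the match.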
Step S3063, determining, for each sub-class business content, the matching degree between the sub-class business content and the business scope of the target enterprise according to each target business scope segmented word and the first score of each enterprise business scope segmented word of the sub-class business content.
Specifically, the matching degree between the sub-class business content and the business scope of the target enterprise can be obtained by summing the first scores of those enterprise business scope segmented words of the sub-class business content that match the target business scope segmented words.
For example, assume that the sub-class business content includes Word1, Word2, Word3 and Word4, with first scores of 0.48, 0.24, 0.01 and 0.05 respectively, and that the target business scope segmented words include Word2 and Word3. Word2 and Word3 in the sub-class business content are then the matched words, and adding their first scores gives the matching degree: 0.24 + 0.01 = 0.25.
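The worked example above can be reproduced with a short sketch. It assumes that "matching" means exact equality of segmented words, whereas the text also allows similar words; `matching_degree` is a hypothetical name.

```python
import math


def matching_degree(target_tokens, first_score):
    """Sum the first scores of the sub-class words matched by the target's words."""
    # Each distinct target word counts once; unmatched words contribute nothing.
    return sum(first_score.get(token, 0.0) for token in set(target_tokens))


first_score = {"Word1": 0.48, "Word2": 0.24, "Word3": 0.01, "Word4": 0.05}
degree = matching_degree(["Word2", "Word3"], first_score)
print(math.isclose(degree, 0.25))
```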
Optionally, determining the matching degree of the sub-class operation content and the operation scope of the target enterprise according to the target operation scope word and the first score of the enterprise operation scope word of the sub-class operation content, including:
For each target business scope word, when the target business scope word is matched with the current business scope word of the subclass business content, determining the first score of the current business scope word as the target score of the target business scope word; and determining the matching degree of the sub-class operation content and the operation scope of the target enterprise according to the target score of each target operation scope word segmentation.
The current enterprise operation scope word segmentation is any one enterprise operation scope word segmentation in the subclass operation content.
Specifically, a target business scope segmented word matches the current enterprise business scope segmented word when the two are the same or similar.
Specifically, the sum of the target scores of the word segmentation in each target operation range can be calculated, so that the matching degree of the sub-class operation content and the operation range of the target enterprise is obtained.
Further, for each subclass, the number of enterprises corresponding to the subclass can be obtained, the subclass weight value of each subclass is determined according to the number of enterprises, and then the matching degree of the subclass operation content and the operation scope of the target enterprises is determined according to the subclass weight value and the target score of the word segmentation of each target operation scope.
Specifically, the subclass weight value is determined by the ratio of the number of enterprises corresponding to the subclass to the total number of enterprises corresponding to the target class. By setting the subclass weight value, the influence on the matching degree calculation caused by different enterprise numbers of different subclasses is avoided.
By way of example, assume that the target category is manufacturing and that the stock users include two sub-classes with definite industry labels under manufacturing, namely candy and chocolate manufacturing and dairy product manufacturing, with 7 enterprises in the candy and chocolate manufacturing sub-class and 3 enterprises in the dairy product manufacturing sub-class. The sub-class weight value of the candy and chocolate manufacturing sub-class is then determined to be 0.3, and that of the dairy product manufacturing sub-class to be 0.7, so that the sub-class with more enterprises receives the smaller weight.
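The numbers in this example (0.3 for the sub-class with 7 enterprises, 0.7 for the one with 3) are consistent with weighting each sub-class by the complement of its share of enterprises. The text does not spell out the formula, so the sketch below is an assumption that merely reproduces the example; `subclass_weights` is a hypothetical name.

```python
import math


def subclass_weights(enterprise_counts):
    """Assumed weighting: 1 minus the sub-class's share of all enterprises."""
    total = sum(enterprise_counts.values())
    return {label: 1 - count / total for label, count in enterprise_counts.items()}


weights = subclass_weights({
    "candy and chocolate manufacturing": 7,
    "dairy product manufacturing": 3,
})
print(math.isclose(weights["candy and chocolate manufacturing"], 0.3))
```

With more than two sub-classes these assumed weights no longer sum to 1, so a normalization step might be wanted in that case.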
Step S307, determining the subclass business content with the highest matching degree as the matching business content of the target enterprise.
Step S308, determining the industry label of the subclass corresponding to the matched operation content as the industry label of the target enterprise.
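Steps S306 to S308 can be tied together in a small sketch: compute a matching degree per sub-class, then take the label of the best match. The first scores below are hypothetical, matching is assumed to be exact word equality, and `determine_label` is an illustrative name.

```python
def determine_label(target_tokens, first_scores):
    """Return the best-matching sub-class label and all matching degrees."""
    targets = set(target_tokens)
    degrees = {
        label: sum(scores.get(token, 0.0) for token in targets)
        for label, scores in first_scores.items()
    }
    # The sub-class with the highest matching degree supplies the industry label.
    best = max(degrees, key=degrees.get)
    return best, degrees


# Hypothetical first scores per sub-class.
first_scores = {
    "candy and chocolate manufacturing": {"candy": 0.48, "chocolate": 0.24, "sales": 0.01},
    "dairy product manufacturing": {"milk": 0.40, "yogurt": 0.30, "sales": 0.01},
}
label, degrees = determine_label(["milk", "sales"], first_scores)
```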
In this embodiment, for a target enterprise of the stock users whose industry label is ambiguous, the business scope of the target enterprise is acquired, together with the business scope of each enterprise of each sub-class with a definite industry label that belongs to the same category or major class as the target enterprise, and word segmentation is performed on each business scope. For each sub-class, the sub-class business content is assembled by performing deduplication and stop-word removal on the segmented words of each enterprise of the sub-class. Based on the TF-IDF technique, with the total business content of the category or major class as the document set, the first score of each segmented word of each sub-class is calculated. The matching degree between the target enterprise and each sub-class is determined through segmented-word matching and the first scores, which yields the sub-class business content with the highest matching degree to the business scope of the target enterprise, and the industry label of that sub-class is determined as the industry label of the target enterprise. A definite industry label is thus matched automatically to an enterprise whose industry label is ambiguous, with high matching accuracy. This provides a good basis for determining the enterprise portrait, makes it easier to provide high-quality services that fit the enterprise's operating conditions, and improves the user experience.
Fig. 5 is a flowchart of an industry label determining method according to another embodiment of the present invention. On the basis of the embodiment shown in Fig. 3, this embodiment adds steps for calculating sentence vectors and vector distances. As shown in Fig. 5, the industry label determining method provided in this embodiment includes the following steps:
Step S501, obtaining the operating range of the target enterprise of the stock user.
The type of the industry label of the target enterprise is an unknown label type.
Specifically, let the major class or category to which the industry label of the target enterprise belongs be $F$. The stock users have $n$ sub-class industries under the major class or category $F$, denoted $F_i$ ($i = 1, 2, 3, \ldots, n$). Assume that $F_1$ is the sub-class whose industry label is ambiguous and that it corresponds to $m_1$ target enterprises; the business scope of each of these target enterprises of the stock users then needs to be acquired.
Step S502, word segmentation processing is performed on the operation scope of the target enterprise to obtain each target operation scope word segmentation of the operation scope of the target enterprise.
Step S503, for each subclass under the target class, acquiring an operating range of each enterprise of the stock users corresponding to the subclass.
Specifically, the target class is the above-mentioned major class or category $F$. The business scope of each enterprise of each sub-class industry $F_i$ ($i = 2, 3, \ldots, n$) is acquired, where $m_i$ denotes the number of enterprises in the $i$-th sub-class industry.
Step S504, for each enterprise, word segmentation processing is performed on the business scope of the enterprise, so as to obtain word segmentation of each enterprise business scope of the enterprise.
Step S505, for each sub-class, performing deduplication and stop-word removal on the enterprise business scope segmented words of each enterprise of the sub-class to obtain the sub-class business content of the sub-class.
Specifically, the business scope of each enterprise is subjected to word segmentation, deduplication, and stop-word removal, and the results are then grouped by sub-class industry to obtain the sub-class business content $E_i$ ($i = 2, 3, \ldots, n$) of each sub-class.
Step S506, determining the total management content of the target category according to the enterprise management scope word segmentation of the management content of each subclass.
Step S507, for each sub-class business content, calculating a first score of each business operation scope word segmentation of the sub-class business content in the total business content by taking the total business content as a document set based on word frequency-inverse document frequency technology.
Specifically, the total business content $E$ is taken as the "document set" and the sub-class business content $E_i$ ($i = 2, 3, \ldots, n$) of each sub-class industry as an "article". The TF-IDF score of each enterprise business scope segmented word in the sub-class business content, i.e., the first score, is then calculated based on the TF-IDF technique.
Step S508, for each enterprise of each subclass, calculating word vectors of words of each enterprise operation scope of the enterprise, and determining enterprise operation scope sentence vectors of the enterprise according to the word vectors of words of each enterprise operation scope.
Specifically, for each enterprise of each sub-class, the word vectors of the enterprise business scope segmented words of the enterprise are calculated based on a preset word vector algorithm, and the enterprise business scope sentence vector of the enterprise is then obtained from them.
Optionally, the calculating the word vector of each business operation scope word of the business includes:
based on the text vectorization model and a preset Chinese word vector dictionary, calculating word vectors of word segmentation in each enterprise operation range of the enterprise.
The text vectorization model (Word to Vector, Word2vec) is a tool for converting words into numerical vectors. The preset Chinese word vector dictionary is a word vector dictionary trained on a corpus of a large number of Chinese words.
Step S509, for each subclass, determining an operation scope center vector of the subclass according to the enterprise operation scope sentence vector of each enterprise of the subclass.
Specifically, the enterprise operation range sentence vectors of each enterprise of the subclass can be vector summed, so as to obtain the operation range center vector of the subclass.
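Steps S508 and S509 can be sketched as follows. The text does not say how a sentence vector is derived from word vectors; this sketch assumes the mean of the word vectors, while the center vector is the vector sum of the sentence vectors, as stated above. `WORD_VECTORS` is a hypothetical two-dimensional stand-in for a pretrained Chinese word vector dictionary.

```python
# Hypothetical toy word vectors; a real dictionary would come from a trained model.
WORD_VECTORS = {"milk": [1.0, 0.0], "yogurt": [0.8, 0.2], "candy": [0.0, 1.0]}


def sentence_vector(tokens, word_vectors):
    """Assumed rule: average the vectors of the tokens found in the dictionary."""
    vectors = [word_vectors[token] for token in tokens if token in word_vectors]
    if not vectors:
        raise ValueError("no token has a known word vector")
    dim = len(vectors[0])
    return [sum(vec[k] for vec in vectors) / len(vectors) for k in range(dim)]


def center_vector(sentence_vectors):
    """Sum the sentence vectors of all enterprises of one sub-class."""
    dim = len(sentence_vectors[0])
    return [sum(vec[k] for vec in sentence_vectors) for k in range(dim)]


# Hypothetical segmented business scopes of two enterprises in one sub-class.
dairy_enterprises = [["milk", "yogurt"], ["milk"]]
sentences = [sentence_vector(tokens, WORD_VECTORS) for tokens in dairy_enterprises]
center = center_vector(sentences)
```

Because the center is a sum rather than a mean, its magnitude grows with the number of enterprises in the sub-class, which affects any distance computed against it.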
Step S510, calculating word vectors of the words of each target operation scope of the target enterprise, and determining target operation scope sentence vectors of the target enterprise according to the word vectors of the words of each target operation scope.
Optionally, the calculating the word vector of each target operation scope word of the target enterprise includes:
based on a text vectorization model and a preset Chinese word vector dictionary, calculating word vectors of word segmentation in each target business scope of the target enterprise.
It should be noted that the word vectors of the target business scope segmented words of the target enterprise and the target business scope sentence vector are calculated in the same way as the word vectors and the enterprise business scope sentence vectors in step S508; only the object changes from an enterprise of a sub-class to the target enterprise.
Step S511, calculating the vector distance between the target operation range sentence vector and the operation range center vector of each subclass.
Specifically, the vector distance is the euclidean distance of two vectors, namely, the euclidean distance of the target business scope sentence vector and the business scope center vector of the subclass.
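The Euclidean distance of step S511 can be computed as in this minimal sketch; the function name and the sample vectors are illustrative only.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(distance)  # 5.0
```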
Step S512, determining a second score of the business scope of the target enterprise corresponding to the sub-class business content of the sub-class according to the target business scope word and the first score of the business scope word of the sub-class business content.
Specifically, for each subclass, the second score corresponding to the subclass is the sum of the first scores of the business scope tokens of the subclass that match the target business scope token of the target business.
Step S513, determining a matching degree between the subclass operation content of the subclass and the operation range of the target enterprise according to the second score and the vector distance.
Specifically, let the second score of sub-class $F_i$ ($i = 2, 3, \ldots, n$) be $S_i$ and the vector distance be $D_i$. The matching degree $P_i$ corresponding to sub-class $F_i$ is then expressed as:
$$P_i = S_i + \lambda D_i$$
where $\lambda$ is a weight coefficient that takes a negative value.
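The combination of the second score and the vector distance can be sketched as below. The value of the weight coefficient (here -0.1) and the sample scores and distances are hypothetical; the text only requires that the coefficient be negative, so that a larger distance lowers the matching degree.

```python
def combined_matching(second_scores, distances, lam=-0.1):
    """Compute P_i = S_i + lam * D_i for every sub-class."""
    if lam >= 0:
        raise ValueError("the weight coefficient must be negative")
    return {
        label: second_scores[label] + lam * distances[label]
        for label in second_scores
    }


# Hypothetical second scores S_i and vector distances D_i for two sub-classes.
second_scores = {"F2": 0.25, "F3": 0.30}
distances = {"F2": 0.5, "F3": 2.0}
degrees = combined_matching(second_scores, distances)
best = max(degrees, key=degrees.get)
```

In this example F3 has the higher word-level score, but its larger distance from the target sentence vector lets F2 win overall.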
Step S514, determining the subclass business content with the highest matching degree as the matching business content of the target enterprise.
Step S515, determining the industry label of the subclass corresponding to the matching operation content as the industry label of the target enterprise.
In this embodiment, for a target enterprise whose industry label is ambiguous, the matching degree between the sub-class business content of each sub-class and the business scope of the target enterprise is determined along multiple dimensions. Specifically, a word-level matching degree between the target enterprise and the sub-class business content is calculated with the TF-IDF technique, an overall matching degree is calculated at the sentence-vector level with a text vectorization model, and the two are combined to determine the matching degree between the sub-class business content and the business scope of the target enterprise, which improves the accuracy of the matching-degree calculation. The industry label of the sub-class industry with the highest matching degree is determined as the industry label of the target enterprise, so that a definite industry label is matched automatically to an enterprise whose industry label is ambiguous, with high matching accuracy. This provides a good basis for determining the enterprise portrait, makes it easier to provide high-quality services that fit the enterprise's operating conditions, and improves the user experience.
Fig. 6 is a schematic structural diagram of an industry label determining device according to an embodiment of the present invention, where, as shown in fig. 6, the industry label determining device includes: a data acquisition module 610, a sub-category business content determination module 620, a content matching module 630, and an industry label determination module 640.
The data obtaining module 610 is configured to obtain an operation range of a target enterprise of the stock user, where a type of an industry label of the target enterprise is an unknown label type; a subclass operation content determining module 620, configured to obtain, for each subclass under a target class, an operation range of each enterprise of the stock user corresponding to the subclass, and generate, according to the operation range of each enterprise, a subclass operation content of the subclass, where the target class is a class or a major class to which an industry label of the target enterprise belongs, and a type of the industry label of the subclass is a known label type; the content matching module 630 is configured to determine, according to the operation scope of the target enterprise and each of the sub-class operation contents, the sub-class operation content that is matched with the operation scope of the target enterprise as the matched operation content of the target enterprise; and the industry label determining module 640 is configured to determine the industry label of the subclass corresponding to the matching operation content as the industry label of the target enterprise.
Optionally, the subclass business content determination module 620 includes:
A business scope obtaining unit, configured to obtain, for each sub-class under the target class, the business scope of each enterprise of the stock users corresponding to the sub-class; a first word segmentation processing unit, configured to perform, for each enterprise, word segmentation on the business scope of the enterprise to obtain the enterprise business scope segmented words of the enterprise; and a sub-class business content determining unit, configured to perform, for each sub-class, deduplication and stop-word removal on the enterprise business scope segmented words of each enterprise of the sub-class to obtain the sub-class business content of the sub-class.
Optionally, the industry label determining device further includes:
A second word segmentation processing unit, configured to perform word segmentation on the business scope of the target enterprise to obtain the target business scope segmented words of the business scope of the target enterprise.
Accordingly, the content matching module 630 includes:
A matching degree calculating unit, configured to calculate, for each sub-class business content, the matching degree between the sub-class business content and the business scope of the target enterprise according to the enterprise business scope segmented words of the sub-class business content and the target business scope segmented words of the target enterprise; and a matching business content determining unit, configured to determine the sub-class business content with the highest matching degree as the matching business content of the target enterprise.
Optionally, the matching degree calculating unit includes:
A total management content determining subunit, configured to determine total management content of the target category according to the business operation scope segmentation of each sub-category management content; a first score calculating subunit, configured to calculate, for each sub-category management content, a first score of each business operation scope word of the sub-category management content in the total management content by using the total management content as a document set based on a word frequency-inverse document frequency technology; and the matching degree calculating subunit is used for determining the matching degree of the sub-class management content and the management scope of the target enterprises according to the target management scope word and the first score of each enterprise management scope word of the sub-class management content aiming at each sub-class management content.
Optionally, the matching degree calculating subunit is specifically configured to:
For each target business scope word, when the target business scope word is matched with the current business scope word of the subclass business content, determining the first score of the current business scope word as the target score of the target business scope word; and determining the matching degree of the sub-class operation content and the operation scope of the target enterprise according to the target score of each target operation scope word segmentation.
Optionally, the industry label determining device further includes:
The enterprise operation range sentence vector determining module is used for calculating word vectors of the enterprise operation range words of each enterprise of each subclass and determining the enterprise operation range sentence vector of each enterprise according to the word vectors of the enterprise operation range words; the management scope center vector determining module is used for determining the management scope center vector of each subclass according to the enterprise management scope sentence vector of each enterprise of the subclass; the target business scope sentence vector determining module is used for calculating word vectors of each target business scope word segmentation of the target enterprise and determining the target business scope sentence vector of the target enterprise according to the word vectors of each target business scope word segmentation; and the vector distance calculation module is used for calculating the vector distance between the target operation range sentence vector and the operation range center vector of each subclass.
Correspondingly, the matching degree calculating subunit is specifically configured to:
Determining a second score of the business scope of the target enterprise corresponding to the subclass business content according to the target business scope word and the first score of the enterprise business scope word of the subclass business content; and determining the matching degree of the subclass operation content of the subclass and the operation range of the target enterprise according to the second score and the vector distance.
Optionally, the calculating the word vector of each business operation scope word of the business includes:
Based on a text vectorization model and a preset Chinese word vector dictionary, calculating word vectors of word segmentation in each enterprise operation range of the enterprise; correspondingly, the calculating the word vector of each target operation range word of the target enterprise comprises the following steps: based on a text vectorization model and a preset Chinese word vector dictionary, calculating word vectors of word segmentation in each target business scope of the target enterprise.
The industry label determining device provided by the embodiment of the invention can execute the industry label determining method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executing method.
Fig. 7 is a schematic structural diagram of an industry label determining apparatus according to an embodiment of the present invention, where, as shown in fig. 7, the industry label determining apparatus includes: memory 710, processor 720, and computer programs.
Wherein the computer program is stored in the memory 710 and configured to be executed by the processor 720 to implement the industry label determining method provided by any of the embodiments corresponding to fig. 2-5 of the present invention.
Wherein the memory 710 and the processor 720 are coupled via a bus 730.
The description may be understood correspondingly with reference to the description and effects corresponding to the steps of fig. 2 to fig. 5, and will not be repeated here.
An embodiment of the present invention provides a computer readable storage medium having a computer program stored thereon, the computer program being executed by a processor to implement the method for determining an industry label provided in any of the embodiments corresponding to fig. 2-5 of the present invention.
The computer readable storage medium may be, among other things, a read-only memory (ROM), a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each module may exist alone physically, or two or more modules may be integrated in one unit. The units formed by the modules can be realized in a form of hardware or a form of hardware and software functional units.
The integrated modules, which are implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional module is stored in a storage medium, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (english: processor) to perform some of the steps of the methods according to the embodiments of the invention.
It should be appreciated that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be executed directly by a hardware processor, or by a combination of hardware and software modules in a processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile memory NVM, such as at least one magnetic disk memory, and may also be a U-disk, a removable hard disk, a read-only memory, a magnetic disk or optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, the bus in the drawings of the present invention is not limited to a single bus or a single type of bus.
The storage medium may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an Application SPECIFIC INTEGRATED Circuits (ASIC). It is also possible that the processor and the storage medium reside as discrete components in an electronic device or a master device. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) and comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods according to the embodiments of the present invention.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope. Any equivalent structure or equivalent process derived from the disclosure herein, or any direct or indirect application of it in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims (8)

1. A method for determining an industry label, comprising:
acquiring the business scope of a target enterprise of a stock user, wherein the type of the industry label of the target enterprise is an unknown label type;
performing word segmentation on the business scope of the target enterprise to obtain the target business scope words of the target enterprise;
for each subclass under a target class, acquiring the business scope of each enterprise of the stock users corresponding to the subclass, and generating subclass business content of the subclass according to the business scope of each enterprise, wherein the target class is the category or major class to which the industry label of the target enterprise belongs, and the type of the industry label of the subclass is a known label type;
determining the total business content of the target class according to the enterprise business scope words of each subclass business content;
for each subclass business content, calculating, based on a term frequency-inverse document frequency (TF-IDF) technique and taking the total business content as the document set, a first score of each enterprise business scope word of the subclass business content in the total business content;
for each subclass business content, determining the degree of matching between the subclass business content and the business scope of the target enterprise according to each target business scope word and the first score of each enterprise business scope word of the subclass business content; and
determining the subclass business content with the highest degree of matching as the matching business content of the target enterprise, and determining the industry label of the subclass corresponding to the matching business content as the industry label of the target enterprise.
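As an illustration of the first-score step of claim 1, the following is a minimal Python sketch that treats each subclass business content as one document and the contents of all subclasses together as the document set. The function name `first_scores`, the sample words, and the smoothed IDF variant `ln((1 + N) / (1 + df)) + 1` are assumptions made for this example; the claim names only a TF-IDF technique and does not fix a formula.

```python
import math
from collections import Counter


def first_scores(subclass_contents):
    """Return {subclass label: {word: TF-IDF score}}.

    subclass_contents maps a subclass label to the list of enterprise
    business scope words of that subclass. Each list is one document;
    all lists together form the document set (hypothetical layout).
    """
    n_docs = len(subclass_contents)

    # Document frequency: in how many subclass contents each word occurs.
    doc_freq = Counter()
    for words in subclass_contents.values():
        doc_freq.update(set(words))

    scores = {}
    for label, words in subclass_contents.items():
        counts = Counter(words)
        total = len(words)
        scores[label] = {
            # Term frequency times a smoothed inverse document frequency
            # (the smoothing is an assumption of this sketch).
            word: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        }
    return scores


contents = {
    "retail": ["clothing", "sales", "retail"],
    "software": ["software", "development", "sales"],
}
scores = first_scores(contents)
```

In this sketch a word that occurs in only one subclass ("clothing") scores higher than a word shared by every subclass ("sales"), which is the property that lets the score discriminate between subclasses.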
2. The method of claim 1, wherein generating the subclass business content of the subclass according to the business scope of each enterprise comprises:
performing word segmentation on the business scope of each enterprise of each subclass to obtain the enterprise business scope words of each enterprise; and
for each subclass, performing deduplication and stop-word removal on the enterprise business scope words of the enterprises of the subclass to obtain the subclass business content of the subclass.
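The deduplication and stop-word removal of claim 2 can be sketched as follows. This is only an illustration under stated assumptions: the business scopes are taken to be already segmented into word lists (the segmenter itself is not shown), and the function name `build_subclass_content` and the stop-word set are hypothetical.

```python
def build_subclass_content(enterprise_words, stop_words):
    """Merge the word lists of all enterprises of one subclass.

    enterprise_words: one list of business scope words per enterprise,
    assumed to be the output of an earlier word segmentation step.
    stop_words: set of words to discard (hypothetical list).
    Returns the distinct non-stop words in first-seen order.
    """
    seen = set()
    content = []
    for words in enterprise_words:
        for word in words:
            if word in stop_words or word in seen:
                continue  # drop stop words and repeated words
            seen.add(word)
            content.append(word)
    return content


subclass_content = build_subclass_content(
    [["clothing", "sales", "and"], ["sales", "footwear"]],
    stop_words={"and"},
)
```

Keeping first-seen order is a choice of this sketch so that the result is deterministic; the claim does not require any particular order.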
3. The method of claim 1, wherein determining the degree of matching between the subclass business content and the business scope of the target enterprise according to each target business scope word and the first score of each enterprise business scope word of the subclass business content comprises:
for each target business scope word, when the target business scope word matches a current enterprise business scope word of the subclass business content, determining the first score of the current enterprise business scope word as the target score of the target business scope word; and
determining the degree of matching between the subclass business content and the business scope of the target enterprise according to the target score of each target business scope word.
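The matching step of claim 3 can be sketched as a dictionary lookup followed by an aggregation. The claim does not state how the target scores are combined, so summing them is an assumption of this example, as are the function names and the sample scores.

```python
def matching_degree(target_words, subclass_scores):
    """Sum the first scores of the target words found in one subclass.

    subclass_scores maps each enterprise business scope word of the
    subclass content to its first score. A target word with no match
    contributes 0. Summation is an assumption of this sketch.
    """
    return sum(subclass_scores.get(word, 0.0) for word in target_words)


def best_subclass(target_words, scores_by_subclass):
    """Return the label of the subclass with the highest matching degree."""
    return max(
        scores_by_subclass,
        key=lambda label: matching_degree(target_words, scores_by_subclass[label]),
    )


scores_by_subclass = {
    "retail": {"clothing": 0.5, "sales": 0.25},
    "software": {"software": 0.5, "sales": 0.25},
}
label = best_subclass(["clothing", "sales"], scores_by_subclass)
```

With these sample scores the target words match both words of "retail" but only one word of "software", so "retail" is selected.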
4. The method of claim 1, further comprising:
for each enterprise of each subclass, calculating a word vector of each enterprise business scope word of the enterprise, and determining an enterprise business scope sentence vector of the enterprise according to the word vectors of the enterprise business scope words;
for each subclass, determining a business scope center vector of the subclass according to the enterprise business scope sentence vectors of the enterprises of the subclass;
calculating a word vector of each target business scope word of the target enterprise, and determining a target business scope sentence vector of the target enterprise according to the word vectors of the target business scope words;
calculating the vector distance between the target business scope sentence vector and the business scope center vector of each subclass;
correspondingly, determining the degree of matching between the subclass business content and the business scope of the target enterprise according to each target business scope word and the first score of each enterprise business scope word of the subclass business content comprises:
determining, according to each target business scope word and the first score of each enterprise business scope word of the subclass business content, a second score of the business scope of the target enterprise with respect to the subclass business content; and
determining the degree of matching between the subclass business content of the subclass and the business scope of the target enterprise according to the second score and the vector distance.
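The vector-based refinement of claim 4 can be sketched as below. Several details are assumptions of this example rather than statements of the claim: sentence vectors and center vectors are taken as arithmetic means, the vector distance is cosine distance, and the second score and the distance are blended with a hypothetical weight `alpha`. The toy two-dimensional word vectors stand in for whatever word vector dictionary is actually used.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sentence_vector(words, word_vectors):
    """Average the vectors of the words found in the dictionary.

    Returns None when no word is known (a choice made for this sketch).
    """
    known = [word_vectors[w] for w in words if w in word_vectors]
    return mean_vector(known) if known else None


def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def combined_matching_degree(second_score, distance, alpha=0.5):
    """Blend the TF-IDF based second score with vector closeness.

    The linear blend and the weight alpha are assumptions of this sketch.
    """
    return alpha * second_score + (1.0 - alpha) * (1.0 - distance)


word_vectors = {"clothing": [1.0, 0.0], "sales": [0.0, 1.0]}
target_vec = sentence_vector(["clothing", "sales"], word_vectors)
center_vec = mean_vector([[1.0, 0.0], [0.0, 1.0]])  # two enterprises of one subclass
distance = cosine_distance(target_vec, center_vec)
```

A smaller distance means the target enterprise lies closer to the subclass center, so this sketch converts the distance to a closeness term before blending it with the second score.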
5. The method of claim 4, wherein calculating the word vector of each enterprise business scope word of the enterprise comprises:
calculating the word vector of each enterprise business scope word of the enterprise based on a text vectorization model and a preset Chinese word vector dictionary;
correspondingly, calculating the word vector of each target business scope word of the target enterprise comprises:
calculating the word vector of each target business scope word of the target enterprise based on the text vectorization model and the preset Chinese word vector dictionary.
6. An industry label determining apparatus, comprising:
a data acquisition module, configured to acquire the business scope of a target enterprise of a stock user, wherein the type of the industry label of the target enterprise is an unknown label type;
a second word segmentation processing unit, configured to perform word segmentation on the business scope of the target enterprise to obtain the target business scope words of the target enterprise;
a subclass business content determining module, configured to acquire, for each subclass under a target class, the business scope of each enterprise of the stock users corresponding to the subclass, and to generate subclass business content of the subclass according to the business scope of each enterprise, wherein the target class is the category or major class to which the industry label of the target enterprise belongs, and the type of the industry label of the subclass is a known label type;
a content matching module, configured to: determine the total business content of the target class according to the enterprise business scope words of each subclass business content; for each subclass business content, calculate, based on a term frequency-inverse document frequency (TF-IDF) technique and taking the total business content as the document set, a first score of each enterprise business scope word of the subclass business content in the total business content; for each subclass business content, determine the degree of matching between the subclass business content and the business scope of the target enterprise according to each target business scope word and the first score of each enterprise business scope word of the subclass business content; and determine the subclass business content with the highest degree of matching as the matching business content of the target enterprise; and
an industry label determining module, configured to determine the industry label of the subclass corresponding to the matching business content as the industry label of the target enterprise.
7. An industry label determining apparatus, comprising: a memory, a processor, and an industry label determining program stored in the memory and executable on the processor, wherein the industry label determining program, when executed by the processor, implements the steps of the industry label determining method of any one of claims 1 to 5.
8. A computer-readable storage medium, wherein an industry label determining program is stored on the computer-readable storage medium, and the industry label determining program, when executed by a processor, implements the steps of the industry label determining method of any one of claims 1 to 5.
CN202011060599.XA 2020-09-30 2020-09-30 Industry label determining method, device, equipment and storage medium Active CN112163153B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011060599.XA CN112163153B (en) 2020-09-30 2020-09-30 Industry label determining method, device, equipment and storage medium
PCT/CN2021/103262 WO2022068297A1 (en) 2020-09-30 2021-06-29 Method, apparatus and device for determining industry label, and storage medium

Publications (2)

Publication Number Publication Date
CN112163153A CN112163153A (en) 2021-01-01
CN112163153B CN112163153B (en) 2024-05-03

Family

ID=73860835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011060599.XA Active CN112163153B (en) 2020-09-30 2020-09-30 Industry label determining method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112163153B (en)
WO (1) WO2022068297A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163153B (en) * 2020-09-30 2024-05-03 深圳前海微众银行股份有限公司 Industry label determining method, device, equipment and storage medium
CN113869640A (en) * 2021-08-26 2021-12-31 中国环境科学研究院 Enterprise screening method and device, electronic equipment and storage medium
CN113869639B (en) * 2021-08-26 2023-11-07 中国环境科学研究院 Yangtze river basin enterprise screening method and device, electronic equipment and storage medium
CN115018258B (en) * 2022-05-11 2023-08-18 中国城市规划设计研究院深圳分院 Method for identifying enterprise type and industry chain space in target area
CN115905506B (en) * 2023-02-21 2023-05-16 江西省科技事务中心 Basic theory file pushing method, system, computer and readable storage medium
CN116361726B (en) * 2023-04-03 2024-03-29 全拓科技(杭州)股份有限公司 Data processing method based on multidimensional big data analysis
CN116579786B (en) * 2023-05-06 2023-11-14 全拓科技(杭州)股份有限公司 Data cleaning method and system applied to big data analysis

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169523A (en) * 2017-05-27 2017-09-15 鹏元征信有限公司 Automatically determine method, storage device and the terminal of the affiliated category of employment of mechanism
CN108171276A (en) * 2018-01-17 2018-06-15 百度在线网络技术(北京)有限公司 For generating the method and apparatus of information
CN110020427A (en) * 2019-01-30 2019-07-16 阿里巴巴集团控股有限公司 Strategy determines method and apparatus
KR20190114166A (en) * 2018-03-29 2019-10-10 (주)다음소프트 Industrial classifying system and method using autoencoder
CN110781955A (en) * 2019-10-24 2020-02-11 中国银联股份有限公司 Method and device for classifying label-free objects and detecting nested codes and computer-readable storage medium
CN111597304A (en) * 2020-05-15 2020-08-28 上海财经大学 Secondary matching method for accurately identifying Chinese enterprise name entity

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130268526A1 (en) * 2012-04-06 2013-10-10 Mark E. Johns Discovery engine
CN105808641A (en) * 2016-02-24 2016-07-27 百度在线网络技术(北京)有限公司 Mining method and device of off-line resources
US11093557B2 (en) * 2016-08-29 2021-08-17 Zoominfo Apollo Llc Keyword and business tag extraction
CN110188357B (en) * 2019-05-31 2023-06-20 创新先进技术有限公司 Industry identification method and device for objects
CN111027318B (en) * 2019-10-12 2023-04-07 中国平安财产保险股份有限公司 Industry classification method, device and equipment based on big data and storage medium
CN110990529B (en) * 2019-11-28 2024-04-09 爱信诺征信有限公司 Industry detail dividing method and system for enterprises
CN111538837A (en) * 2020-04-27 2020-08-14 北京同邦卓益科技有限公司 Method and device for analyzing enterprise operation range information
CN112163153B (en) * 2020-09-30 2024-05-03 深圳前海微众银行股份有限公司 Industry label determining method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
企业经营范围文本自动分类方法探究;韩雪;张业;朱聪慧;;标准科学(第01期);93-96 *

Also Published As

Publication number Publication date
WO2022068297A1 (en) 2022-04-07
CN112163153A (en) 2021-01-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant