CN112163153A - Industry label determination method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112163153A
Authority
CN
China
Prior art keywords
enterprise
target
subclass
operation range
operation content
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011060599.XA
Other languages
Chinese (zh)
Other versions
CN112163153B (en)
Inventor
唐圳
刘博
郑文琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Application filed by WeBank Co Ltd
Priority to CN202011060599.XA
Publication of CN112163153A
Priority to PCT/CN2021/103262 (WO2022068297A1)
Application granted
Publication of CN112163153B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Tourism & Hospitality (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method, a device, equipment and a storage medium for determining an industry label. The method comprises the following steps:

- Acquire the business scope of a target enterprise among existing users. The industry label of the target enterprise is of an undefined label type.
- For each subclass under a target category, acquire the business scope of each enterprise of the existing users that belongs to the subclass. Generate the subclass business content of the subclass from those business scopes. The target category is the section or major class to which the industry label of the target enterprise belongs, and the industry label of each such subclass is of a definite label type.
- Using the business scope of the target enterprise and each subclass business content, determine which subclass business content matches the target enterprise's business scope. This is the matched business content of the target enterprise.
- Take the industry label of the subclass corresponding to the matched business content as the industry label of the target enterprise.

A definite industry label is thus determined automatically for the enterprise from its business scope, and the determination is highly accurate.

Description

Industry label determination method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of character recognition, in particular to a method, a device, equipment and a storage medium for determining an industry label.
Background
With the diversified development of enterprises, more and more enterprises operate across multiple industries. As a result, a growing share of enterprise industry classification labels are undefined labels, that is, labels of an undefined type, such as "other unlisted wholesale (5199)" and "other agriculture (0190)". Such labels cannot clearly describe what an enterprise actually does.
When the industry label of an enterprise is of an undefined type, the enterprise portrait cannot be determined accurately. High-quality services therefore cannot be provided on the basis of that portrait.
Disclosure of Invention
The main object of the invention is to provide a method, a device, equipment and a storage medium for determining an industry label. For an enterprise whose industry label is undefined, a definite industry label is matched automatically according to the enterprise's business content. The label determination is highly accurate and better reflects the enterprise's actual operations. It also provides a sound basis for subsequently building an enterprise portrait.
To achieve the above object, in a first aspect, an embodiment of the present invention provides a method for determining an industry label, comprising:
acquiring the business scope of a target enterprise of existing users, wherein the industry label of the target enterprise is of an undefined label type; for each subclass under a target category, acquiring the business scope of each enterprise of the existing users that belongs to the subclass, and generating the subclass business content of the subclass according to those business scopes, wherein the target category is the section or major class to which the industry label of the target enterprise belongs, and the industry label of the subclass is of a definite label type; determining, according to the business scope of the target enterprise and each subclass business content, the subclass business content that matches the business scope of the target enterprise as the matched business content of the target enterprise; and determining the industry label of the subclass corresponding to the matched business content as the industry label of the target enterprise.
Optionally, generating the subclass business content of the subclass according to the business scope of each enterprise comprises:
performing word segmentation on the business scope of each enterprise of each subclass to obtain the enterprise business-scope tokens of each enterprise; and, for each subclass, deduplicating the enterprise business-scope tokens of all enterprises of the subclass and removing stop words from them to obtain the subclass business content of the subclass.
Optionally, after acquiring the business scope of the target enterprise of the existing users, the method further comprises:
performing word segmentation on the business scope of the target enterprise to obtain the target business-scope tokens of the target enterprise.
Correspondingly, determining the matched business content of the target enterprise comprises the following, where the matched business content is the subclass business content that matches the business scope of the target enterprise, determined according to that business scope and each subclass business content:
for each subclass business content, calculating the matching degree between the subclass business content and the business scope of the target enterprise according to the enterprise business-scope tokens of the subclass business content and the target business-scope tokens; and determining the subclass business content with the highest matching degree as the matched business content of the target enterprise.
Optionally, calculating, for each subclass business content, the matching degree between the subclass business content and the business scope of the target enterprise comprises the following, where the calculation uses the enterprise business-scope tokens of the subclass business content and the target business-scope tokens:
determining the total business content of the target category from the enterprise business-scope tokens of all subclass business contents; for each subclass business content, calculating a first score for each enterprise business-scope token of the subclass business content, based on term frequency-inverse document frequency (TF-IDF) and with the total business content as the document set; and, for each subclass business content, determining the matching degree between the subclass business content and the business scope of the target enterprise according to the target business-scope tokens and the first scores of the enterprise business-scope tokens of the subclass business content.
Optionally, determining the matching degree between the subclass business content and the business scope of the target enterprise comprises the following, where the determination uses the target business-scope tokens and the first scores of the enterprise business-scope tokens of the subclass business content:
for each target business-scope token, when the token matches an enterprise business-scope token of the subclass business content, taking the first score of that enterprise business-scope token as the target score of the target business-scope token; and determining the matching degree between the subclass business content and the business scope of the target enterprise according to the target scores of the target business-scope tokens.
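The first-score and target-score steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Each subclass business content is treated as one document whose token counts are retained. Plain TF-IDF is used, with tf = count / document length and idf = ln(N / df). The matching degree is taken as the sum of the target scores. The function names and sample data are hypothetical.

```python
import math
from collections import Counter


def first_scores(subclass_docs):
    """TF-IDF score ("first score") of every token in each subclass document."""
    n_docs = len(subclass_docs)
    # Document frequency: number of subclass documents containing each token.
    df = Counter(tok for doc in subclass_docs.values() for tok in set(doc))
    scores = {}
    for code, doc in subclass_docs.items():
        counts = Counter(doc)
        scores[code] = {
            tok: (cnt / len(doc)) * math.log(n_docs / df[tok])
            for tok, cnt in counts.items()
        }
    return scores


def matching_degrees(target_tokens, scores):
    """Per subclass, sum the first scores of the target tokens it contains (target scores)."""
    return {
        code: sum(tok_scores.get(tok, 0.0) for tok in target_tokens)
        for code, tok_scores in scores.items()
    }


docs = {
    "5165": ["wholesale", "steel", "steel", "wood"],
    "5131": ["wholesale", "textile", "yarn"],
}
degrees = matching_degrees(["wholesale", "steel", "clothing"], first_scores(docs))
best = max(degrees, key=degrees.get)  # subclass with the highest matching degree
```

Because "wholesale" appears in every subclass document, its idf is ln(1) = 0. It therefore contributes nothing, and the distinctive token "steel" decides the match.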
Optionally, the method further comprises:
for each enterprise of each subclass, calculating a word vector for each enterprise business-scope token of the enterprise, and determining the enterprise business-scope sentence vector of the enterprise from these word vectors; for each subclass, determining a business-scope center vector of the subclass from the enterprise business-scope sentence vectors of its enterprises; calculating a word vector for each target business-scope token of the target enterprise, and determining the target business-scope sentence vector of the target enterprise from these word vectors; and calculating the vector distance between the target business-scope sentence vector and the business-scope center vector of each subclass.
Correspondingly, determining the matching degree between the subclass business content and the business scope of the target enterprise comprises the following, where the determination uses the target business-scope tokens and the first scores of the enterprise business-scope tokens of the subclass business content:
determining a second score for the business scope of the target enterprise with respect to the subclass business content, according to the target business-scope tokens and the first scores of the enterprise business-scope tokens of the subclass business content; and determining the matching degree between the subclass business content and the business scope of the target enterprise according to the second score and the vector distance.
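The text does not state how the second score and the vector distance are combined. One plausible rule, shown below purely as an assumption, subtracts a weighted distance from the second score, so that a closer center vector raises the matching degree. The weight `lam`, the function name and the numbers are hypothetical.

```python
def combined_matching_degree(second_scores, distances, lam=1.0):
    """Hypothetical rule: matching degree = second score - lam * vector distance."""
    return {code: second_scores[code] - lam * distances[code] for code in second_scores}


second = {"5165": 0.35, "5131": 0.30}  # second scores (summed target scores)
dist = {"5165": 0.5, "5131": 0.1}      # distances to each subclass center vector

strict = combined_matching_degree(second, dist, lam=1.0)
loose = combined_matching_degree(second, dist, lam=0.1)
best_strict = max(strict, key=strict.get)  # "5131": the closer center vector wins
best_loose = max(loose, key=loose.get)     # "5165": the keyword score dominates
```

The weight controls how much semantic closeness can override keyword overlap. Any such weight would need tuning on labeled data.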
Optionally, calculating the word vector of each enterprise business-scope token of the enterprise comprises:
calculating the word vector of each enterprise business-scope token of the enterprise based on a text vectorization model and a preset Chinese word vector dictionary.
Correspondingly, calculating the word vector of each target business-scope token of the target enterprise comprises:
calculating the word vector of each target business-scope token of the target enterprise based on the text vectorization model and the preset Chinese word vector dictionary.
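The word-vector steps can be sketched as follows. None of the choices below is specified by the text; each is an assumption:

- A tiny hypothetical dictionary of 2-D vectors stands in for the preset Chinese word vector dictionary.
- A sentence vector is the mean of its known word vectors. Out-of-vocabulary tokens are skipped.
- A subclass center vector is the mean of its enterprise sentence vectors.
- The vector distance is Euclidean.

```python
import math

# Hypothetical toy word-vector dictionary; real dictionaries hold pre-trained vectors.
WORD_VECTORS = {
    "wholesale": (1.0, 0.0),
    "steel": (0.0, 1.0),
    "wood": (0.0, 1.0),
    "textile": (1.0, 1.0),
}


def mean(vectors):
    """Component-wise mean of equal-length vectors."""
    return tuple(sum(components) / len(vectors) for components in zip(*vectors))


def sentence_vector(tokens, table=WORD_VECTORS):
    """Average the vectors of known tokens; None if no token is in the dictionary."""
    known = [table[tok] for tok in tokens if tok in table]
    return mean(known) if known else None


def center_vector(enterprise_token_lists):
    """Mean of the sentence vectors of one subclass's enterprises."""
    sentences = [v for v in map(sentence_vector, enterprise_token_lists) if v is not None]
    return mean(sentences)


center_5165 = center_vector([["wholesale", "steel"], ["wood"]])  # (0.25, 0.75)
target = sentence_vector(["steel", "clothing"])                   # (0.0, 1.0)
distance = math.dist(target, center_5165)                         # sqrt(0.125)
```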
In a second aspect, an embodiment of the present invention further provides an apparatus for determining an industry label, comprising:
a data acquisition module, configured to acquire the business scope of a target enterprise of existing users, wherein the industry label of the target enterprise is of an undefined label type;
a subclass business content determining module, configured to, for each subclass under a target category, acquire the business scope of each enterprise of the existing users that belongs to the subclass and generate the subclass business content of the subclass according to those business scopes, wherein the target category is the section or major class to which the industry label of the target enterprise belongs, and the industry label of the subclass is of a definite label type;
a content matching module, configured to determine, according to the business scope of the target enterprise and each subclass business content, the subclass business content that matches the business scope of the target enterprise as the matched business content of the target enterprise;
and an industry label determining module, configured to determine the industry label of the subclass corresponding to the matched business content as the industry label of the target enterprise.
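The four modules map naturally onto a small class whose steps are pluggable. The sketch below is hypothetical: the class name, the field names and the toy overlap matcher in the usage lines are illustrations, not the patent's implementation. In this sketch the subclass contents are keyed by industry label, so the matched key is the label.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IndustryLabelApparatus:
    """Hypothetical wiring of the four modules as pluggable callables."""

    # Data acquisition module: target enterprise id -> business scope text.
    acquire_scope: Callable[[str], str]
    # Subclass business content module: target id -> {industry label: content tokens}.
    build_subclass_contents: Callable[[str], dict]
    # Content matching module: (scope, contents) -> key of the matched content.
    match_content: Callable[[str, dict], str]

    def determine_label(self, target_id: str) -> str:
        """Industry label determining module: label of the matched subclass."""
        scope = self.acquire_scope(target_id)
        contents = self.build_subclass_contents(target_id)
        return self.match_content(scope, contents)


scopes = {"C1": "wholesale retail steel clothing"}
contents = {
    "Building materials wholesale (5165)": {"wholesale", "steel", "wood"},
    "Textiles, knitwear and raw materials wholesale (5131)": {"wholesale", "textile"},
}


def overlap_match(scope, subclass_contents):
    """Toy matcher: pick the content sharing the most tokens with the scope."""
    tokens = set(scope.split())
    return max(subclass_contents, key=lambda label: len(tokens & subclass_contents[label]))


apparatus = IndustryLabelApparatus(scopes.__getitem__, lambda _id: contents, overlap_match)
label = apparatus.determine_label("C1")
```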
In a third aspect, an embodiment of the present invention further provides an industry label determination device. The device comprises a memory, a processor, and an industry label determination program stored in the memory and executable on the processor. When executed by the processor, the program implements the steps of the industry label determination method provided by any embodiment of the first aspect of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing an industry label determination program which, when executed by a processor, implements the steps of the industry label determination method provided by any embodiment of the first aspect of the present invention.
The embodiments of the present invention provide a method, apparatus, device and storage medium for determining an industry label. They address target enterprises of the existing users whose industry label is undefined. For such an enterprise, the subclass business content that matches its business scope is determined. The candidates are the business contents of the subclasses with a definite label type under the target enterprise's section or major class, and each is derived from the business scopes of the enterprises in that subclass. The industry label of the matched subclass is then taken as the industry label of the target enterprise. A definite industry label is thus matched automatically to an enterprise whose label is undefined, with high matching accuracy. This provides a sound basis for building the enterprise's portrait and makes it easier to offer high-quality services that fit the enterprise's actual operations. It also improves the user experience.
Drawings
FIG. 1 is an application scenario diagram of a method for determining an industry label according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for determining an industry label according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for determining an industry label according to another embodiment of the present invention;
FIG. 4 is a flowchart of step S306 in the embodiment shown in FIG. 3;
FIG. 5 is a flowchart of a method for determining an industry label according to another embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus for determining an industry label according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a device for determining an industry label according to an embodiment of the present invention.
The realization of the objects, the functional features and the advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The following explains an application scenario of the embodiment of the present invention:
FIG. 1 is an application scenario diagram of the industry label determination method provided by an embodiment of the present invention. As shown in FIG. 1, telemarketing service policies are often formulated according to the industry labels of different enterprises in order to improve the quality of telemarketing services. An industry label can be determined for each enterprise according to the national economic industry classification. In that classification, industry codes are organized into four levels, from coarsest to finest: section, major class, middle class and subclass. The service enterprise 110 needs to determine the enterprise portrait of the target enterprise 120 it serves from that enterprise's industry label 121. It can then provide high-quality telemarketing services to the target enterprise 120 based on the portrait.
Suppose the industry label 121 of the target enterprise 120 is a subclass label of an undefined type, for example the subclass "other unlisted wholesale" with code 5199. The enterprise portrait of the target enterprise 120 is then too coarse. It cannot correctly describe the needs of the target enterprise 120, and no service strategy meeting those needs can be provided.
To make the portraits of enterprises with undefined industry labels more precise, an embodiment of the present invention provides a method for automatically determining a definite industry label for such an enterprise. The main idea is as follows. The method compares the business scope of the target enterprise with the business scopes of the enterprises in each specific subclass that shares the target enterprise's major class or section. It determines the subclass whose business scope matches that of the target enterprise and takes that subclass's industry label as the target enterprise's industry label. The target enterprise is thereby matched with a suitable, specific industry label. From that label, a clear enterprise portrait can be generated that describes the enterprise's needs accurately, so that high-quality services can be provided.
FIG. 2 is a flowchart of a method for determining an industry label according to an embodiment of the present invention. As shown in FIG. 2, the method comprises the following steps:
Step S201, acquiring the business scope of the target enterprise of the existing users.
The industry label of the target enterprise is of an undefined label type. An industry label generally refers to the category name of a subclass in the national economic industry classification. An industry label of an undefined type is one whose subclass category name contains "other" or a similarly imprecise description, such as "other agriculture", "other animal husbandry", "other unlisted wholesale" and "other unlisted manufacturing". An existing user (stock user) is a user of the services provided, typically an existing customer. The business scope is data describing what the enterprise does, expressed as keywords or sentences.
For example, for a target enterprise whose industry label is "other unlisted wholesale", the business scope may be: "Business scope: wholesale and retail of steel and clothing".
Specifically, the number of target enterprises may be one or more.
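Following the examples above, undefined labels can be flagged with a simple keyword check on the subclass category name. The sketch below is a hypothetical heuristic on English category names, and its marker words are assumptions. A production system would more likely keep an explicit list of undefined subclass codes.

```python
import re

# Hypothetical markers of a catch-all category name (matched as whole words).
UNDEFINED_PATTERN = re.compile(r"\b(other|unlisted|not elsewhere classified)\b", re.IGNORECASE)


def is_undefined_label(category_name: str) -> bool:
    """True if the category name reads like a catch-all, e.g. "Other agriculture"."""
    return UNDEFINED_PATTERN.search(category_name) is not None


names = [
    "Other agriculture",
    "Other unlisted wholesale",
    "Fruit and vegetable wholesale",
    "Clothing wholesale",
]
flags = [is_undefined_label(n) for n in names]  # [True, True, False, False]
```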
Further, after acquiring the business scope of the target enterprise, the method further comprises:
clearing the existing industry label of the target enterprise; converting the business scope into a preset format; and performing word segmentation on the formatted business scope to obtain the target business-scope tokens of the target enterprise.
Specifically, the existing industry label of a target enterprise with an undefined label is cleared before a new industry label is assigned.
Specifically, the business scope of the target enterprise is usually entered manually, so its format is not uniform. For ease of data processing, it is converted into a preset format.
For example, assume the business scope of target enterprise C1 is "The company's business: wholesale and retail of various stationery, jewelry, beverages and tobacco". After conversion to the preset format, the business scope of C1 becomes "Business scope: wholesale and retail of various stationery, jewelry, beverages and tobacco".
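The format conversion in this example can be sketched as below. The actual preset format is not specified. The sketch hypothetically takes it to be the fixed prefix "Business scope: " followed by whatever follows the first colon, ASCII or full-width, in the raw text.

```python
import re

PRESET_PREFIX = "Business scope: "  # hypothetical preset format


def to_preset_format(raw_scope: str) -> str:
    """Drop any free-form lead-in up to the first colon and add the preset prefix."""
    body = re.split(r"[:：]", raw_scope, maxsplit=1)[-1].strip()
    return PRESET_PREFIX + body


c1 = to_preset_format(
    "The company's business: wholesale and retail of various stationery, jewelry, beverages and tobacco"
)
```

Text without any colon is kept whole and simply receives the prefix.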
Step S202, for each subclass under the target category, acquiring the business scope of each enterprise of the existing users that belongs to the subclass, and generating the subclass business content of the subclass from those business scopes.
The target category is the section or major class to which the industry label of the target enterprise belongs, and the industry label of each subclass considered is of a definite label type. A definite label type is the opposite of an undefined label type: it indicates that the enterprise's industry is clearly identified. An example is an industry label that does not contain the keyword "unlisted", such as "fruit and vegetable wholesale (5123)" or "clothing wholesale (5132)".
Specifically, the section or major class to which the target enterprise's industry label belongs is obtained. Then, for each subclass under that section or major class (the target category), the business scope of each enterprise of the existing users in the subclass is acquired. The business scopes of the enterprises in each subclass are then integrated to obtain the subclass business content of that subclass.
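In the national economic industry classification, a subclass code has four digits, and its first two digits give the major class. For example, 5199 lies under major class 51. The section, by contrast, is a letter that has to be looked up in a table. Selecting the candidate subclasses under the target major class can therefore be sketched as below. The catalog contents and the set of undefined codes are hypothetical inputs. In practice, only subclasses that actually contain enterprises among the existing users would contribute business content.

```python
def candidate_subclasses(target_code, catalog, undefined_codes):
    """Subclass codes sharing the target's major class, excluding undefined labels."""
    major = target_code[:2]  # first two digits of a subclass code = major class
    return sorted(
        code for code in catalog
        if code.startswith(major) and code not in undefined_codes
    )


catalog = {
    "5123": "Fruit and vegetable wholesale",
    "5131": "Textiles, knitwear and raw materials wholesale",
    "5165": "Building materials wholesale",
    "5199": "Other unlisted wholesale",
    "0190": "Other agriculture",
}
candidates = candidate_subclasses("5199", catalog, undefined_codes={"5199", "0190"})
# ['5123', '5131', '5165']
```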
Further, parenthesized content may be removed from each enterprise's business scope. Enterprises whose business scope is abnormal, for example null, may be discarded.
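The cleaning described here might look like the following sketch. The regular expression is an assumption about the input format: it handles both ASCII and full-width parentheses but does not handle nesting.

```python
import re

# One parenthesized segment, ASCII "()" or full-width "（）", without nesting.
PARENS = re.compile(r"[（(][^（）()]*[）)]")


def clean_scopes(scopes):
    """Remove parenthesized content and drop null or blank business scopes."""
    cleaned = []
    for scope in scopes:
        if scope is None:
            continue
        text = PARENS.sub("", scope).strip()
        if text:
            cleaned.append(text)
    return cleaned


result = clean_scopes([
    "Wholesale of steel (excluding hazardous goods)",
    None,
    "   ",
    "Retail of wood（retail only）",
])
```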
Furthermore, after the business scope of each enterprise in the subclass is obtained, keywords may be extracted from it. The subclass business content is then composed of the keywords of all the enterprises in the subclass.
For example, suppose the target enterprise belongs to the major class "wholesale" (major class code 51). Among the existing users under this major class, there are enterprise customers in two subclasses of a definite label type:

- building materials wholesale (subclass code 5165), covering enterprises C2 and C3;
- textiles, knitwear and raw materials wholesale (subclass code 5131), covering enterprises C4, C5 and C6.

The business scopes of C2 and C3 are integrated to obtain the subclass business content of building materials wholesale. The business scopes of C4, C5 and C6 are integrated to obtain the subclass business content of textiles, knitwear and raw materials wholesale.
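The integration step in this example amounts to grouping the existing users' enterprises by subclass code. The sketch below uses hypothetical business scopes for C2 to C6, since the example gives only the enterprise names.

```python
from collections import defaultdict

# (enterprise, subclass code, business scope); the scopes are hypothetical.
enterprises = [
    ("C2", "5165", "wholesale of cement and glass"),
    ("C3", "5165", "wholesale of timber and steel"),
    ("C4", "5131", "wholesale of cotton yarn"),
    ("C5", "5131", "wholesale of knitwear"),
    ("C6", "5131", "wholesale of textile raw materials"),
]


def group_by_subclass(records):
    """Collect the business scopes of all enterprises belonging to each subclass."""
    grouped = defaultdict(list)
    for _name, code, scope in records:
        grouped[code].append(scope)
    return dict(grouped)


subclass_scopes = group_by_subclass(enterprises)
```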
Step S203, determining, according to the business scope of the target enterprise and each subclass business content, the subclass business content that matches the business scope of the target enterprise as the matched business content of the target enterprise.
Specifically, each keyword in the business scope of the target enterprise may be matched against the keywords of each subclass business content. This yields the matching degree between the target enterprise and each subclass. The subclass business content of the subclass with the highest matching degree is then taken as the matched business content of the target enterprise.
Furthermore, a weight may be preset for each keyword of a subclass business content. When a keyword of the target enterprise's business scope is identical to or matches a keyword of the subclass business content, the weight of that keyword is retrieved. The weights of all matched keywords are then summed to obtain the matching degree for the subclass.
Specifically, the weight of a keyword of the subclass business content may be determined from the keyword's frequency of occurrence.
For example, suppose the keywords and weights of a subclass business content are "wholesale 0.1, retail 0.1, steel 0.4, wood 0.4", and the keywords of the target enterprise's business scope are "wholesale, steel, clothing". The matching degree of the target enterprise with this subclass is then 0.1 + 0.4 = 0.5.
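The keyword-weight matching and the worked example can be checked with a few lines of Python. The weights are taken as given here; the text notes they may be derived from keyword frequency. Subclass "B" and its weights are hypothetical, added only to show the highest-match selection.

```python
def matching_degree(weights, target_keywords):
    """Sum the weights of the subclass keywords that occur in the target scope."""
    return sum(weights.get(kw, 0.0) for kw in target_keywords)


subclass_weights = {
    "A": {"wholesale": 0.1, "retail": 0.1, "steel": 0.4, "wood": 0.4},
    "B": {"wholesale": 0.2, "textile": 0.8},  # hypothetical second subclass
}
target = ["wholesale", "steel", "clothing"]

degrees = {name: matching_degree(w, target) for name, w in subclass_weights.items()}
best = max(degrees, key=degrees.get)  # "A" (0.5 vs 0.2)
```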
Step S204, determining the industry label of the subclass corresponding to the matched business content as the industry label of the target enterprise.
Specifically, the matched business content that best matches the target enterprise's business scope is first determined among the subclass business contents of all the subclasses. The definite industry label of the corresponding subclass is then obtained and assigned to the target enterprise. A definite industry label is thus set for the target enterprise automatically.
This embodiment addresses a target enterprise of the existing users whose industry label is undefined. The subclass business content that matches the target enterprise's business scope is determined. The candidates are the business contents of the subclasses with a definite label type under the target enterprise's section or major class, each derived from the business scopes of the enterprises in that subclass. The industry label of the matched subclass is then taken as the industry label of the target enterprise. A definite industry label is thus matched automatically to an enterprise whose label is undefined, with high matching accuracy. This provides a sound basis for building the enterprise's portrait and makes it easier to offer high-quality services that fit the enterprise's actual operations. It also improves the user experience.
FIG. 3 is a flowchart of a method for determining an industry label according to another embodiment of the present invention. Building on the embodiment shown in FIG. 2, this embodiment refines steps S202 and S203. It also adds, after step S201, a step of performing word segmentation on the business scope of the target enterprise. As shown in FIG. 3, the method of this embodiment comprises the following steps:
step S301, obtaining the operation range of the target enterprise of the stock user.
And the type of the industry label of the target enterprise is an unknown label type.
Step S302, performing word segmentation processing on the operation range of the target enterprise to obtain each target operation range word segmentation of the operation range of the target enterprise.
Specifically, word segmentation refers to a process of recombining consecutive sentences into a word sequence according to a certain specification. The business range of the enterprise related by the invention can be described by Chinese or English. The word segmentation algorithm may be a word segmentation algorithm based on string matching, a word segmentation algorithm based on Hidden Markov Model (HMM), a word segmentation algorithm based on conditional random fields, or other word segmentation algorithms.
Further, the business scope of the target enterprise, and the business scopes of the enterprises of each subclass used in the subsequent steps, can be segmented with jieba, a Chinese word segmentation library for Python.
For example, assume the business scope of the target enterprise reads "The business content of the company is: wholesale and retail of grain and oil, food, beverages and tobacco products". The text before the colon is removed, stop words such as "and" are removed, and punctuation marks are removed, so that word segmentation yields the target business-scope tokens: grain and oil, food, beverages, tobacco products, wholesale, and retail.
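The clean-up in this example (drop the lead-in before the colon, then drop stop words and punctuation from the segmented tokens) can be sketched as below. This is a minimal sketch under assumptions: the helper names are hypothetical, the Chinese sample sentence is a hypothetical rendering of the example, the token list stands in for what a segmenter such as jieba might return, and the stop-word set is illustrative only.

```python
import re
import unicodedata


def strip_prefix(scope: str) -> str:
    """Drop the introductory text up to and including the first ASCII or full-width colon."""
    parts = re.split(r"[:：]", scope, maxsplit=1)
    return parts[1] if len(parts) == 2 else scope


def is_punctuation(token: str) -> bool:
    """True if every character is Unicode punctuation (category P*) or whitespace."""
    return all(unicodedata.category(ch).startswith("P") or ch.isspace() for ch in token)


def clean_tokens(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Remove empty, punctuation-only and stop-word tokens, keeping the original order."""
    return [t for t in tokens if t.strip() and not is_punctuation(t) and t not in stop_words]


if __name__ == "__main__":
    scope = "本公司经营内容为：粮油、食品、饮料及烟草制品批发与零售"
    print(strip_prefix(scope))
    # Assumed segmenter output for the text after the colon.
    tokens = ["粮油", "、", "食品", "、", "饮料", "及", "烟草制品", "批发", "与", "零售"]
    print(clean_tokens(tokens, stop_words={"及", "与", "和"}))
```

The enumeration comma "、" (U+3001) has Unicode category Po, so it is filtered as punctuation without listing it as a stop word.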
Step S303: for each subclass under the target category, obtain the business scope of each enterprise of the stock users that belongs to the subclass.
Step S304: perform word segmentation on the business scope of each enterprise of each subclass to obtain the enterprise business-scope tokens of that enterprise.
Specifically, for each enterprise with a known industry label, word segmentation is performed on its business scope using the same algorithm as in step S302 (not repeated here), which yields the enterprise business-scope tokens of every enterprise of every subclass.
Step S305: for each subclass, perform deduplication and stop-word removal on the enterprise business-scope tokens of the enterprises of the subclass to obtain the subclass business content of the subclass.
Specifically, a stop-word set consisting of individual stop words may be predefined, and the stop words in this set are removed from the enterprise business-scope tokens of each enterprise of the subclass. The subclass business content consists of the enterprise business-scope tokens of all the enterprises of the subclass after deduplication and stop-word removal.
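Building the subclass business content can be sketched as follows. The text does not say whether deduplication is applied within each enterprise or across the whole subclass; this sketch assumes it is applied within each enterprise, so a token shared by several enterprises is kept once per enterprise and the later frequency-based weighting still has counts to work with. The function name is hypothetical.

```python
def build_subclass_content(enterprise_tokens: list[list[str]], stop_words: set[str]) -> list[str]:
    """Merge the business-scope tokens of all enterprises of one subclass.

    Assumption: duplicates are removed per enterprise, not across the subclass.
    """
    content: list[str] = []
    for tokens in enterprise_tokens:
        # dict.fromkeys removes repeats while preserving first-occurrence order.
        unique = dict.fromkeys(t for t in tokens if t not in stop_words)
        content.extend(unique)
    return content


if __name__ == "__main__":
    enterprises = [["candy", "manufacturing", "candy", "and"], ["chocolate", "manufacturing"]]
    print(build_subclass_content(enterprises, {"and"}))
```

If deduplication were instead applied across the whole subclass, every token would appear exactly once and its term frequency would carry no information about how many enterprises mention it.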
Step S306: for each subclass business content, calculate the matching degree between the subclass business content and the business scope of the target enterprise from the enterprise business-scope tokens of the subclass business content and the target business-scope tokens.
Specifically, the weight of each enterprise business-scope token can be determined from how often it appears in the subclass business content. When a target business-scope token matches an enterprise business-scope token, the weight of that enterprise token is taken as the score of the target token, and the scores of all the target tokens of the target enterprise are summed to obtain the matching degree between the subclass business content and the business scope of the target enterprise.
Optionally, Fig. 4 is a flowchart of step S306 in the embodiment shown in Fig. 3 of the present invention. As shown in Fig. 4, step S306 includes the following steps:
Step S3061: determine the overall business content of the target category from the enterprise business-scope tokens of all the subclass business contents.
Specifically, the overall business content of the target category can be obtained by merging the enterprise business-scope tokens of all the subclass business contents. The target category is the section (door class) or major class to which the industry label of the target enterprise belongs.
Step S3062: for each subclass business content, using the term frequency-inverse document frequency (TF-IDF) technique with the overall business content as the document set, calculate a first score for each enterprise business-scope token of the subclass business content within the overall business content.
Term Frequency-Inverse Document Frequency (TF-IDF) is a technique for evaluating how important a term is to one document within a document set or corpus; the weight of a term is determined mainly from how often it occurs. The first score is the TF-IDF value of each enterprise business-scope token within the overall business content.
Specifically, Term Frequency (TF) is the frequency with which a given term appears in a given document, expressed as:
$$\mathrm{TF}_{term} = \frac{T_{term}}{N_T}$$
where $\mathrm{TF}_{term}$ denotes the term frequency of a given word $term$; $T_{term}$ denotes the number of times the word occurs in a given document or article; and $N_T$ denotes the total number of words in that document or article.
Specifically, Inverse Document Frequency (IDF) measures the general importance of a given word; it decreases as the word becomes more common across the documents, and is expressed as:
$$\mathrm{IDF}_{term} = \log\frac{N_D}{D_{term}}$$
where $\mathrm{IDF}_{term}$ denotes the inverse document frequency of a given word $term$; $D_{term}$ denotes the number of documents containing the word; and $N_D$ denotes the total number of documents in the corpus.
Further, for each given word, multiplying its word frequency by the inverse document frequency yields its TF-IDF value, i.e., the first score described above.
Specifically, with the overall business content as the document set and each subclass business content as one document, the term frequency and inverse document frequency of every enterprise business-scope token in the subclass business content are calculated, which gives the TF-IDF value of each token, i.e., its first score.
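The first-score calculation can be sketched directly from the TF and IDF definitions, treating each subclass business content as one document. This sketch assumes the unsmoothed form $\log(N_D / D_{term})$; implementations often add smoothing (for example a +1 in the denominator), which would change the numbers. The function name and sample data are hypothetical.

```python
import math
from collections import Counter


def first_scores(subclass_contents: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """TF-IDF score of every token in every subclass business content.

    Each subclass content is one document; together they form the document set
    (the overall business content). IDF is the unsmoothed log(N_D / D_term).
    """
    n_docs = len(subclass_contents)
    # Document frequency: the number of subclass contents that contain each token.
    doc_freq: Counter = Counter()
    for tokens in subclass_contents.values():
        doc_freq.update(set(tokens))

    scores: dict[str, dict[str, float]] = {}
    for name, tokens in subclass_contents.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


if __name__ == "__main__":
    contents = {
        "candy and chocolate manufacturing": ["candy", "chocolate", "candy", "food"],
        "dairy product manufacturing": ["dairy", "milk", "food"],
    }
    for name, s in first_scores(contents).items():
        print(name, s)
```

A token present in every subclass content (here "food") gets an IDF of log(1) = 0, so it cannot distinguish one subclass from another.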
Step S3063: for each subclass business content, determine the matching degree between the subclass business content and the business scope of the target enterprise from the target business-scope tokens and the first scores of the enterprise business-scope tokens of the subclass business content.
Specifically, the first scores of the enterprise business-scope tokens of the subclass business content that match the target business-scope tokens are summed to obtain the matching degree between the subclass business content and the business scope of the target enterprise.
For example, assume the subclass business content includes Word1, Word2, Word3, and Word4, with first scores 0.48, 0.24, 0.01, and 0.05 respectively, and the target business-scope tokens include Word2 and Word3. Then Word2 and Word3 in the subclass business content are the matched words, and adding their first scores gives a matching degree of 0.24 + 0.01 = 0.25.
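The summation in this example can be written as a one-line sketch. It assumes "match" means exact string equality (the text also allows similar words, which would need an extra similarity test) and that each distinct target token contributes once; the function name is hypothetical.

```python
def matching_degree(target_tokens: list[str], subclass_scores: dict[str, float]) -> float:
    """Sum the first scores of subclass tokens that exactly match a target token.

    Assumption: repeated target tokens are counted once.
    """
    distinct = dict.fromkeys(target_tokens)  # order-preserving deduplication
    return sum(subclass_scores.get(token, 0.0) for token in distinct)


if __name__ == "__main__":
    scores = {"Word1": 0.48, "Word2": 0.24, "Word3": 0.01, "Word4": 0.05}
    print(matching_degree(["Word2", "Word3"], scores))
```

Because of floating-point rounding, the printed value may differ from 0.25 in the last decimal places; compare with a tolerance.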
Optionally, determining the matching degree between the subclass business content and the business scope of the target enterprise from the target business-scope tokens and the first scores of the enterprise business-scope tokens of the subclass business content includes:
for each target business-scope token, when it matches a current enterprise business-scope token of the subclass business content, taking the first score of that current enterprise business-scope token as the target score of the target business-scope token; and determining the matching degree between the subclass business content and the business scope of the target enterprise from the target scores of all the target business-scope tokens.
The current enterprise business-scope token is any one enterprise business-scope token in the subclass business content.
Specifically, the matching of the target operation range word and the current enterprise operation range word may mean that the two are the same or similar.
Specifically, the target scores of all the target business-scope tokens can be summed to obtain the matching degree between the subclass business content and the business scope of the target enterprise.
Further, for each subclass, the number of enterprises belonging to the subclass can be obtained and used to determine a subclass weight; the matching degree between the subclass business content and the business scope of the target enterprise is then determined from the subclass weight and the target scores of the target business-scope tokens.
Specifically, the subclass weight is determined from the ratio of the number of enterprises of the subclass to the total number of enterprises of the target category; as the example below shows, a subclass with more enterprises receives a smaller weight. Setting subclass weights prevents differences in the number of enterprises per subclass from distorting the matching degree.
For example, assume the target enterprise belongs to the manufacturing section, and that under one major class of manufacturing the stock users include two subclasses with known industry labels: candy and chocolate manufacturing, with 7 enterprises, and dairy product manufacturing, with 3 enterprises. The subclass weight of candy and chocolate manufacturing is then 0.3, and that of dairy product manufacturing is 0.7.
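The text does not give the exact weighting formula or how the weight is combined with the target scores. The sketch below shows one hypothetical choice that reproduces the example: weights inversely proportional to subclass size and normalized to sum to 1, and a matching degree scaled by the weight. Both function names and both formulas are assumptions, not the method stated in the text.

```python
def subclass_weights(enterprise_counts: dict[str, int]) -> dict[str, float]:
    """Hypothetical weighting: inversely proportional to subclass size, normalized to sum to 1."""
    inverse = {name: 1.0 / count for name, count in enterprise_counts.items() if count > 0}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def weighted_matching_degree(weight: float, target_scores: list[float]) -> float:
    """Hypothetical combination: scale the summed target scores by the subclass weight."""
    return weight * sum(target_scores)


if __name__ == "__main__":
    counts = {"candy and chocolate manufacturing": 7, "dairy product manufacturing": 3}
    print(subclass_weights(counts))
```

For two subclasses, taking one minus each subclass's share of enterprises gives the same 0.3 and 0.7, so the example alone does not pin down the formula; the inverse-size form has the advantage of still summing to 1 when there are more than two subclasses.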
Step S307: determine the subclass business content with the highest matching degree as the matching business content of the target enterprise.
Step S308: determine the industry label of the subclass corresponding to the matching business content as the industry label of the target enterprise.
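Steps S305 to S308 can be strung together into a compact end-to-end sketch. It carries the same assumptions as the individual steps above: deduplication within each enterprise, unsmoothed IDF, exact-match tokens, no subclass weighting, and ties resolved in favor of the first subclass encountered. The function name and the sample labels are hypothetical.

```python
import math
from collections import Counter
from typing import Optional


def assign_industry_label(
    target_tokens: list[str],
    subclass_enterprise_tokens: dict[str, list[list[str]]],
    stop_words: set[str],
) -> Optional[str]:
    # S305 (assumed dedup scope): deduplicate within each enterprise, drop stop words,
    # then concatenate the enterprises of a subclass into its subclass business content.
    contents = {
        label: [t for tokens in enterprises for t in dict.fromkeys(tokens) if t not in stop_words]
        for label, enterprises in subclass_enterprise_tokens.items()
    }
    # S3061/S3062: each subclass content is one document of the overall document set.
    n_docs = len(contents)
    doc_freq = Counter(t for tokens in contents.values() for t in set(tokens))
    targets = set(target_tokens)

    best_label: Optional[str] = None
    best_score = -math.inf
    for label, tokens in contents.items():
        if not tokens:
            continue  # a subclass with no usable tokens cannot match anything
        counts = Counter(tokens)
        # S3063: sum the first scores of subclass tokens that also occur in the target scope.
        score = sum(
            (counts[t] / len(tokens)) * math.log(n_docs / doc_freq[t])
            for t in targets
            if t in counts
        )
        # S307: keep the subclass with the highest matching degree (first one wins ties).
        if score > best_score:
            best_label, best_score = label, score
    # S308: the winning subclass's label becomes the target enterprise's industry label.
    return best_label


if __name__ == "__main__":
    subclasses = {
        "candy and chocolate manufacturing": [["candy", "chocolate", "manufacturing"], ["candy", "manufacturing"]],
        "dairy product manufacturing": [["dairy", "milk", "manufacturing"]],
    }
    print(assign_industry_label(["chocolate", "candy", "wholesale"], subclasses, {"and"}))
```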
In this embodiment, for a target enterprise of a stock user whose industry label is undefined, the business scope of the target enterprise is obtained, together with the business scopes of the enterprises of each subclass that belongs to the same section or major class as the target enterprise and whose industry label is defined, and each business scope is segmented into words. For each subclass, the tokens of its enterprises are deduplicated, stripped of stop words, and merged into the subclass business content. Using TF-IDF with the overall business content of the section or major class as the document set, the first score of every token of each subclass is calculated. The matching degree between the target enterprise and each subclass is then determined from token matches and first scores, the subclass business content that best matches the target enterprise's business scope is found, and that subclass's industry label becomes the industry label of the target enterprise. This automatically assigns a definite industry label to an enterprise whose label was undefined, with high matching accuracy, providing a sound basis for building the enterprise's profile, making it easier to offer high-quality services suited to its actual operations, and improving the user experience.
Fig. 5 is a flowchart of a method for determining an industry label according to another embodiment of the present invention. This embodiment builds on the embodiment shown in Fig. 3 and adds steps that compare sentence vectors of business scopes. As shown in Fig. 5, the method for determining an industry label according to this embodiment includes the following steps:
Step S501: obtain the business scope of a target enterprise of a stock user.
The industry label type of the target enterprise is an unknown label type.
Specifically, suppose the major class of the target enterprise's industry label is $F$, and the stock users have $n$ subclass industries under the major class or section $F$, denoted $F_i$ $(i = 1, 2, 3, \ldots, n)$. Assume $F_1$ is the group of enterprises whose industry labels are undefined, and that $F_1$ contains $m_1$ target enterprises. The business scopes of these $m_1$ target enterprises of the stock users are then obtained.
Step S502, performing word segmentation processing on the operation range of the target enterprise to obtain each target operation range word segmentation of the operation range of the target enterprise.
Step S503: for each subclass under the target category, obtain the business scope of each enterprise of the stock users that belongs to the subclass.
Specifically, for the target category, i.e., the major class or section $F$, the business scope of every enterprise in each subclass industry $F_i$ $(i = 2, 3, \ldots, n)$ is obtained, where $m_i$ denotes the number of enterprises in the $i$-th subclass industry.
Step S504: for each enterprise, perform word segmentation on the business scope of the enterprise to obtain the enterprise business-scope tokens of that enterprise.
Step S505: for each subclass, perform deduplication and stop-word removal on the enterprise business-scope tokens of the enterprises of the subclass to obtain the subclass business content of the subclass.
Specifically, the business scope of each enterprise of each subclass is segmented, deduplicated, and stripped of stop words, and the results are then merged by subclass industry to obtain the subclass business content $E_i$ $(i = 2, 3, \ldots, n)$ of each subclass.
Step S506, determining the total operation content of the target category according to the enterprise operation range word segmentation of each subclass operation content.
Step S507: for each subclass business content, using the TF-IDF technique with the overall business content as the document set, calculate a first score for each enterprise business-scope token of the subclass business content within the overall business content.
Specifically, the overall business content $E$ is taken as the "document set" and the subclass business content $E_i$ $(i = 2, 3, \ldots, n)$ of each subclass industry as an "article", and the TF-IDF score of each token in each subclass business content, i.e., the first score, is calculated.
Step S508: for each enterprise of each subclass, calculate the word vectors of the enterprise business-scope tokens of that enterprise, and determine the enterprise business-scope sentence vector of the enterprise from these word vectors.
Specifically, for each enterprise of each subclass, the word vectors of its enterprise business-scope tokens are calculated with a preset word vector algorithm, from which the enterprise business-scope sentence vector of the enterprise is obtained.
Optionally, calculating the word vectors of the enterprise business-scope tokens of the enterprise includes:
calculating the word vectors of the enterprise business-scope tokens of the enterprise based on a text vectorization model and a preset Chinese word vector dictionary.
The text vectorization model (Word to Vector, Word2vec) is a tool that converts words into numeric vectors. The preset Chinese word vector dictionary is a word vector dictionary trained on a large Chinese corpus.
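The text does not say how word vectors are combined into a sentence vector. A common choice, assumed here, is to average the vectors of the tokens found in the dictionary and skip out-of-vocabulary tokens. The plain `dict` stands in for a pretrained Chinese word-vector dictionary (in practice one might load such a file with gensim's `KeyedVectors.load_word2vec_format`); the function name and toy vectors are hypothetical.

```python
def sentence_vector(tokens: list[str], word_vectors: dict[str, list[float]], dim: int) -> list[float]:
    """Average the word vectors of the tokens present in the dictionary (assumed pooling rule).

    Tokens missing from the dictionary are skipped; an all-zero vector of length `dim`
    is returned when none of the tokens is found.
    """
    found = [word_vectors[t] for t in tokens if t in word_vectors]
    if not found:
        return [0.0] * dim
    # zip(*found) groups the i-th component of every vector together.
    return [sum(component) / len(found) for component in zip(*found)]


if __name__ == "__main__":
    toy_vectors = {"candy": [1.0, 0.0], "chocolate": [0.0, 1.0]}
    print(sentence_vector(["candy", "chocolate", "unknown"], toy_vectors, dim=2))
```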
Step S509, for each subclass, determining a business range center vector of the subclass according to the business range sentence vectors of each enterprise of the subclass.
Specifically, the enterprise business-scope sentence vectors of all the enterprises of the subclass can be summed to obtain the business-scope center vector of the subclass.
Step S510, calculating word vectors of the target business range segments of the target enterprise, and determining target business range sentence vectors of the target enterprise according to the word vectors of the target business range segments.
Optionally, calculating the word vectors of the target business-scope tokens of the target enterprise includes:
calculating the word vectors of the target business-scope tokens of the target enterprise based on a text vectorization model and a preset Chinese word vector dictionary.
It should be noted that the word vectors of the target business-scope tokens and the target business-scope sentence vector of the target enterprise are calculated in the same way as the word vectors and enterprise business-scope sentence vectors in step S508, with the subclass enterprise replaced by the target enterprise.
Step S511: calculate the vector distance between the target business-scope sentence vector and the business-scope center vector of each subclass.
Specifically, the vector distance is the Euclidean distance between two vectors, i.e., between the target business-scope sentence vector and the business-scope center vector of the subclass.
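The center vector (the element-wise sum of a subclass's sentence vectors, as described for step S509) and the Euclidean distance of step S511 can be sketched with the standard library; `math.dist` (Python 3.8+) computes the Euclidean distance between two points of equal dimension. The function names are hypothetical.

```python
import math


def center_vector(sentence_vectors: list[list[float]]) -> list[float]:
    """Business-scope center vector: element-wise sum of the enterprises' sentence vectors."""
    return [sum(component) for component in zip(*sentence_vectors)]


def vector_distance(u: list[float], v: list[float]) -> float:
    """Euclidean distance; math.dist raises ValueError if the dimensions differ."""
    return math.dist(u, v)


if __name__ == "__main__":
    center = center_vector([[0.5, 0.5], [1.0, 0.0]])
    print(center, vector_distance([0.0, 0.0], [3.0, 4.0]))
```

Because the center is a sum rather than a mean, its length grows with the number of enterprises in the subclass; using the mean instead would keep distances comparable across subclasses of different sizes.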
Step S512: determine, from the target business-scope tokens and the first scores of the enterprise business-scope tokens of the subclass business content, a second score of the target enterprise's business scope with respect to the subclass business content.
Specifically, for each subclass, the second score corresponding to the subclass is the sum of the first scores of the enterprise business range participles of the subclass matched with the target business range participle of the target enterprise.
Step S513, determining the matching degree of the operation content of the subclass and the operation range of the target enterprise according to the second score and the vector distance.
Specifically, let the second score of subclass $F_i$ $(i = 2, 3, \ldots, n)$ be $S_i$ and its vector distance be $D_i$. The matching degree $P_i$ of subclass $F_i$ is:
$$P_i = S_i + \lambda D_i$$
where $\lambda$ is a weight coefficient whose value is negative, so that a larger vector distance lowers the matching degree.
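The combined score and the selection of steps S513 to S514 can be sketched as follows. The value of $\lambda$ is not given in the text; the value used in the usage line is purely illustrative, and the function name is hypothetical.

```python
def best_subclass(second_scores: dict[str, float], distances: dict[str, float], lam: float) -> str:
    """Return the subclass maximizing P_i = S_i + lam * D_i, with lam < 0."""
    if lam >= 0:
        raise ValueError("lam is expected to be negative")
    matching = {name: second_scores[name] + lam * distances[name] for name in second_scores}
    return max(matching, key=matching.get)


if __name__ == "__main__":
    # Illustrative numbers only: A scores 0.5 - 0.2 = 0.3, B scores 0.4 - 0.05 = 0.35.
    print(best_subclass({"A": 0.5, "B": 0.4}, {"A": 2.0, "B": 0.5}, lam=-0.1))
```

Here subclass A has the higher second score but is farther away in vector space, so the distance term lets B win; the magnitude of $\lambda$ controls how strongly the sentence-vector view overrides the token view.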
Step S514: determine the subclass business content with the highest matching degree as the matching business content of the target enterprise.
Step S515: determine the industry label of the subclass corresponding to the matching business content as the industry label of the target enterprise.
In this embodiment, for a target enterprise whose industry label is undefined, the matching degree between each subclass business content and the business scope of the target enterprise is determined along multiple dimensions. At the token level, TF-IDF measures how well the terms of the subclass match the business scope of the target enterprise; at the whole-text level, sentence vectors built with the text vectorization model measure the overall similarity. Combining the two gives a more accurate matching degree. The industry label of the subclass industry with the highest matching degree is assigned to the target enterprise, so that an enterprise with an undefined industry label is automatically given a definite one with high matching accuracy. This provides a sound basis for building the enterprise's profile, makes it easier to offer high-quality services suited to its actual operations, and improves the user experience.
Fig. 6 is a schematic structural diagram of an industry tag determination apparatus according to an embodiment of the present invention, and as shown in fig. 6, the industry tag determination apparatus includes: a data acquisition module 610, a sub-category operation content determination module 620, a content matching module 630 and an industry tag determination module 640.
The data acquisition module 610 is configured to obtain the business scope of a target enterprise of a stock user, where the industry label type of the target enterprise is an unknown label type. The subclass business content determination module 620 is configured to obtain, for each subclass under a target category, the business scope of each enterprise of the stock users belonging to the subclass, and to generate the subclass business content of the subclass from these business scopes, where the target category is the section or major class to which the industry label of the target enterprise belongs, and the industry label type of the subclass is a known label type. The content matching module 630 is configured to determine, from the business scope of the target enterprise and each subclass business content, the subclass business content that matches the business scope of the target enterprise as the matching business content of the target enterprise. The industry label determination module 640 is configured to determine the industry label of the subclass corresponding to the matching business content as the industry label of the target enterprise.
Optionally, the subclass operation content determining module 620 includes:
a business scope acquisition unit, configured to obtain, for each subclass under the target category, the business scope of each enterprise of the stock users belonging to the subclass; a first word segmentation unit, configured to perform word segmentation on the business scope of each enterprise to obtain the enterprise business-scope tokens of that enterprise; and a subclass business content determination unit, configured to perform, for each subclass, deduplication and stop-word removal on the enterprise business-scope tokens of each enterprise of the subclass to obtain the subclass business content of the subclass.
Optionally, the apparatus for determining an industry tag further includes:
a second word segmentation unit, configured to perform word segmentation on the business scope of the target enterprise to obtain the target business-scope tokens of that business scope.
Accordingly, the content matching module 630 includes:
a matching degree calculation unit, configured to calculate, for each subclass business content, the matching degree between the subclass business content and the business scope of the target enterprise from the enterprise business-scope tokens of the subclass business content and the target business-scope tokens; and a matching business content determination unit, configured to determine the subclass business content with the highest matching degree as the matching business content of the target enterprise.
Optionally, the matching degree calculating unit includes:
an overall business content determination subunit, configured to determine the overall business content of the target category from the enterprise business-scope tokens of all the subclass business contents; a first score calculation subunit, configured to calculate, for each subclass business content and using the TF-IDF technique with the overall business content as the document set, a first score for each enterprise business-scope token of the subclass business content within the overall business content; and a matching degree calculation subunit, configured to determine, for each subclass business content, the matching degree between the subclass business content and the business scope of the target enterprise from the target business-scope tokens and the first scores of the enterprise business-scope tokens of the subclass business content.
Optionally, the matching degree calculation subunit is specifically configured to:
for each target business-scope token, when it matches a current enterprise business-scope token of the subclass business content, take the first score of that current enterprise business-scope token as the target score of the target business-scope token; and determine the matching degree between the subclass business content and the business scope of the target enterprise from the target scores of all the target business-scope tokens.
Optionally, the apparatus for determining an industry tag further includes:
an enterprise business-scope sentence vector determination module, configured to calculate, for each enterprise of each subclass, the word vectors of the enterprise business-scope tokens of that enterprise and to determine the enterprise business-scope sentence vector of the enterprise from these word vectors; a business-scope center vector determination module, configured to determine, for each subclass, the business-scope center vector of the subclass from the enterprise business-scope sentence vectors of its enterprises; a target business-scope sentence vector determination module, configured to calculate the word vectors of the target business-scope tokens of the target enterprise and to determine the target business-scope sentence vector of the target enterprise from these word vectors; and a vector distance calculation module, configured to calculate the vector distance between the target business-scope sentence vector and the business-scope center vector of each subclass.
Correspondingly, the matching degree calculation subunit is specifically configured to:
determining a second score of the operation range of the target enterprise corresponding to the subclass operation content according to the target operation range participles and the first score of the enterprise operation range participles of the subclass operation content; and determining the matching degree of the operation content of the subclass and the operation range of the target enterprise according to the second score and the vector distance.
Optionally, calculating the word vectors of the enterprise business-scope tokens of the enterprise includes:
calculating the word vectors of the enterprise business-scope tokens of the enterprise based on a text vectorization model and a preset Chinese word vector dictionary. Correspondingly, calculating the word vectors of the target business-scope tokens of the target enterprise includes: calculating the word vectors of the target business-scope tokens of the target enterprise based on a text vectorization model and a preset Chinese word vector dictionary.
The industry label determination apparatus provided by this embodiment of the present invention can execute the industry label determination method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.
Fig. 7 is a schematic structural diagram of an industry tag determination device according to an embodiment of the present invention, and as shown in fig. 7, the industry tag determination device includes: memory 710, processor 720, and computer programs.
Wherein the computer program is stored in the memory 710 and configured to be executed by the processor 720 to implement the industry label determination method provided by any of the embodiments corresponding to fig. 2-5 of the present invention.
Wherein the memory 710 and the processor 720 are connected by a bus 730.
The relevant description may be understood by referring to the relevant description and effect corresponding to the steps in fig. 2 to fig. 5, and redundant description is not repeated here.
One embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the industry label determination method provided in any embodiment of the present invention corresponding to fig. 2 to fig. 5.
The computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present invention.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may include high-speed RAM and may further include non-volatile memory (NVM), such as at least one disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, or the like.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses can be divided into address buses, data buses, control buses, and so on. For ease of illustration, the buses in the figures of the present invention are not limited to a single bus or a single type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disks or optical discs. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or a host device.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product that is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions that enable a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit its scope. All equivalent structural or process modifications made using the contents of this description and the accompanying drawings, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the present invention.

Claims (10)

1. A method for determining an industry label, comprising:
acquiring the operation range (business scope) of a target enterprise of a stock user, wherein the industry label of the target enterprise is of an unknown label type;
for each subclass under a target category, acquiring the operation ranges of the enterprises of stock users that correspond to the subclass, and generating the subclass operation content of the subclass from those operation ranges, wherein the target category is the section or division (the top two levels of the industry classification) to which the industry label of the target enterprise belongs, and the industry label of each subclass is of a known label type;
determining, according to the operation range of the target enterprise and each subclass operation content, the subclass operation content that matches the operation range of the target enterprise as the matched operation content of the target enterprise;
and determining the industry label of the subclass corresponding to the matched operation content as the industry label of the target enterprise.
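Claim 1 leaves the matching measure open; claims 3 to 6 refine it. The following is a minimal Python sketch of the overall flow only, assuming that operation ranges are already segmented into words, that each subclass operation content is simply the set of words seen across the labelled stock enterprises of that subclass, and using a plain word-overlap ratio as a placeholder matching measure. The function names and the subclass labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def build_subclass_contents(labelled: Iterable[Tuple[str, List[str]]]) -> Dict[str, Set[str]]:
    """Group the segmented operation ranges of labelled stock enterprises by subclass."""
    contents: Dict[str, Set[str]] = defaultdict(set)
    for subclass, words in labelled:
        contents[subclass].update(words)
    return dict(contents)

def overlap_ratio(target_words: List[str], content: Set[str]) -> float:
    """Placeholder matching measure: share of distinct target words found in the content."""
    target = set(target_words)
    return len(target & content) / len(target) if target else 0.0

def determine_industry_label(target_words: List[str],
                             labelled: Iterable[Tuple[str, List[str]]]) -> str:
    contents = build_subclass_contents(labelled)
    # The subclass whose operation content best matches the target enterprise supplies its label.
    return max(contents, key=lambda subclass: overlap_ratio(target_words, contents[subclass]))

# Hypothetical stock enterprises under one target category, already labelled with a subclass.
labelled = [
    ("subclass_A", ["软件开发", "技术咨询"]),
    ("subclass_B", ["集成电路设计", "技术咨询"]),
    ("subclass_A", ["软件开发", "计算机系统服务"]),
]
print(determine_industry_label(["软件开发", "计算机系统服务", "技术服务"], labelled))  # subclass_A
```

Only the subclass-level aggregation matters here; any of the refined matching measures can replace `overlap_ratio` without changing the surrounding flow.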
2. The method of claim 1, wherein generating the subclass operation content of the subclass according to the operation range of each enterprise comprises:
performing word segmentation on the operation range of each enterprise of each subclass to obtain the enterprise operation range words of that enterprise;
and, for each subclass, performing deduplication and stop-word removal on the enterprise operation range words of all enterprises of the subclass to obtain the subclass operation content of the subclass.
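The generation of subclass operation content in claim 2 can be sketched as below, under stated assumptions: the default segmenter only splits on the punctuation commonly used to separate items in Chinese operation-range text, whereas a real deployment would more likely pass in a Chinese word segmenter (for example `jieba.lcut`); the stop-word set is supplied by the caller and is hypothetical here; and deduplication keeps the first occurrence of each word.

```python
import re
from typing import Callable, Dict, Iterable, List, Set

# Assumed item separators in operation-range text: Chinese and ASCII semicolons, commas,
# enumeration commas, full stops and whitespace.
_SEPARATORS = re.compile(r"[；;，,、。.\s]+")

def default_segment(text: str) -> List[str]:
    """Stand-in segmenter: split on separators and drop empty pieces."""
    return [piece for piece in _SEPARATORS.split(text) if piece]

def subclass_operation_content(scopes: Iterable[str], stopwords: Set[str],
                               segment: Callable[[str], List[str]] = default_segment) -> List[str]:
    """Segment every enterprise operation range of one subclass, then deduplicate and drop stop words."""
    seen: Set[str] = set()
    content: List[str] = []
    for scope in scopes:
        for word in segment(scope):
            if word in stopwords or word in seen:
                continue
            seen.add(word)
            content.append(word)  # keep first-seen order so the output is deterministic
    return content

def build_all(scopes_by_subclass: Dict[str, Iterable[str]], stopwords: Set[str]) -> Dict[str, List[str]]:
    """Subclass operation content for every subclass of the target category."""
    return {sub: subclass_operation_content(scopes, stopwords)
            for sub, scopes in scopes_by_subclass.items()}

stopwords = {"依法须经批准的项目"}  # hypothetical stop-word list
scopes = ["软件开发；技术咨询；依法须经批准的项目", "软件开发，技术服务"]
print(subclass_operation_content(scopes, stopwords))  # ['软件开发', '技术咨询', '技术服务']
```

Passing the segmenter as a parameter keeps the deduplication and stop-word logic independent of whichever Chinese tokenizer is chosen.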
3. The method of claim 2, further comprising, after acquiring the operation range of the target enterprise of the stock user:
performing word segmentation on the operation range of the target enterprise to obtain the target operation range words of the target enterprise;
correspondingly, determining, according to the operation range of the target enterprise and each subclass operation content, the subclass operation content matching the operation range of the target enterprise as the matched operation content of the target enterprise comprises:
for each subclass operation content, calculating the matching degree between the subclass operation content and the operation range of the target enterprise according to the enterprise operation range words of the subclass operation content and the target operation range words;
and determining the subclass operation content with the highest matching degree as the matched operation content of the target enterprise.
4. The method of claim 3, wherein calculating, for each subclass operation content, the matching degree between the subclass operation content and the operation range of the target enterprise according to the enterprise operation range words of the subclass operation content and the target operation range words comprises:
determining the total operation content of the target category from the enterprise operation range words of all subclass operation contents;
for each subclass operation content, calculating, based on term frequency-inverse document frequency (TF-IDF) with the total operation content as the document set, a first score for each enterprise operation range word of the subclass operation content within the total operation content;
and, for each subclass operation content, determining the matching degree between the subclass operation content and the operation range of the target enterprise according to the target operation range words and the first scores of the enterprise operation range words of the subclass operation content.
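Claim 4 treats each subclass operation content as one document and the total operation content of the target category as the document set. The sketch below computes first scores with one common TF-IDF variant, tf = count / document length and a smoothed idf = ln((1 + N) / (1 + df)) + 1; the claim does not fix the weighting, so this choice is an assumption. Because claim 2 deduplicates each subclass operation content, tf is uniform within a document and the idf term does most of the discriminating.

```python
import math
from collections import Counter
from typing import Dict, List

def first_scores(contents: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """TF-IDF score of every word of every subclass operation content.

    Each subclass operation content is one document; all of them together form the
    document set (the total operation content of the target category).
    """
    n_docs = len(contents)
    doc_freq: Counter = Counter()
    for words in contents.values():
        doc_freq.update(set(words))  # count each word at most once per document
    scores: Dict[str, Dict[str, float]] = {}
    for subclass, words in contents.items():
        counts = Counter(words)
        length = len(words)
        scores[subclass] = {
            # Assumed variant: tf = count / length, idf = ln((1 + N) / (1 + df)) + 1.
            word: (count / length) * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        }
    return scores

contents = {"A": ["软件开发", "技术咨询"], "B": ["集成电路设计", "技术咨询"]}
scores = first_scores(contents)
print(scores["A"]["技术咨询"])  # 0.5: the word occurs in every document, so idf = 1
```

A word shared by every subclass (here 技术咨询) gets the minimum idf, so it contributes little to telling subclasses apart.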
5. The method of claim 4, wherein determining the matching degree between the subclass operation content and the operation range of the target enterprise according to the target operation range words and the first scores of the enterprise operation range words of the subclass operation content comprises:
for each target operation range word, when the target operation range word matches a current enterprise operation range word of the subclass operation content, taking the first score of that enterprise operation range word as the target score of the target operation range word;
and determining the matching degree between the subclass operation content and the operation range of the target enterprise according to the target scores of the target operation range words.
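Claims 5 and 3 together can be sketched as follows, assuming that "matches" means exact string equality, that unmatched target words contribute a score of 0, and that the matching degree is the sum of the target scores; the claims do not specify the aggregation. The subclass operation content with the highest matching degree is selected. The first-score values in the usage lines are invented for illustration.

```python
from typing import Dict, Iterable, Tuple

def matching_degree(target_words: Iterable[str], word_scores: Dict[str, float]) -> float:
    """Sum of target scores: a target word equal to a word of the subclass operation
    content takes that word's first score; unmatched words contribute 0 (assumption)."""
    return sum(word_scores.get(word, 0.0) for word in set(target_words))

def best_subclass(target_words: Iterable[str],
                  scores_by_subclass: Dict[str, Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    target = list(target_words)  # materialise once so it can be reused for every subclass
    degrees = {sub: matching_degree(target, s) for sub, s in scores_by_subclass.items()}
    # The subclass operation content with the highest matching degree is the matched content.
    return max(degrees, key=degrees.get), degrees

scores_by_subclass = {  # made-up first scores
    "A": {"软件开发": 0.70, "技术咨询": 0.50},
    "B": {"集成电路设计": 0.70, "技术咨询": 0.50},
}
label, degrees = best_subclass(["软件开发", "技术咨询", "技术服务"], scores_by_subclass)
print(label)  # A
```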
6. The method of claim 4, further comprising:
for each enterprise of each subclass, calculating the word vectors of the enterprise operation range words of the enterprise, and determining the enterprise operation range sentence vector of the enterprise from those word vectors;
for each subclass, determining the operation range center vector of the subclass from the enterprise operation range sentence vectors of the enterprises of the subclass;
calculating the word vectors of the target operation range words of the target enterprise, and determining the target operation range sentence vector of the target enterprise from those word vectors;
calculating the vector distance between the target operation range sentence vector and the operation range center vector of each subclass;
correspondingly, determining the matching degree between the subclass operation content and the operation range of the target enterprise according to the target operation range words and the first scores of the enterprise operation range words of the subclass operation content comprises:
determining, according to the target operation range words and the first scores of the enterprise operation range words of the subclass operation content, a second score of the operation range of the target enterprise with respect to the subclass operation content;
and determining the matching degree between the subclass operation content and the operation range of the target enterprise according to the second score and the vector distance.
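Claim 6 adds a vector-space signal to the TF-IDF score. The sketch below assumes that a sentence vector is the mean of the word vectors of the in-vocabulary words, that the center vector is the mean of the subclass's enterprise sentence vectors, and that the vector distance is Euclidean (`math.dist`, Python 3.8+). `combined_degree` is a hypothetical way of combining the second score with the distance (a higher score and a smaller distance both raise the matching degree); the claim does not state the formula. The 2-dimensional vectors and second scores are invented.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]

def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def sentence_vector(words: Sequence[str], word_vecs: Dict[str, Vector]) -> Optional[Vector]:
    """Assumed sentence vector: mean of the vectors of in-vocabulary words (None if none are known)."""
    known = [word_vecs[w] for w in words if w in word_vecs]
    return mean_vector(known) if known else None

def combined_degree(second_score: float, distance: float, weight: float = 1.0) -> float:
    """Hypothetical combination: a higher second score and a smaller distance both help."""
    return second_score - weight * distance

# Invented 2-dimensional word vectors and segmented operation ranges per subclass.
word_vecs = {"软件开发": [1.0, 0.0], "技术咨询": [0.0, 1.0], "集成电路设计": [-1.0, 0.0]}
enterprises = {"A": [["软件开发", "技术咨询"]], "B": [["集成电路设计"]]}

centres = {}
for sub, ranges in enterprises.items():
    sentence_vecs = [v for v in (sentence_vector(r, word_vecs) for r in ranges) if v is not None]
    centres[sub] = mean_vector(sentence_vecs)  # operation range center vector of the subclass

target_vec = sentence_vector(["软件开发", "技术服务"], word_vecs)  # "技术服务" is out of vocabulary
distances = {sub: math.dist(target_vec, c) for sub, c in centres.items()}
second_scores = {"A": 1.2, "B": 0.5}  # e.g. summed target scores from the TF-IDF step
degrees = {sub: combined_degree(second_scores[sub], distances[sub]) for sub in centres}
print(max(degrees, key=degrees.get))  # A
```

The `weight` parameter controls how strongly semantic closeness can override keyword overlap; its value would need tuning on labelled data.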
7. The method of claim 6, wherein calculating the word vectors of the enterprise operation range words of the enterprise comprises:
calculating the word vectors of the enterprise operation range words of the enterprise based on a text vectorization model and a preset Chinese word vector dictionary;
correspondingly, calculating the word vectors of the target operation range words of the target enterprise comprises:
calculating the word vectors of the target operation range words of the target enterprise based on a text vectorization model and a preset Chinese word vector dictionary.
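Claim 7 only says that word vectors come from a text vectorization model and a preset Chinese word vector dictionary. One widely used way to distribute such dictionaries is the word2vec text format: an optional header line `vocab_size dim`, then one word per line followed by its vector components. The loader below is a minimal sketch assuming that format, with out-of-vocabulary words skipped; it is not necessarily what the patented implementation uses.

```python
from typing import Dict, Iterable, List, Optional

def load_word_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse word vectors in word2vec text format (assumed format of the preset dictionary)."""
    vectors: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    for lineno, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue
        if lineno == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            dim = int(parts[1])  # header line: vocabulary size and vector dimension
            continue
        word, values = parts[0], parts[1:]
        if dim is None:
            dim = len(values)  # headerless file: take the dimension from the first row
        if len(values) != dim:
            continue  # skip malformed rows
        vectors[word] = [float(v) for v in values]
    return vectors

def lookup(words: Iterable[str], vectors: Dict[str, List[float]]) -> List[List[float]]:
    """Word vectors of the given segmented words; out-of-vocabulary words are skipped."""
    return [vectors[w] for w in words if w in vectors]

sample = ["3 2", "软件开发 1.0 0.0", "技术咨询 0.0 1.0", "坏行 0.5"]  # last row is malformed
vectors = load_word_vectors(sample)
print(lookup(["软件开发", "技术服务"], vectors))  # [[1.0, 0.0]]
```

With a real dictionary file, an open text handle (for example from `open(path, encoding="utf-8")`) can be passed directly as `lines`, since iterating a file yields one line at a time.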
8. An industry label determination apparatus, comprising:
a data acquisition module, configured to acquire the operation range of a target enterprise of a stock user, wherein the industry label of the target enterprise is of an unknown label type;
a subclass operation content determination module, configured to, for each subclass under a target category, acquire the operation ranges of the enterprises of stock users corresponding to the subclass and generate the subclass operation content of the subclass from those operation ranges, wherein the target category is the section or division to which the industry label of the target enterprise belongs, and the industry label of each subclass is of a known label type;
a content matching module, configured to determine, according to the operation range of the target enterprise and each subclass operation content, the subclass operation content matching the operation range of the target enterprise as the matched operation content of the target enterprise;
and an industry label determination module, configured to determine the industry label of the subclass corresponding to the matched operation content as the industry label of the target enterprise.
9. An industry label determination device, comprising a memory, a processor, and an industry label determination program stored in the memory and executable on the processor, wherein the industry label determination program, when executed by the processor, implements the steps of the industry label determination method according to any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon an industry label determination program which, when executed by a processor, implements the steps of the industry label determination method according to any one of claims 1 to 7.
CN202011060599.XA 2020-09-30 2020-09-30 Industry label determining method, device, equipment and storage medium Active CN112163153B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011060599.XA CN112163153B (en) 2020-09-30 2020-09-30 Industry label determining method, device, equipment and storage medium
PCT/CN2021/103262 WO2022068297A1 (en) 2020-09-30 2021-06-29 Method, apparatus and device for determining industry label, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011060599.XA CN112163153B (en) 2020-09-30 2020-09-30 Industry label determining method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112163153A 2021-01-01
CN112163153B 2024-05-03

Family

ID=73860835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011060599.XA Active CN112163153B (en) 2020-09-30 2020-09-30 Industry label determining method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112163153B (en)
WO (1) WO2022068297A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115018258B (en) * 2022-05-11 2023-08-18 中国城市规划设计研究院深圳分院 Method for identifying enterprise type and industry chain space in target area
CN115905506B (en) * 2023-02-21 2023-05-16 江西省科技事务中心 Basic theory file pushing method, system, computer and readable storage medium
CN116361726B (en) * 2023-04-03 2024-03-29 全拓科技(杭州)股份有限公司 Data processing method based on multidimensional big data analysis
CN116579786B (en) * 2023-05-06 2023-11-14 全拓科技(杭州)股份有限公司 Data cleaning method and system applied to big data analysis

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169523A (en) * 2017-05-27 2017-09-15 鹏元征信有限公司 Automatically determine method, storage device and the terminal of the affiliated category of employment of mechanism
US20180060437A1 (en) * 2016-08-29 2018-03-01 EverString Innovation Technology Keyword and business tag extraction
CN108171276A (en) * 2018-01-17 2018-06-15 百度在线网络技术(北京)有限公司 For generating the method and apparatus of information
CN110020427A (en) * 2019-01-30 2019-07-16 阿里巴巴集团控股有限公司 Strategy determines method and apparatus
KR20190114166A (en) * 2018-03-29 2019-10-10 (주)다음소프트 Industrial classifying system and method using autoencoder
CN110781955A (en) * 2019-10-24 2020-02-11 中国银联股份有限公司 Method and device for classifying label-free objects and detecting nested codes and computer-readable storage medium
US20200242113A1 (en) * 2016-02-24 2020-07-30 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for mining offline resources
CN111597304A (en) * 2020-05-15 2020-08-28 上海财经大学 Secondary matching method for accurately identifying Chinese enterprise name entity

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130268526A1 (en) * 2012-04-06 2013-10-10 Mark E. Johns Discovery engine
CN110188357B (en) * 2019-05-31 2023-06-20 创新先进技术有限公司 Industry identification method and device for objects
CN111027318B (en) * 2019-10-12 2023-04-07 中国平安财产保险股份有限公司 Industry classification method, device and equipment based on big data and storage medium
CN110990529B (en) * 2019-11-28 2024-04-09 爱信诺征信有限公司 Industry detail dividing method and system for enterprises
CN111538837A (en) * 2020-04-27 2020-08-14 北京同邦卓益科技有限公司 Method and device for analyzing enterprise operation range information
CN112163153B (en) * 2020-09-30 2024-05-03 深圳前海微众银行股份有限公司 Industry label determining method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
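韩雪; 张业; 朱聪慧: "企业经营范围文本自动分类方法探究" (Research on methods for automatic classification of enterprise business-scope text), 标准科学 (Standard Science), no. 01, pp. 93-96 *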

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022068297A1 (en) * 2020-09-30 2022-04-07 深圳前海微众银行股份有限公司 Method, apparatus and device for determining industry label, and storage medium
CN113869639A (en) * 2021-08-26 2021-12-31 中国环境科学研究院 Yangtze river basin enterprise screening method and device, electronic equipment and storage medium
WO2023025330A1 (en) * 2021-08-26 2023-03-02 中国环境科学研究院 Enterprise screening method and apparatus, electronic device, and storage medium
CN113869639B (en) * 2021-08-26 2023-11-07 中国环境科学研究院 Yangtze river basin enterprise screening method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022068297A1 (en) 2022-04-07
CN112163153B (en) 2024-05-03

Similar Documents

Publication Publication Date Title
CN112163153A (en) Industry label determination method, device, equipment and storage medium
US10474752B2 (en) System and method for slang sentiment classification for opinion mining
CN109508373B (en) Method and device for calculating enterprise public opinion index and computer readable storage medium
CN104834651B (en) Method and device for providing high-frequency question answers
CN109819015B (en) Information pushing method, device and equipment based on user portrait and storage medium
CN110674620A (en) Target file generation method, device, medium and electronic equipment
CN112580637B (en) Text information identification method, text information extraction method, text information identification device, text information extraction device and text information extraction system
CN107247728B (en) Text processing method and device and computer storage medium
CN111666757A (en) Commodity comment emotional tendency analysis method, device and equipment and readable storage medium
TW201619885A (en) E-commerce reputation analysis system, method and computer readable storage medium thereof
CN109710742B (en) Method, system and equipment for processing individual stock announcement natural language query
CN113934848A (en) Data classification method and device and electronic equipment
CN109284384B (en) Text analysis method and device, electronic equipment and readable storage medium
CN109446322B (en) Text analysis method and device, electronic equipment and readable storage medium
CN109213873B (en) Patent matching method and matching system for automatically matching potential buyers for patents to be sold
CN111274384B (en) Text labeling method, equipment and computer storage medium thereof
CN115659961B (en) Method, apparatus and computer storage medium for extracting text views
US10621208B2 (en) Category name extraction device, category name extraction method, and category name extraction program
CN116719997A (en) Policy information pushing method and device and electronic equipment
CN114049165B (en) Commodity price comparison method, device, equipment and medium for purchasing system
CN115712715A (en) Question answering method, device, electronic equipment and storage medium for introduction
Anuradha et al. Fuzzy based summarization of product reviews for better analysis
CN114911936A (en) Model training and comment recognition method and device, electronic equipment and medium
CN110162614B (en) Question information extraction method and device, electronic equipment and storage medium
CN114255067A (en) Data pricing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant