CN113313344A - Label system construction method and system fusing multiple modes - Google Patents

Info

Publication number
CN113313344A
Authority
CN
China
Prior art keywords
label
mode
model
task
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110394477.2A
Other languages
Chinese (zh)
Other versions
CN113313344B (en)
Inventor
李巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Fiberhome Digtal Technology Co Ltd
Original Assignee
Wuhan Fiberhome Digtal Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Fiberhome Digtal Technology Co Ltd
Priority to CN202110394477.2A
Publication of CN113313344A
Application granted
Publication of CN113313344B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A label system construction method fusing multiple modes first determines a target label based on business requirements, analyzes the target label to determine its calculation mode and its place in the label directory hierarchy, and then selects the label calculation mode from an SQL mode, a model mode, and a custom mode. In the SQL mode, the data resources required by the target label are identified by working backward from the label and are stored in a Hive big data resource pool, label results are stored in ElasticSearch, and an incremental labeling task workflow is built by writing SQL logic code to compute the label. The model mode labels objects based on a label probability model, a label integral (points-based scoring) model, or a label combination model. The custom mode labels objects by uploading their unique primary keys. Finally, the label calculation task is activated. The invention is guided by business requirements and provides three label calculation modes (SQL, model, and custom) to cover a variety of labeling scenarios; model training is based on the LightGBM algorithm, giving fast training, low memory overhead, and good model accuracy and generalization ability.

Description

Label system construction method and system fusing multiple modes
Technical Field
The invention relates to the field of big data analysis, in particular to a label system construction method and system fusing multiple modes.
Background
With the rapid development of big data technology, industries are accumulating ever more data with increasingly complex structures, and the problem of low data value density is becoming more prominent. Related industries gather data from internal networks, the internet, and government affairs networks, covering areas such as population, civil aviation, railway, lodging, and social security. The data volume is huge, and simply listing records does not achieve effective data governance and integration. A label is data that describes the characteristics of a business entity. By building a data label system for the relevant industry, descriptive label attributes are established for business objects along multiple dimensions, the characteristics of each business object are outlined, and a portrait of the object is constructed to better serve business applications. However, a methodology and toolset that can quickly construct labels and adapt to multiple scenarios is lacking. There is therefore a need for a label system construction method and system that fuses multiple modes.
Disclosure of Invention
In view of the above problems, the present invention provides a label system construction method and system fusing multiple modes that overcomes or at least partially solves these problems.
To solve the above technical problem, the embodiments of the application disclose the following technical solution:
A label system construction method fusing multiple modes comprises the following steps:
S100, determining a target label based on business requirements, and analyzing the target label to determine its calculation mode and its label directory hierarchy;
S200, writing the calculation task logic according to the calculation mode determined for the target label;
S300, activating the label calculation task according to the calculation mode determined for the target label.
Further, in S100, the calculation mode of the target label is one of three modes: an SQL mode, a model mode, and a custom mode.
Further, the label directory hierarchy adopts a four-layer architecture: label object, primary classification, secondary classification, and label.
Further, when the calculation mode of the target label is the SQL mode, the specific method is as follows: the data resources required by the target label are identified by working backward from the label and are stored in a Hive big data resource pool; label results are stored in ElasticSearch; and an incremental labeling task workflow is built by writing SQL logic code to compute the label.
Further, in S200, when the calculation mode of the target label is the model mode, the model is one of a label probability model, a label integral (points-based scoring) model, and a label combination model, and the model generates a new label based on existing labels.
Further, when the model is a label probability model, the specific method is as follows: positive samples are selected, and negative samples are randomly drawn in a set proportion, and together they form the modeling sample; the modeling sample is divided into a training set and a test set, the label feature values are used as model input, and the model is trained with the LightGBM algorithm; the accuracy, precision, and recall of the model are calculated on the training set and the test set respectively to evaluate the model; once trained, the model outputs an assessment probability from the label feature values of a data object; finally, a probability threshold is set and objects whose probability exceeds the threshold are labeled. The resulting label is a numerical label.
Further, when the model is a label integral model, the specific method is as follows: a label scoring rule is configured first, a weighted sum of scores is then calculated based on ElasticSearch, and finally a score threshold is set and objects whose score exceeds the threshold are labeled. The resulting label is a numerical label.
Further, when the model is a label combination model, the specific method is as follows: the intersection, union, and complement logic over existing labels is configured first, the matching group of objects is screened out, and the new label is then applied to that group. The resulting label is a category label.
Further, when the calculation mode of the target label is the custom mode, the specific method is as follows: the unique primary key of each object is input, and a custom label is applied to the object identified by that key. The resulting label is a category label.
The invention also discloses a label system construction system fusing multiple modes, which comprises: a label directory module, a label creation module, a task configuration module, a task scheduling module, and a task monitoring module; wherein:
A label directory module: used for custom configuration of the label hierarchy directory tree, where the directory tree is divided into a first-level classification, a second-level classification, and a third-level classification;
A label creation module: used for editing and saving label metadata. Basic label information is filled in, including label name, label level, label description, and label synonyms, and multiple groups of labels can be added at one time;
A task configuration module: used for configuring the label calculation task logic. For the SQL mode, the task description, feature category, feature name, mapping rule, update mode, update period, and incremental SQL need to be configured. For the model mode, the task description, model type, selected model, feature category, feature name, and update period need to be filled in; the label probability model additionally requires a probability threshold range, and the label integral (scoring) model requires a score threshold range. For the custom mode, the task description, feature category, and feature name need to be filled in.
A task scheduling module: used for activating label calculation tasks and scheduling them on a timer. For label calculation tasks in the SQL mode and the model mode, a validity period is set for the task, and after activation the task is scheduled by a timer. For the custom mode, activation directly uploads the object file and labeling is completed in a single pass.
A task monitoring module: used for monitoring the execution state of scheduled label tasks; only tasks in the SQL mode and the model mode are monitored. Task exceptions are divided into batch run exceptions and increment exceptions. A batch run exception is detected from the execution state of a task batch: if the task fails, a warning is raised and the reason for the failure is returned. An increment exception is detected from the change in data volume before and after a label batch: if the label increment of the current batch is 0, a warning is raised.
The technical solution provided by the embodiments of the invention has at least the following beneficial effects:
The invention provides a label system construction method fusing multiple modes, which first determines a target label based on business requirements, analyzes the target label to determine its calculation mode and its label directory hierarchy, and then selects the label calculation mode from an SQL mode, a model mode, and a custom mode. In the SQL mode, the data resources required by the target label are identified by working backward from the label and are stored in a Hive big data resource pool, label results are stored in ElasticSearch, and an incremental labeling task workflow is built by writing SQL logic code to compute the label. The model mode labels objects based on a label probability model, a label integral (points-based scoring) model, or a label combination model. The custom mode labels objects by uploading their unique primary keys. Finally, the label calculation task is activated: for tasks in the SQL mode and the model mode, a validity period is set and the task is scheduled by a timer after activation; for the custom mode, activation directly uploads the object file and labeling is completed in a single pass. The effects are as follows: first, guided by business requirements, three label calculation modes (SQL, model, and custom) are provided, which cover a variety of labeling scenarios; second, model training is based on the LightGBM algorithm, giving fast training, low memory overhead, and good model accuracy and generalization ability.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
Fig. 1 is a flowchart of a label system construction method fusing multiple modes in embodiment 1 of the present invention;
Fig. 2 is a block diagram of a label system construction system fusing multiple modes in embodiment 1 of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
To address the lack, in the prior art, of a methodology and toolset that can quickly construct labels and adapt to multiple scenarios, the embodiments of the invention provide a label system construction method and system fusing multiple modes.
Example 1
This embodiment discloses a label system construction method fusing multiple modes which, as shown in Fig. 1, includes:
S100, determining a target label based on business requirements, and analyzing the target label to determine its calculation mode and its label directory hierarchy.
Specifically, the label calculation mode is one of an SQL mode, a model mode, and a custom mode. The label directory hierarchy generally adopts a four-layer architecture: label object, primary classification, secondary classification, and label. A label comprises a label value and a feature value. The label value indicates whether the object carries the label and is recorded as True or False; the feature value is a quantitative representation of the label. By feature category, labels are divided into category labels and numerical labels: the feature value of a category label is a discrete variable, and the feature value of a numerical label is a continuous variable.
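The label structure described above can be sketched as a small data model. This is only an illustration: the class and field names (`LabelPath`, `LabelRecord`, `object_key`, and so on) are assumptions made for the example and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical schema: the source describes these concepts but defines no field names.
@dataclass(frozen=True)
class LabelPath:
    label_object: str       # layer 1, e.g. "person"
    primary_class: str      # layer 2, primary classification
    secondary_class: str    # layer 3, secondary classification
    label: str              # layer 4, the label itself


@dataclass
class LabelRecord:
    object_key: str                    # unique primary key of the labeled object
    path: LabelPath                    # position in the four-layer directory
    label_value: bool                  # True if the object carries the label
    feature_value: Union[str, float]   # discrete (category) or continuous (numerical)

    @property
    def feature_category(self) -> str:
        """Return 'numerical' for continuous feature values, otherwise 'category'."""
        if isinstance(self.feature_value, bool):
            return "category"
        if isinstance(self.feature_value, (int, float)):
            return "numerical"
        return "category"
```

A record whose feature value is a number, such as a check-in count, is treated as numerical, while a string such as "high" is treated as a category value.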
S200, writing the calculation task logic according to the calculation mode determined for the target label.
Specifically, when the calculation mode of the target label is the SQL mode, the specific method is as follows: the data resources required by the target label are identified by working backward from the label and are stored in a Hive big data resource pool; label results are stored in ElasticSearch; and an incremental labeling task workflow is built by writing SQL logic code to compute the label. The SQL mode comprises four working modules: a table building module, an incremental labeling module, a data synchronization module, and a label counting module. The table building module creates the intermediate tables used in label calculation, including a new label table, an old label table, a label statistics table, and a data synchronization channel table. The incremental labeling module calculates the difference set between the new label table and the old label table and extracts the incremental labels. The data synchronization module synchronizes label result data in both directions between Hive and ElasticSearch. The label counting module counts the label increment. Both category labels and numerical labels can be generated in this mode.
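The difference-set step of the incremental labeling module can be illustrated in memory as follows. In the patent this logic is written as SQL over Hive tables; the Python version below is only a sketch, and the table layout (primary key mapped to feature value) and the function name are assumptions.

```python
from typing import Dict, Tuple

# Assumed layout: object primary key -> feature value of the label.
LabelTable = Dict[str, float]


def incremental_labels(new_table: LabelTable, old_table: LabelTable) -> Tuple[LabelTable, int]:
    """Return the rows of new_table that are not in old_table, plus their count.

    Rows are compared as (key, value) pairs, so a key whose feature value
    changed counts as part of the increment, like a row-level set difference.
    """
    increment = dict(set(new_table.items()) - set(old_table.items()))
    return increment, len(increment)
```

For example, with `new_table = {"a": 1.0, "b": 2.0, "c": 3.0}` and `old_table = {"a": 1.0, "b": 5.0}`, the increment contains the changed row `b` and the new row `c`; the count is what the label counting module would record.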
When the calculation mode of the target label is the model mode, the model is one of a label probability model, a label integral (points-based scoring) model, and a label combination model, and the model generates a new label based on existing labels.
Specifically, when the model is a label probability model, the specific method is as follows: positive samples are selected, and negative samples are randomly drawn in a set proportion, and together they form the modeling sample; the modeling sample is divided into a training set and a test set, the label feature values are used as model input, and the model is trained with the LightGBM algorithm; the accuracy, precision, and recall of the model are calculated on the training set and the test set respectively to evaluate the model; once trained, the model outputs an assessment probability from the label feature values of a data object; finally, a probability threshold is set and objects whose probability exceeds the threshold are labeled. The resulting label is a numerical label.
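The steps around the classifier (sample construction, train/test split, evaluation, and thresholding) can be sketched in plain Python. The classifier itself, which the patent trains with LightGBM, is deliberately left out; any model that yields a probability per object would plug in between the split and the thresholding. All function names, the default ratio, and the default split fraction are assumptions for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = List[Tuple[str, int]]  # (object key, target) where target is 1 or 0


def build_modeling_sample(positives: Sequence[str], negatives_pool: Sequence[str],
                          neg_ratio: float = 1.0, seed: int = 0) -> Sample:
    """Combine all positives with negatives drawn at random in the given ratio."""
    rng = random.Random(seed)
    n_neg = min(len(negatives_pool), int(len(positives) * neg_ratio))
    negatives = rng.sample(list(negatives_pool), n_neg)
    sample = [(key, 1) for key in positives] + [(key, 0) for key in negatives]
    rng.shuffle(sample)
    return sample


def split_train_test(sample: Sample, test_fraction: float = 0.3) -> Tuple[Sample, Sample]:
    """Split an already shuffled sample into a training set and a test set."""
    cut = int(len(sample) * (1 - test_fraction))
    return sample[:cut], sample[cut:]


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) for binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def label_by_threshold(probabilities: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep objects whose predicted probability exceeds the threshold.

    The probability is kept as the feature value of the numerical label.
    """
    return {key: p for key, p in probabilities.items() if p > threshold}
```

Running `evaluate` separately on the training set and the test set gives the two sets of metrics that the method compares during model evaluation.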
When the model is a label integral (points-based scoring) model, the specific method is as follows: a label scoring rule is configured first, a weighted sum of scores is then calculated based on ElasticSearch, and finally a score threshold is set and objects whose score exceeds the threshold are labeled. The resulting label is a numerical label.
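A minimal in-memory sketch of the weighted score calculation follows. The patent performs this calculation on ElasticSearch and does not give the rule format, so the rule layout used here (existing label mapped to a points value and a weight) and the function names are assumptions.

```python
from typing import Dict, Set, Tuple

# Assumed rule format: existing label name -> (points, weight).
ScoreRules = Dict[str, Tuple[float, float]]


def weighted_score(object_labels: Set[str], rules: ScoreRules) -> float:
    """Sum points * weight over every rule whose label the object carries."""
    return sum(points * weight
               for label, (points, weight) in rules.items()
               if label in object_labels)


def score_and_label(objects: Dict[str, Set[str]], rules: ScoreRules,
                    threshold: float) -> Dict[str, float]:
    """Return objects whose weighted score exceeds the threshold, with the score."""
    scores = {key: weighted_score(labels, rules) for key, labels in objects.items()}
    return {key: score for key, score in scores.items() if score > threshold}
```

With rules `{"A": (10.0, 0.5), "B": (20.0, 1.0)}`, an object carrying both labels scores 25.0, and it receives the new numerical label if the configured threshold is below that value.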
When the model is a label combination model, the specific method is as follows: the intersection, union, and complement logic over existing labels is configured first, the matching group of objects is screened out, and the new label is then applied to that group. The resulting label is a category label.
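The combination logic maps directly onto set operations. The sketch below assumes label assignments are held as a mapping from object key to the set of labels the object carries; the function names are hypothetical.

```python
from typing import Dict, Set


def label_members(assignments: Dict[str, Set[str]], label: str) -> Set[str]:
    """Return the keys of all objects that carry the given label."""
    return {key for key, labels in assignments.items() if label in labels}


def apply_combined_label(assignments: Dict[str, Set[str]], new_label: str,
                         members: Set[str]) -> None:
    """Apply the new category label to every object in the screened group."""
    for key in members:
        assignments[key].add(new_label)


assignments = {
    "o1": {"A", "B"},
    "o2": {"A"},
    "o3": {"B"},
    "o4": set(),
}
universe = set(assignments)
a = label_members(assignments, "A")
b = label_members(assignments, "B")

both = a & b                 # intersection: objects with A and B
either = a | b               # union: objects with A or B
not_a = universe - a         # complement: objects without A

apply_combined_label(assignments, "A_and_B", both)
```

Here the intersection selects only `o1`, so `o1` is the only object that receives the new label `A_and_B`.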
In this embodiment, when the calculation mode of the target label is the custom mode, the specific method is as follows: the unique primary key of each object is input, and a custom label is applied to the object identified by that key. The resulting label is a category label.
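As a rough illustration of the custom mode, the sketch below reads one primary key per line from an uploaded file and applies a label to each key. The file format, the record fields, and the function names are assumptions; the patent only states that unique primary keys are uploaded.

```python
from typing import Dict, Iterable, List


def load_primary_keys(lines: Iterable[str]) -> List[str]:
    """Read one primary key per line, dropping blanks and duplicates, keeping order."""
    seen = set()
    keys = []
    for line in lines:
        key = line.strip()
        if key and key not in seen:
            seen.add(key)
            keys.append(key)
    return keys


def apply_custom_label(keys: Iterable[str], label_name: str,
                       feature_value: str) -> List[Dict[str, object]]:
    """Build one label record per key; the feature value is a discrete category."""
    return [
        {"object_key": key, "label": label_name,
         "label_value": True, "feature_value": feature_value}
        for key in keys
    ]
```

Any iterable of lines works as input, including an open text file, so `apply_custom_label(load_primary_keys(open(path)), ...)` would cover a simple upload.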
S300, activating the label calculation task according to the calculation mode determined for the target label. Specifically, for label calculation tasks in the SQL mode and the model mode, a validity period is set for the task, and after activation the task is scheduled by a timer. For the custom mode, activation directly uploads the object file and labeling is completed in a single pass.
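The activation and timer rules can be sketched as a check that decides whether a task should run at a given moment. The `LabelTask` fields and the `is_due` function are assumptions for illustration; the patent does not describe the scheduler's internals.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class LabelTask:
    name: str
    mode: str                    # "sql", "model", or "custom"
    valid_from: datetime         # start of the validity period
    valid_until: datetime        # end of the validity period
    period: timedelta            # update period between runs
    active: bool = False
    last_run: Optional[datetime] = None


def is_due(task: LabelTask, now: datetime) -> bool:
    """Return True if a timer tick at `now` should run the task.

    Custom-mode tasks are never timer-scheduled: they run once on upload.
    """
    if not task.active or task.mode == "custom":
        return False
    if not (task.valid_from <= now <= task.valid_until):
        return False
    return task.last_run is None or now - task.last_run >= task.period
```

A task outside its validity period is simply skipped, which is one way to read the requirement that SQL-mode and model-mode tasks carry a validity period.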
To aid understanding of this embodiment, a frequent hotel check-in label and a hidden drug-involved person label are taken as examples. When the target label is "frequent late-night hotel check-in in the past year", the label is a numerical label and is computed in the SQL mode. The label sits under the person / behavior attribute / hotel accommodation level of the directory, the required data resource is hotel accommodation record data, and the calculation logic is as follows: select records whose check-in time falls within the last 365 days and between 0:00 and 6:00, group and count them by ID card number, and apply the label if the number of check-ins is greater than 10. When the target label is "hidden drug-involved person", the label is a numerical label and is computed in the model mode, specifically with a label probability model. The label sits under the person / high-risk attribute / persons of concern level, and the required model is a probability assessment model for drug-involved persons. First, known drug-involved persons are selected, and non-drug-involved persons are randomly drawn in equal proportion to form the model sample set. The sample set is then divided into a training set and a test set, and a classification model is trained with LightGBM so that it separates drug-involved persons from non-drug-involved persons as well as possible. The model is evaluated by accuracy, precision, and recall. When accuracy and precision are above 90%, recall is above 75%, and the metrics differ little between the training set and the test set, the model is considered applicable.
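The applicability criterion stated above can be written as a small check. The 90% and 75% thresholds come from the text; the maximum allowed gap between training and test metrics (`max_gap`) is an assumed value, since the text only says the difference should be small.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    accuracy: float
    precision: float
    recall: float


def model_is_applicable(train: Metrics, test: Metrics, max_gap: float = 0.05) -> bool:
    """Check the acceptance criteria on both the training set and the test set.

    max_gap is a hypothetical tolerance for "small difference"; the source
    gives no number for it.
    """
    for m in (train, test):
        if not (m.accuracy > 0.90 and m.precision > 0.90 and m.recall > 0.75):
            return False
    gaps = (abs(a - b) for a, b in zip(train, test))
    return all(gap <= max_gap for gap in gaps)
```

A model with strong training metrics but a recall of 0.70 on the test set would be rejected by this check.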
After the model is selected, the assessment probability threshold is configured as 0.9, and persons whose model output is greater than 0.9 are given the hidden drug-involved person label.
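Returning to the first example of this embodiment, the frequent late-night hotel check-in rule can be sketched in Python as follows. The patent implements it as SQL over Hive; this in-memory version is only an illustration. The record layout (ID number plus check-in time) is an assumption, and the time window is treated as 0:00 inclusive to 6:00 exclusive, which the source does not specify.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def frequent_late_night_guests(records: Iterable[Tuple[str, datetime]],
                               now: datetime,
                               min_count: int = 10) -> Dict[str, int]:
    """Count late-night check-ins per ID over the last 365 days.

    Returns the IDs whose count is greater than min_count, with the count
    kept as the feature value of the numerical label.
    """
    window_start = now - timedelta(days=365)
    counts = Counter(
        id_number
        for id_number, check_in in records
        if window_start < check_in <= now and 0 <= check_in.hour < 6
    )
    return {id_number: n for id_number, n in counts.items() if n > min_count}
```

Grouping by ID number with `Counter` plays the role of the GROUP BY aggregation in the SQL version, and the final filter corresponds to the "greater than 10" condition.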
The label system construction method fusing multiple modes provided by this embodiment first determines a target label based on business requirements, analyzes the target label to determine its calculation mode and its label directory hierarchy, and then selects the label calculation mode from an SQL mode, a model mode, and a custom mode. In the SQL mode, the data resources required by the target label are identified by working backward from the label and are stored in a Hive big data resource pool, label results are stored in ElasticSearch, and an incremental labeling task workflow is built by writing SQL logic code to compute the label. The model mode labels objects based on a label probability model, a label integral (points-based scoring) model, or a label combination model. The custom mode labels objects by uploading their unique primary keys. Finally, the label calculation task is activated: for tasks in the SQL mode and the model mode, a validity period is set and the task is scheduled by a timer after activation; for the custom mode, activation directly uploads the object file and labeling is completed in a single pass. The effects are as follows: first, guided by business requirements, three label calculation modes (SQL, model, and custom) are provided, which cover a variety of labeling scenarios; second, model training is based on the LightGBM algorithm, giving fast training, low memory overhead, and good model accuracy and generalization ability.
Example 2
This embodiment discloses a label system construction system fusing multiple modes which, as shown in Fig. 2, includes: a label directory module, a label creation module, a task configuration module, a task scheduling module, and a task monitoring module; wherein:
A label directory module: used for custom configuration of the label hierarchy directory tree, where the directory tree is divided into a first-level classification, a second-level classification, and a third-level classification. In this embodiment, the first-level classification is the label object.
A label creation module: used for editing and saving label metadata. Basic label information is filled in, including label name, label level, label description, and label synonyms, and multiple groups of labels can be added at one time. In this embodiment, a label comprises a label value and a feature value. The label value indicates whether the object carries the label and is recorded as True or False; the feature value is a quantitative representation of the label. By feature category, labels are divided into category labels and numerical labels: the feature value of a category label is a discrete variable, and the feature value of a numerical label is a continuous variable.
A task configuration module: used for configuring the label calculation task logic. For the SQL mode, the task description, feature category, feature name, mapping rule, update mode, update period, and incremental SQL need to be configured. For the model mode, the task description, model type, selected model, feature category, feature name, and update period need to be filled in; the label probability model additionally requires a probability threshold range, and the label integral (scoring) model requires a score threshold range. For the custom mode, the task description, feature category, and feature name need to be filled in.
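The per-mode configuration requirements listed for the task configuration module can be expressed as a validation table. The field identifiers below are hypothetical renderings of the items named in the text; the patent does not define a configuration schema.

```python
from typing import Dict, Set

# Hypothetical field identifiers derived from the items listed in the text.
REQUIRED_FIELDS: Dict[str, Set[str]] = {
    "sql": {"task_description", "feature_category", "feature_name",
            "mapping_rule", "update_mode", "update_period", "incremental_sql"},
    "model": {"task_description", "model_type", "selected_model",
              "feature_category", "feature_name", "update_period"},
    "custom": {"task_description", "feature_category", "feature_name"},
}

# Extra fields that depend on the model type chosen in the model mode.
MODEL_EXTRA_FIELDS: Dict[str, Set[str]] = {
    "probability": {"probability_threshold_range"},
    "integral": {"score_threshold_range"},
    "combination": set(),
}


def missing_fields(mode: str, config: Dict[str, object]) -> Set[str]:
    """Return the required fields that are absent from a task configuration."""
    if mode not in REQUIRED_FIELDS:
        raise ValueError(f"unknown calculation mode: {mode}")
    required = set(REQUIRED_FIELDS[mode])
    if mode == "model":
        model_type = str(config.get("model_type", ""))
        required |= MODEL_EXTRA_FIELDS.get(model_type, set())
    return {name for name in required if name not in config}
```

An empty result means the configuration is complete for its mode; a probability model configuration without a threshold range would report `probability_threshold_range` as missing.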
The working methods of the SQL mode, the model mode, and the custom mode have been described in detail in embodiment 1 and are not repeated here.
A task scheduling module: used for activating label calculation tasks and scheduling them on a timer. For label calculation tasks in the SQL mode and the model mode, a validity period is set for the task, and after activation the task is scheduled by a timer. For the custom mode, activation directly uploads the object file and labeling is completed in a single pass.
A task monitoring module: used for monitoring the execution state of scheduled label tasks; only tasks in the SQL mode and the model mode are monitored. Task exceptions are divided into batch run exceptions and increment exceptions. A batch run exception is detected from the execution state of a task batch: if the task fails, a warning is raised and the reason for the failure is returned. An increment exception is detected from the change in data volume before and after a label batch: if the label increment of the current batch is 0, a warning is raised.
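The two exception types handled by the task monitoring module can be sketched as a single check over a batch result. The `BatchResult` fields and the warning text are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchResult:
    task_name: str
    mode: str              # "sql", "model", or "custom"
    succeeded: bool
    count_before: int      # label data volume before the batch
    count_after: int       # label data volume after the batch
    failure_reason: str = ""


def check_batch(result: BatchResult) -> List[str]:
    """Return warnings for one batch; only SQL-mode and model-mode tasks are monitored."""
    if result.mode not in ("sql", "model"):
        return []
    warnings = []
    if not result.succeeded:
        # Batch run exception: report the failure together with its reason.
        warnings.append(
            f"{result.task_name}: batch run exception: {result.failure_reason}")
    elif result.count_after - result.count_before == 0:
        # Increment exception: the batch ran but added no labels.
        warnings.append(f"{result.task_name}: increment exception: label increment is 0")
    return warnings
```

In this sketch the increment check is applied only to batches that ran successfully, so a failed batch produces one warning rather than two; that ordering is a design choice of the example, not something the patent specifies.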
The label system construction system fusing multiple modes provided by this embodiment first determines a target label based on business requirements, analyzes the target label to determine its calculation mode and its label directory hierarchy, and then selects the label calculation mode from an SQL mode, a model mode, and a custom mode. In the SQL mode, the data resources required by the target label are identified by working backward from the label and are stored in a Hive big data resource pool, label results are stored in ElasticSearch, and an incremental labeling task workflow is built by writing SQL logic code to compute the label. The model mode labels objects based on a label probability model, a label integral (points-based scoring) model, or a label combination model. The custom mode labels objects by uploading their unique primary keys. Finally, the label calculation task is activated: for tasks in the SQL mode and the model mode, a validity period is set and the task is scheduled by a timer after activation; for the custom mode, activation directly uploads the object file and labeling is completed in a single pass. The effects are as follows: first, guided by business requirements, three label calculation modes (SQL, model, and custom) are provided, which cover a variety of labeling scenarios; second, model training is based on the LightGBM algorithm, giving fast training, low memory overhead, and good model accuracy and generalization ability.
It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Of course, the processor and the storage medium may reside as discrete components in a user terminal.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or the claims is intended to mean a "non-exclusive or".

Claims (10)

1. A label system construction method fusing multiple modes, characterized by comprising the following steps:
S100, determining a target label based on the service requirement, and analyzing and determining the calculation mode and the label directory hierarchy of the target label;
S200, writing the calculation task logic according to the calculation mode determined for each target label;
S300, activating the label calculation task according to the calculation mode determined for each target label.
2. The label system construction method fusing multiple modes according to claim 1, wherein in S100, the calculation mode of the target label includes three modes, namely an SQL mode, a model mode and a custom mode.
3. The label system construction method fusing multiple modes according to claim 1, wherein in S100, the label directory hierarchy adopts a four-layer architecture, namely label object, primary classification, secondary classification and label.
4. The label system construction method fusing multiple modes according to claim 2, wherein in S200, when the calculation mode of the target label is the SQL mode, the specific method is as follows: the required data resources are determined by reverse analysis of the target label and stored in a Hive big data resource pool, the label results are stored in ElasticSearch, and an incremental labeling task workflow is established by writing SQL logic code to realize label calculation.
5. The label system construction method fusing multiple modes according to claim 2, wherein in S200, when the calculation mode of the target label is the model mode, the model mode is divided into a label probability model, a label integral model and a label combination model, and the model generates a new label based on existing labels.
6. The label system construction method fusing multiple modes according to claim 5, wherein when the model mode is the label probability model, the specific method is as follows: positive samples are selected, and negative samples are randomly drawn in a certain proportion, together forming the modeling sample; the modeling sample is divided into a training set and a test set, the label feature values are taken as model input, model training is performed with the LightGBM algorithm, and the accuracy, precision and recall of the model are calculated on the training set and the test set respectively for model evaluation; after the model is trained, a predicted probability is obtained from the label feature values of a data object; finally, a threshold is set on the probability, and objects exceeding the threshold are labeled, the label being a numerical label.
7. The label system construction method fusing multiple modes according to claim 5, wherein when the model mode is the label integral model, the specific method is as follows: first, a label integral rule is configured; then a weighted summation of the integrals is calculated based on ElasticSearch; finally, a threshold is set on the integral, and objects exceeding the threshold are labeled, the label being a numerical label.
8. The label system construction method fusing multiple modes according to claim 5, wherein when the model mode is the label combination model, the specific method is as follows: first, the label intersection, union and complement calculation logic is configured and the object group is screened out; then a new label is applied, the label being a classification label.
9. The label system construction method fusing multiple modes according to claim 2, wherein when the calculation mode of the target label is the custom mode, the specific method is as follows: the unique primary key of an object is input, and a custom label is applied to that primary key, the label being a type label.
10. A label system construction system fusing multiple modes, characterized by comprising: a label directory module, a label creation module, a task configuration module, a task scheduling module and a task monitoring module; wherein:
the label directory module is used for configuring a user-defined label hierarchy directory tree, wherein the directory tree is divided into a first-level classification, a second-level classification and a third-level classification;
the label creation module is used for editing and storing label metadata and filling in basic label information, including label name, label level, label description and label synonyms, and can add multiple groups of labels at one time;
the task configuration module is used for configuring the label calculation task logic; for the SQL mode, the task description, feature category, feature name, mapping rule, update mode, update period and incremental SQL need to be configured; for the model mode, the task description, model type, selected model, feature type, feature name and update period need to be filled in, wherein the label probability model also requires a probability threshold range and the label integral model requires an integral threshold range; for the custom mode, the task description, feature category and feature name need to be filled in;
the task scheduling module is used for activating the label calculation task and realizing timed scheduling; for label calculation tasks in the SQL mode and the model mode, the validity period of the label task is set, and after activation the tasks are scheduled based on a timer; for the custom mode, the task is activated by directly uploading an object file, and labeling is completed in one pass;
the task monitoring module is used for monitoring the execution state of label scheduling tasks and monitors only tasks in the SQL mode and the model mode; abnormal tasks are divided into batch-run exceptions and increment exceptions; the batch-run exception check detects the execution state of the task batch, and if task execution fails, an early warning is issued and the failure reason is returned; the increment exception check detects the change in data volume before and after the label batch, and if the label increment of the current batch is 0, an early warning is issued.
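As an illustration only, the label integral model and the label combination model recited in the claims can be sketched in a few lines of Python. The sketch makes several assumptions that are not in the claims: the existing labels are held in an in-memory dictionary rather than in ElasticSearch, and the label names, rule weights, threshold and function names are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical data: object primary key -> set of existing label names.
# (The claims keep label results in ElasticSearch; a dict stands in here.)
existing_labels: Dict[str, Set[str]] = {
    "u1": {"frequent_buyer", "night_active"},
    "u2": {"frequent_buyer"},
    "u3": {"night_active", "new_user"},
    "u4": set(),
}

# Hypothetical integral rule: points contributed by each existing label.
integral_rule: Dict[str, float] = {
    "frequent_buyer": 60.0,
    "night_active": 30.0,
    "new_user": 10.0,
}


def integral_scores(labels: Dict[str, Set[str]], rule: Dict[str, float]) -> Dict[str, float]:
    """Sum the rule's points over the labels each object carries."""
    return {obj: sum(rule.get(name, 0.0) for name in names) for obj, names in labels.items()}


def label_by_threshold(scores: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep objects whose score exceeds the threshold; the score is the numerical label."""
    return {obj: score for obj, score in scores.items() if score > threshold}


def objects_with(labels: Dict[str, Set[str]], name: str) -> Set[str]:
    """Return the group of objects that carry the given label."""
    return {obj for obj, names in labels.items() if name in names}


# Integral model: weighted summation followed by a threshold.
scores = integral_scores(existing_labels, integral_rule)
high_value = label_by_threshold(scores, threshold=50.0)

# Combination model: intersection, union and complement of object groups.
universe = set(existing_labels)
buyers = objects_with(existing_labels, "frequent_buyer")
night = objects_with(existing_labels, "night_active")
both = buyers & night            # intersection
either = buyers | night          # union
not_buyers = universe - buyers   # complement relative to all known objects
```

Each resulting group (`high_value`, `both`, `either`, `not_buyers`) would then receive its new label; the complement is taken relative to the set of known objects, which is one possible reading of "complement" here.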
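The two abnormal-task checks described for the task monitoring module can likewise be sketched as follows. This is a minimal sketch under stated assumptions: the `BatchRun` record, its field names and the warning strings are hypothetical, and the choice to skip the increment check when a batch has already failed is a design decision of the sketch, not something the claims specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BatchRun:
    """One execution of a label calculation task (all field names are hypothetical)."""
    task_id: str
    mode: str                      # "sql", "model" or "custom"
    succeeded: bool
    failure_reason: Optional[str]
    labeled_before: int            # objects carrying the label before this batch
    labeled_after: int             # objects carrying the label after this batch


def check_batch(run: BatchRun) -> List[str]:
    """Return early warnings for one batch; only SQL and model tasks are monitored."""
    if run.mode not in ("sql", "model"):
        return []
    warnings: List[str] = []
    if not run.succeeded:
        # Batch-run check: the task failed, so report the failure reason.
        warnings.append(f"{run.task_id}: batch run failed ({run.failure_reason})")
    elif run.labeled_after - run.labeled_before == 0:
        # Increment check: the batch completed but added no labels.
        warnings.append(f"{run.task_id}: label increment is 0")
    return warnings
```

A scheduler could call `check_batch` after every timed run and forward any returned strings to its alerting channel.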
CN202110394477.2A 2021-04-13 2021-04-13 Label system construction method and system fusing multiple modes Active CN113313344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110394477.2A CN113313344B (en) 2021-04-13 2021-04-13 Label system construction method and system fusing multiple modes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110394477.2A CN113313344B (en) 2021-04-13 2021-04-13 Label system construction method and system fusing multiple modes

Publications (2)

Publication Number Publication Date
CN113313344A (en) 2021-08-27
CN113313344B (en) 2023-03-31

Family

ID=77372342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110394477.2A Active CN113313344B (en) 2021-04-13 2021-04-13 Label system construction method and system fusing multiple modes

Country Status (1)

Country Link
CN (1) CN113313344B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115510324A (en) * 2022-09-29 2022-12-23 中电金信软件有限公司 Method and device for determining label system, electronic equipment and storage medium

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100256969A1 (en) * 2009-04-07 2010-10-07 Microsoft Corporation Generating implicit labels and training a tagging model using such labels
US20120005190A1 (en) * 2010-05-14 2012-01-05 Sap Ag Performing complex operations in a database using a semantic layer
CN106897402A (en) * 2017-02-13 2017-06-27 山大地纬软件股份有限公司 The method and user's portrait maker of user's portrait are built based on social security data
CN108596679A (en) * 2018-04-27 2018-09-28 中国联合网络通信集团有限公司 Construction method, device, terminal and the computer readable storage medium of user's portrait
CN109101652A (en) * 2018-08-27 2018-12-28 宜人恒业科技发展(北京)有限公司 A kind of creation of label and management system
CN109739909A (en) * 2019-01-07 2019-05-10 山东浪潮通软信息科技有限公司 A kind of methods of exhibiting and system fast implementing data visualization chart based on label
CN109872173A (en) * 2017-12-04 2019-06-11 北京京东尚科信息技术有限公司 Construct method, system and the terminal device of user's portrait label
CN109903097A (en) * 2019-03-05 2019-06-18 云南电网有限责任公司信息中心 A kind of user draws a portrait construction method and user draws a portrait construction device
CN110147499A (en) * 2019-05-21 2019-08-20 智者四海(北京)技术有限公司 Label method, recommended method and recording medium
CN110209709A (en) * 2019-06-06 2019-09-06 四川九洲电器集团有限责任公司 A method of concern human behavior analysis
CN110765101A (en) * 2019-09-09 2020-02-07 湖南天云软件技术有限公司 Label generation method and device, computer readable storage medium and server
CN111062750A (en) * 2019-12-13 2020-04-24 中国平安财产保险股份有限公司 User portrait label modeling and analyzing method, device, equipment and storage medium
CN111177129A (en) * 2019-12-16 2020-05-19 中国平安财产保险股份有限公司 Label system construction method, device, equipment and storage medium
CN112148810A (en) * 2020-11-10 2020-12-29 南京智数云信息科技有限公司 User portrait analysis system supporting custom label
CN112182391A (en) * 2020-09-30 2021-01-05 北京神州泰岳智能数据技术有限公司 User portrait drawing method and device
CN112559740A (en) * 2020-12-03 2021-03-26 星宏传媒有限公司 Advertisement label classification method, system and equipment based on multi-model fusion

Also Published As

Publication number Publication date
CN113313344B (en) 2023-03-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant