CN113961725A - Automatic label labeling method, system, equipment and storage medium - Google Patents

Automatic label labeling method, system, equipment and storage medium

Publication number
CN113961725A
Authority
CN
China
Prior art keywords
label
labeling
automatic
word
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111240212.3A
Other languages
Chinese (zh)
Inventor
刘畅奕航
徐世超
梁志婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN202111240212.3A
Publication of CN113961725A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/381 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using identifiers, e.g. barcodes, RFIDs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an automatic label labeling method and a system thereof. The method comprises: a word grouping step of establishing a plurality of independent word banks based on the business corpus and grouping the words of each word bank; a label and label rule definition step of, for each word bank, defining label groups according to business elements, selecting words from the word bank or custom-defining words as labels, and defining label rules based on the matching mode used for automatic labeling; and a label labeling step of matching the words in the text to be labeled against the words in the word groups based on the label rules, labeling the words that satisfy the matching condition, and storing the automatic labeling results in a label result table to complete the automatic labeling. The invention maintains a word bank, label and rule system for each industry, labels automatically through an algorithm model, and supports evaluation-driven stepwise training to improve labeling accuracy.

Description

Automatic label labeling method, system, equipment and storage medium
Technical Field
The present application relates to the field of data analysis, and in particular, to a method, a system, a computer device, and a computer-readable storage medium for automatically labeling a tag.
Background
Currently, in an online service scenario, service quality is difficult to monitor and measure because the scenario is uncertain. The dialogue that takes place during service can be represented as text, and service quality can be represented as corresponding labels. By labeling the text automatically and accurately, the label patterns of customers under different scenarios and dialogue topics can be extracted effectively, which provides data support for sales business analysis models and a digital, visual basis for customer supervision.
Text label evaluation thereby moves from manual offline spreadsheets to automatic calculation of the evaluation result, the label coverage, and the label accuracy.
At present, no effective solution has been proposed for the following bottlenecks in the related art:
(1) Labor cost is high, and work cannot be divided and coordinated effectively.
(2) Evaluation accuracy cannot be guaranteed: because the labels are diverse, manual comparison is difficult to perform accurately and is prone to error.
(3) Coverage and accuracy are difficult to calculate by hand.
(4) The "cold start" stage, in which a new service scenario is not yet well understood, cannot be handled.
To address the problems and bottlenecks in the prior art, the invention maintains a word bank, label and rule system for each industry; labels automatically through an algorithm model and supports multi-person collaborative evaluation with the task as the unit, which improves efficiency; calculates and recalculates the coverage and accuracy metrics so that the labels are optimized through stepwise training; and, for the cold-start stage of a brand-new service scenario, provides discovery functions that supplement the word bank and the label rules.
Disclosure of Invention
The embodiments of the application provide a method that, based on a word bank, label and rule system, labels automatically through an algorithm model and supports multi-person collaborative evaluation with the task as the unit.
In a first aspect, an embodiment of the present application provides an automatic label labeling method, including:
a word grouping step: establishing a plurality of independent word banks based on the business corpus, and grouping the words of each word bank;
a label and label rule definition step: for each word bank, defining label groups according to business elements, then selecting words from the word bank or custom-defining words as labels, and defining label rules based on the matching mode used for automatic labeling;
a label labeling step: matching the words in the text to be labeled against the words in the word groups based on the label rules, labeling the words in the text that satisfy the matching condition, and storing the automatic labeling results in a label result table to complete the automatic labeling.
In some embodiments of the present invention, the automatic label labeling method further includes:
a label labeling result evaluation step: after the automatic labeling results are evaluated, obtaining label evaluation results, and performing the result calculation for the evaluation task based on the label evaluation results;
a label rule optimization step: optimizing the label rules based on the output of the result calculation;
a recalculation step: initiating recalculation based on the optimized label rules, so that the system's automatic labeling is performed again.
In some embodiments of the present invention, the automatic label labeling method further includes:
a label system cold start step: segmenting the new service corpus into words, calculating word frequency and importance, extracting keywords, and supplementing the keywords into the word bank and the labels; and clustering the new service corpus, extracting new label rules based on the clustering result, and supplementing them into the label rules.
In some embodiments of the present invention, the step of evaluating the label labeling result includes:
an evaluation result calculation step: performing the evaluation calculation for the evaluation task based on the evaluated labeling results, to obtain the coverage and the accuracy of the automatic labeling.
In a second aspect, an embodiment of the present application provides an automatic label labeling system, which employs the above automatic label labeling method, and includes:
a word grouping module: establishing a plurality of independent word banks based on the business corpus, and grouping the words of each word bank;
a label and label rule definition module: for each word bank, defining label groups according to business elements, then selecting words from the word bank or custom-defining words as labels, and defining label rules based on the matching mode used for automatic labeling;
a label labeling module: matching the words in the text to be labeled against the words in the word groups based on the label rules, labeling the words in the text that satisfy the matching condition, and storing the automatic labeling results in a label result table to complete the automatic labeling.
In some embodiments of the present invention, the automatic label labeling system further includes:
a label labeling result evaluation module: after the automatic labeling results are evaluated, obtaining label evaluation results, and performing the result calculation for the evaluation task based on the label evaluation results;
a label rule optimization module: optimizing the label rules based on the output of the result calculation;
a recalculation module: initiating recalculation based on the optimized label rules, so that the system's automatic labeling is performed again.
In some embodiments of the present invention, the automatic label labeling system further includes:
a label system cold start module: segmenting the new service corpus into words, calculating word frequency and importance, extracting keywords, and supplementing the keywords into the word bank and the labels; and clustering the new service corpus, extracting new label rules based on the clustering result, and supplementing them into the label rules.
In some embodiments of the present invention, in the automatic label labeling system, the label labeling result evaluating module includes:
an evaluation result calculation module: performing the evaluation calculation for the evaluation task based on the evaluated labeling results, to obtain the coverage and the accuracy of the automatic labeling.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the automatic label labeling method according to the first aspect is implemented by the processor.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the automatic label labeling method according to the first aspect.
Compared with the prior art, the invention provides a methodology for training and optimizing label rules: a word bank, label and rule system is maintained for each industry, labeling is performed automatically through an algorithm model, and manual evaluation is provided to improve labeling accuracy.
For the cold-start stage of a brand-new service scenario, discovery functions are provided to supplement the word bank and the label rules.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of an automatic labeling method of labels according to the present invention;
FIG. 2 is a schematic diagram of an automatic labeling system for labels according to the present invention;
FIG. 3 is a diagram illustrating a word library structure according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a tag structure according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a general architecture of an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a tag evaluation flow according to an embodiment of the present invention;
FIG. 7 is a hardware structure diagram of a computer device according to an embodiment of the present application.
In the above figures:
100. automatic label labeling system;
10. word grouping module; 20. label and label rule definition module;
30. label labeling module;
81. processor; 82. memory; 83. communication interface; 80. bus.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. References to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "And/or" describes an association relationship between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. The terms "first," "second," "third," and the like, as used herein, merely distinguish similar objects and do not denote a particular ordering of the objects.
The application provides a methodology for training and optimizing label rules: a word bank, label and rule system is maintained for each industry, labeling is performed automatically through an algorithm model, and manual evaluation is provided to improve labeling accuracy; for the cold-start stage of a brand-new service scenario, discovery functions are provided to supplement the word bank and the label rules.
Fig. 1 is a schematic flow chart of an automatic label labeling method of the present invention, and as shown in fig. 1, the present embodiment provides an automatic label labeling method, including:
word grouping step S10: establishing a plurality of independent word banks based on the business corpus, and grouping the words of each word bank;
label and label rule definition step S20: for each word bank, defining label groups according to business elements, then selecting words from the word bank or custom-defining words as labels, and defining label rules based on the matching mode used for automatic labeling;
label labeling step S30: matching the words in the text to be labeled against the words in the word groups based on the label rules, labeling the words in the text that satisfy the matching condition, and storing the automatic labeling results in a label result table to complete the automatic labeling.
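Steps S10 to S30 can be illustrated with a minimal Python sketch. It is only one possible reading of the steps, not the patented implementation: the bank names, word groups, example words, the `LabelRule` fields and the whole-word regular-expression match are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# S10: independent word banks, each mapping a word group to its words.
# All bank names, groups, and words here are illustrative assumptions.
WORD_BANKS = {
    "general": {"greeting": ["hello", "goodbye"]},
    "industry": {"vehicle_part": ["tire", "engine"]},
}

@dataclass
class LabelRule:
    label: str        # S20: a word chosen from a bank (or custom) used as the label
    label_group: str  # S20: label group defined from a business element
    word_group: str   # word group whose words trigger this label

def words_in_group(group: str) -> list[str]:
    """Collect the words of one word group across all banks."""
    return [w for bank in WORD_BANKS.values() for w in bank.get(group, [])]

def label_text(text: str, rules: list[LabelRule]) -> list[dict]:
    """S30: match the text against each rule's word group and record hits."""
    result_table = []
    for rule in rules:
        for word in words_in_group(rule.word_group):
            if re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE):
                result_table.append(
                    {"label_group": rule.label_group, "label": rule.label, "matched": word}
                )
    return result_table

rules = [
    LabelRule(label="tire", label_group="topic", word_group="vehicle_part"),
    LabelRule(label="greeting", label_group="keyword", word_group="greeting"),
]
rows = label_text("Hello, I have a question about my tire.", rules)
```

Here `rows` plays the role of the label result table: one row per matched word, carrying the label group and label assigned by the rule.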
In some embodiments of the present invention, the automatic label labeling method further includes:
a label labeling result evaluation step: after the automatic labeling results are evaluated, obtaining label evaluation results, and performing the result calculation for the evaluation task based on the label evaluation results;
a label rule optimization step: optimizing the label rules based on the output of the result calculation;
a recalculation step: initiating recalculation based on the optimized label rules, so that the system's automatic labeling is performed again.
In some embodiments of the present invention, the automatic label labeling method further includes:
a label system cold start step: segmenting the new service corpus into words, calculating word frequency and importance, extracting keywords, and supplementing the keywords into the word bank and the labels; and clustering the new service corpus, extracting new label rules based on the clustering result, and supplementing them into the label rules.
In some embodiments of the present invention, the step of evaluating the label labeling result includes:
an evaluation result calculation step: performing the evaluation calculation for the evaluation task based on the evaluated labeling results, to obtain the coverage and the accuracy of the automatic labeling.
In a second aspect, an embodiment of the present application provides an automatic label labeling system 100, which employs the above automatic label labeling method. Fig. 2 is a schematic diagram of the automatic label labeling system according to the present invention; as shown in Fig. 2, the system includes:
the word grouping module 10: establishing a plurality of independent word banks based on the business corpus, and grouping the words of each word bank;
the label and label rule definition module 20: for each word bank, defining label groups according to business elements, then selecting words from the word bank or custom-defining words as labels, and defining label rules based on the matching mode used for automatic labeling;
the label labeling module 30: matching the words in the text to be labeled against the words in the word groups based on the label rules, labeling the words in the text that satisfy the matching condition, and storing the automatic labeling results in a label result table to complete the automatic labeling.
In some embodiments of the present invention, the automatic label labeling system further includes:
a label labeling result evaluation module: after the automatic labeling results are evaluated, obtaining label evaluation results, and performing the result calculation for the evaluation task based on the label evaluation results;
a label rule optimization module: optimizing the label rules based on the output of the result calculation;
a recalculation module: initiating recalculation based on the optimized label rules, so that the system's automatic labeling is performed again.
In some embodiments of the present invention, the automatic label labeling system further includes:
a label system cold start module: segmenting the new service corpus into words, calculating word frequency and importance, extracting keywords, and supplementing the keywords into the word bank and the labels; and clustering the new service corpus, extracting new label rules based on the clustering result, and supplementing them into the label rules.
In some embodiments of the present invention, in the automatic label labeling system, the label labeling result evaluating module includes:
an evaluation result calculation module: performing the evaluation calculation for the evaluation task based on the evaluated labeling results, to obtain the coverage and the accuracy of the automatic labeling.
The following detailed description of specific embodiments of the invention refers to the accompanying drawings in which:
The automatic label labeling method provided by the invention constructs rules for structuring text and evaluates the structured text produced by those rules, so that the rules can be further optimized. Labeling is performed automatically by an algorithm model, which adds the corresponding labels to the text according to the configured rules. The specific rules are defined in the rule module: based on the role associated with the text (from the voice or corpus source) and a preset regular expression, certain keywords in the text are matched and identified, and the corresponding labels are applied.
The system of the invention constructs a complete word bank, label and rule system, designs a complete label evaluation flow, and continuously trains the label rules until they reach a level usable for business analysis.
The functional modules of the system include a word bank module, a label module, a rule module, an evaluation module, a new word module, a new rule module, and the like.
(1) A word bank module:
The word bank is divided into a general word bank, an industry word bank, an enterprise word bank and the like, and the word bank levels are refined layer by layer. The general word bank contains keywords that are independent of industry, such as: hello, goodbye, what do you need, etc. The industry word bank holds keywords of a particular industry, for example, in the automotive industry: vehicle model, tire, etc. The enterprise word bank is customized for the client enterprise and stores the keywords of the client's business scenario. The three word banks are kept separate so that, while serving a specific enterprise, the system can also supplement the industry and general word banks, which makes the system more robust.
The three word banks are stored separately from one another, so that the general and industry word banks are not polluted. The three word banks have similar structures. Fig. 3 is a schematic diagram of the word bank structure according to the embodiment of the present invention; as shown in Fig. 3, the word bank structure includes word groups and words.
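The layered, separately stored word banks could be modeled as follows. This is a minimal sketch under assumptions: each bank is a plain dictionary from word group to words, the example groups and words are invented, and `collections.ChainMap` is used only as one convenient way to look up across banks while confining writes to the enterprise bank.

```python
from collections import ChainMap

# Each bank maps a word-group name to a set of words (word groups and words, as in Fig. 3).
# The example groups and words are assumptions, not taken from the patent.
general_bank = {"greeting": {"hello", "goodbye"}}
industry_bank = {"vehicle": {"vehicle model", "tire"}}
enterprise_bank = {"promotion": {"trade-in bonus"}}

# Lookups consult the enterprise bank first, then industry, then general.
# ChainMap writes only to the first mapping, so the shared banks stay untouched.
lexicon = ChainMap(enterprise_bank, industry_bank, general_bank)

lexicon["campaign"] = {"spring sale"}  # lands in enterprise_bank only

all_groups = sorted(lexicon)
```

Because enterprise-specific additions never reach `industry_bank` or `general_bank`, the shared banks remain reusable for other clients.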
(2) A label module:
The system defines label groups according to enterprise business elements, and then selects suitable words from the word bank, or custom-defines a word, as a label. Fig. 4 is a schematic structural diagram of a label according to an embodiment of the present invention; as shown in Fig. 4, the label structure consists of label groups (topic, theme, keyword) and labels.
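A label group holding labels that are either picked from the word bank or custom-defined might look like the sketch below. The class name `LabelGroup`, the helper `add_label` and its `allow_custom` flag are hypothetical and serve only to show the two ways a label can be created.

```python
from dataclasses import dataclass, field

@dataclass
class LabelGroup:
    name: str                                    # e.g. "topic", "theme", or "keyword"
    labels: set[str] = field(default_factory=set)

def add_label(group: LabelGroup, word: str, word_bank: set[str], allow_custom: bool = True) -> bool:
    """Add a label taken from the word bank, or a custom word if custom labels are allowed."""
    if word in word_bank or allow_custom:
        group.labels.add(word)
        return True
    return False

bank = {"tire", "vehicle model"}
topic = LabelGroup("topic")
add_label(topic, "tire", bank)          # picked from the word bank
add_label(topic, "test drive", bank)    # custom-defined label
rejected = add_label(topic, "warranty", bank, allow_custom=False)
```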
(3) A rule module:
A rule represents a labeling matching mode: text is matched according to a role and a regular expression, and the label configured in the rule is applied to the matched text. Fig. 5 is a schematic diagram of the rule structure of a specific embodiment of the present invention; as shown in Fig. 5, the rule structure consists of: regular-expression match, role, topic, keyword, and weight.
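The rule structure (regular-expression match, role, topic, keyword, weight) can be sketched as a small data class. The field names, the role names "customer" and "agent", and the example pattern are assumptions; the source does not specify how weights are assigned.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    pattern: str   # regular expression to match
    role: str      # speaker role the rule applies to (role names are assumed)
    topic: str     # topic label applied on a match
    keyword: str   # keyword label applied on a match
    weight: int    # priority; how weights are assigned is not specified in the source

def apply_rule(rule: Rule, role: str, text: str):
    """Return the (topic, keyword) labels if both role and regex match, else None."""
    if role == rule.role and re.search(rule.pattern, text):
        return (rule.topic, rule.keyword)
    return None

price_rule = Rule(pattern=r"how much|price", role="customer",
                  topic="pricing", keyword="price inquiry", weight=10)

hit = apply_rule(price_rule, "customer", "how much is the new model?")
miss = apply_rule(price_rule, "agent", "the price is listed online")
```

The second call returns `None` even though the text contains "price", because the rule only applies to text spoken in the customer role.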
(4) An evaluation module:
This module defines a label evaluation flow. The whole flow takes the task as its unit, and each stage of a task is controlled by a state. The system's labeling results are manually modified or confirmed and then compared with the original system labeling results to obtain the coverage and accuracy of label matching; problematic label rules are modified, labeling is then recalculated, and the coverage and accuracy are optimized continuously, which is how the label rules are trained.
Fig. 6 is a schematic diagram of the label evaluation flow according to an embodiment of the present invention; Fig. 6 shows the entity relationships designed for the whole flow (only core attributes are shown).
The basic corpora in the system are shared among tasks; that is, the processing of a corpus in task A is synchronized with the processing of the same corpus in task B. The advantage of this design is that only one copy of each corpus is kept, which saves storage and also prevents an evaluator from evaluating the same corpus repeatedly.
The accuracy and the coverage of a task are obtained by comparing the system result with the manual result. The system result is the result of matching the label rules currently configured in the system, and the manual result is the final result after modification during evaluation. The two results have the same structure and the same number of entries per corpus, so whether the system labeled a corpus correctly can be determined by comparing the label values within the same label group.
The coverage is calculated as: (number of corpora labeled by the system) / (number of corpora that carry labels after manual evaluation).
The accuracy is calculated as: (number of corpora labeled by the system and judged correct in manual evaluation) / (number of corpora labeled by the system).
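The two ratios can be computed as in the following sketch. It assumes that each result maps a corpus id to a dictionary of label group to label, that an empty dictionary means "unlabeled", and that a system-labeled corpus counts as correct only when its labels equal the manual result; the source does not spell out these details.

```python
def coverage_and_accuracy(system: dict, manual: dict):
    """system / manual map corpus id -> {label_group: label}; an empty dict means unlabeled."""
    system_labeled = {cid for cid, labels in system.items() if labels}
    manual_labeled = {cid for cid, labels in manual.items() if labels}
    # Assumption: a system-labeled corpus is correct when its labels equal the manual result.
    correct = {cid for cid in system_labeled if system[cid] == manual.get(cid)}
    coverage = len(system_labeled) / len(manual_labeled) if manual_labeled else 0.0
    accuracy = len(correct) / len(system_labeled) if system_labeled else 0.0
    return coverage, accuracy

system = {1: {"topic": "pricing"}, 2: {"topic": "delivery"}, 3: {}, 4: {"topic": "pricing"}}
manual = {1: {"topic": "pricing"}, 2: {"topic": "warranty"},
          3: {"topic": "pricing"}, 4: {"topic": "pricing"}}
coverage, accuracy = coverage_and_accuracy(system, manual)
```

In this example the system labeled 3 of the 4 manually labeled corpora (coverage 0.75), and 2 of those 3 agree with the manual result (accuracy 2/3).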
The key points of the whole process are as follows:
Pulling service data: the business corpus is pulled from the data source and deduplicated on storage to ensure that each corpus is unique; the relationship between the evaluation task and the corpora under that task is then maintained.
Matching label rules: the word segmentation script is called to segment the text; the rules are then sorted in descending order of weight and matched against the text one by one using regular expressions, and the matching results are stored in the system labeling result table.
Manual evaluation: the label results of each corpus are evaluated manually in the interface, which supports modifying a label, deleting a label, adding a label, passing the result directly, and similar operations. Each operation stores the label result of the corpus in the manual evaluation label result table.
Calculation/recalculation: after the evaluation is finished, the task is calculated to obtain the coverage and accuracy. After the label rules are optimized, recalculation can be initiated so that system labeling runs again; because the last manual evaluation result is still in place, the updated coverage and accuracy can be obtained by running the calculation directly.
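The four key points above can be strung together in a small end-to-end sketch. Everything concrete in it is an assumption for illustration: the in-memory tables, the function names `pull`, `match` and `calculate`, the regex-based stand-in for the word segmentation script, and the choice to keep only the highest-weight hit per label group.

```python
import re

corpus_store = {}    # text -> corpus id; one stored copy per distinct text
task_corpora = {}    # task id -> list of corpus ids
system_table = {}    # corpus id -> {label_group: label}
manual_table = {}    # corpus id -> {label_group: label}

def pull(task_id, texts):
    """Pull corpora into the store, deduplicating, and link them to the task."""
    for text in texts:
        cid = corpus_store.setdefault(text, len(corpus_store) + 1)
        ids = task_corpora.setdefault(task_id, [])
        if cid not in ids:
            ids.append(cid)

def match(task_id, rules):
    """Apply (weight, regex, group, label) rules, highest weight first."""
    id_to_text = {cid: text for text, cid in corpus_store.items()}
    ordered = sorted(rules, key=lambda r: r[0], reverse=True)
    for cid in task_corpora[task_id]:
        # Stand-in for the word segmentation script: lowercase word tokens.
        words = " ".join(re.findall(r"\w+", id_to_text[cid].lower()))
        labels = {}
        for _, pattern, group, label in ordered:
            if re.search(pattern, words):
                labels.setdefault(group, label)  # keep the highest-weight hit per group
        system_table[cid] = labels

def calculate(task_id):
    """Return (coverage, accuracy) for the task from the two result tables."""
    ids = task_corpora[task_id]
    sys_labeled = [c for c in ids if system_table.get(c)]
    man_labeled = [c for c in ids if manual_table.get(c)]
    correct = [c for c in sys_labeled if system_table[c] == manual_table.get(c)]
    coverage = len(sys_labeled) / len(man_labeled) if man_labeled else 0.0
    accuracy = len(correct) / len(sys_labeled) if sys_labeled else 0.0
    return coverage, accuracy

pull("task_1", ["How much is it?", "When will it ship?", "How much is it?"])
match("task_1", [(10, r"how much", "topic", "pricing")])
# Manual evaluation: the reviewer passes corpus 1 and adds a label to corpus 2.
manual_table[1] = dict(system_table[1])
manual_table[2] = {"topic": "delivery"}
first = calculate("task_1")

# Rule optimization, then recalculation; manual_table is reused unchanged.
match("task_1", [(10, r"how much", "topic", "pricing"), (5, r"ship", "topic", "delivery")])
second = calculate("task_1")
```

The duplicate text is stored once, so the task holds two corpora. After the delivery rule is added, the recalculation raises coverage from 0.5 to 1.0 without any new manual evaluation.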
New words module/new rules module:
In the "cold start" stage of a brand-new service scenario, the system has no word bank, labels or rules for the industry. A batch of keywords and rules needs to be extracted from the current corpus as the basis for initial training.
Discovering new words: a script segments the corpus into words and calculates word frequency and importance, and keywords are then extracted. Whether each keyword should be supplemented into the enterprise word bank and the labels is confirmed manually.
Discovering new rules: a script clusters the corpora, grouping sentences according to the similarity between them, so that label rules can be extracted for each cluster. Whether each rule should be supplemented into the label rules is confirmed manually.
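Both discovery functions can be approximated with the standard library alone, as in the sketch below. The source does not name the importance measure or the clustering algorithm, so the choices here are assumptions: importance is taken as rarity in a background corpus of general text (an IDF-style weight), similarity is Jaccard overlap of word sets, and clustering is a greedy single pass. The output is a list of candidates for manual confirmation, not finished rules.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Stand-in for the word segmentation script: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def discover_words(new_corpus, background, top_k=2):
    """Rank words by frequency in the new corpus times rarity in a background corpus."""
    term_freq = Counter(w for text in new_corpus for w in tokenize(text))
    bg_doc_freq = Counter(w for text in background for w in set(tokenize(text)))
    scores = {
        w: tf * math.log((1 + len(background)) / (1 + bg_doc_freq[w]))
        for w, tf in term_freq.items()
    }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def discover_rule_candidates(sentences, threshold=0.4):
    """Greedy clustering by token overlap; return the words shared within each cluster."""
    clusters = []
    for s in sentences:
        toks = set(tokenize(s))
        for c in clusters:
            if jaccard(toks, c["seed"]) >= threshold:
                c["shared"] &= toks
                break
        else:
            clusters.append({"seed": toks, "shared": set(toks)})
    return [sorted(c["shared"]) for c in clusters]

new_corpus = ["battery range is short", "battery range anxiety",
              "please check the battery", "is the warranty long"]
background = ["is the store open", "please check the time", "the day is long"]
keywords = discover_words(new_corpus, background)

candidates = discover_rule_candidates([
    "how much is the car", "how much is the tire",
    "when will the car arrive", "when will the tire arrive",
])
```

A reviewer would then decide which of `keywords` belong in the enterprise word bank and turn the shared words of each cluster in `candidates` into a regular-expression rule.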
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the automatic label labeling method according to the first aspect is implemented.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the automatic label labeling method according to the first aspect.
In addition, the automatic label labeling method of the embodiment of the present application described in conjunction with fig. 1 may be implemented by a computer device. Fig. 7 is a hardware structure diagram of a computer device according to an embodiment of the present application.
The computer device may comprise a processor 81 and a memory 82 in which computer program instructions are stored.
Specifically, the processor 81 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
Memory 82 may include mass storage for data or instructions. By way of example, and not limitation, memory 82 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 82 may include removable or non-removable (or fixed) media, where appropriate. The memory 82 may be internal or external to the data processing apparatus, where appropriate. In particular embodiments, the memory 82 is non-volatile memory. In particular embodiments, memory 82 includes Read-Only Memory (ROM) and Random Access Memory (RAM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these. The RAM may be Static Random-Access Memory (SRAM) or Dynamic Random-Access Memory (DRAM), where the DRAM may be Fast Page Mode DRAM (FPM DRAM), Extended Data Out DRAM (EDO DRAM), Synchronous DRAM (SDRAM), and the like.
The memory 82 may be used to store or cache various data files for processing and/or communication use, as well as possible computer program instructions executed by the processor 81.
The processor 81 reads and executes the computer program instructions stored in the memory 82 to implement any one of the automatic labeling methods in the above embodiments.
In some of these embodiments, the computer device may also include a communication interface 83 and a bus 80. As shown in fig. 7, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.
The communication interface 83 is used for implementing communication between modules, devices, units and/or equipment in the embodiments of the present application. The communication interface 83 may also carry out data communication with other components, such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
Bus 80 includes hardware, software, or both, coupling the components of the computer device to each other. Bus 80 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 80 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
Based on the label system, the computer device can implement the automatic label labeling method described in conjunction with fig. 1.
In addition, in combination with the automatic label labeling method in the foregoing embodiments, embodiments of the present application may provide a computer-readable storage medium for its implementation. The computer-readable storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement any one of the automatic label labeling methods in the above embodiments.
Compared with the prior art, the label rule training and optimization method provided by the invention maintains corresponding word bank, label, and rule systems for each industry, performs automatic labeling through an algorithm model, and provides manual evaluation, thereby improving labeling accuracy. For the cold start stage of a brand-new business scenario, it supplements the word bank and the label rules by means of the discovery functions.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. An automatic label labeling method is characterized by comprising the following steps:
a word grouping step: establishing a plurality of independent word banks based on the business corpus, and grouping the words in each word bank;
a label and label rule defining step: based on each word bank, after defining label groups according to business elements, selecting words from the word bank or customizing words as labels, and defining label rules based on an automatic label labeling matching mode;
a label labeling step: matching the words in the text to be labeled with the words in the word groups based on the label rules, labeling the words in the text to be labeled that meet the matching condition, and storing the automatic label labeling result in a label result table to complete the automatic labeling of the labels.
2. The automatic label labeling method according to claim 1, further comprising:
a label labeling result evaluating step: after the automatic label labeling result is evaluated, obtaining a label evaluation result, and performing a result evaluation calculation for the evaluation task based on the label evaluation result;
a label rule optimizing step: optimizing the label rules based on the output of the result evaluation calculation;
a recalculating step: initiating a recalculation based on the optimized label rules, and performing the automatic system labeling again.
3. The automatic labeling method of claim 1, further comprising:
a label system cold start step: segmenting the new business corpus into words, calculating word frequency and importance, extracting keywords, and supplementing the keywords to the word bank and the labels; and clustering the new business corpus, extracting new label rules based on the clustering result, and supplementing the label rules with the new label rules.
4. The automatic label labeling method according to claim 2, wherein the label labeling result evaluating step comprises:
an evaluation result calculating step: performing an evaluation calculation for the evaluation task based on the evaluated label labeling result to obtain the coverage rate and the accuracy rate of the automatic label labeling.
5. An automatic label labeling system adopting the automatic label labeling method according to any one of claims 1 to 4, comprising:
a word grouping module: establishing a plurality of independent word banks based on the business corpus, and grouping the words in each word bank;
a label and label rule definition module: based on each word bank, after defining label groups according to business elements, selecting words from the word bank or customizing words as labels, and defining label rules based on an automatic label labeling matching mode;
a label labeling module: matching the words in the text to be labeled with the words in the word groups based on the label rules, labeling the words in the text to be labeled that meet the matching condition, and storing the automatic label labeling result in a label result table to complete the automatic labeling of the labels.
6. The automatic label labeling system according to claim 5, further comprising:
a label labeling result evaluating module: after the automatic label labeling result is evaluated, obtaining a label evaluation result, and performing a result evaluation calculation for the evaluation task based on the label evaluation result;
a label rule optimization module: optimizing the label rules based on the output of the result evaluation calculation;
a recalculation module: initiating a recalculation based on the optimized label rules, and performing the automatic system labeling again.
7. The automatic labeling system of claim 5, further comprising:
a label system cold start module: segmenting the new business corpus into words, calculating word frequency and importance, extracting keywords, and supplementing the keywords to the word bank and the labels; and clustering the new business corpus, extracting new label rules based on the clustering result, and supplementing the label rules with the new label rules.
8. The automatic labeling system of claim 6, wherein the label labeling result evaluating module comprises:
an evaluation result calculation module: performing an evaluation calculation for the evaluation task based on the evaluated label labeling result to obtain the coverage rate and the accuracy rate of the automatic label labeling.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the automatic label labeling method of any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the automatic label labeling method according to any one of claims 1 to 4.
CN202111240212.3A 2021-10-25 2021-10-25 Automatic label labeling method, system, equipment and storage medium Pending CN113961725A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111240212.3A CN113961725A (en) 2021-10-25 2021-10-25 Automatic label labeling method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113961725A true CN113961725A (en) 2022-01-21

Family

ID=79466695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111240212.3A Pending CN113961725A (en) 2021-10-25 2021-10-25 Automatic label labeling method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113961725A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423264A (en) * 2017-07-10 2017-12-01 广东华联建设投资管理股份有限公司 A kind of engineering material borrowing-word extracting method
CN108363725A (en) * 2018-01-08 2018-08-03 浙江大学 A kind of method of the extraction of user comment viewpoint and the generation of viewpoint label
CN109858018A (en) * 2018-12-25 2019-06-07 中国科学院信息工程研究所 A kind of entity recognition method and system towards threat information
CN109857957A (en) * 2019-01-29 2019-06-07 掌阅科技股份有限公司 Establish method, electronic equipment and the computer storage medium of tag library
CN109918662A (en) * 2019-03-04 2019-06-21 腾讯科技(深圳)有限公司 A kind of label of e-sourcing determines method, apparatus and readable medium
US20190220486A1 (en) * 2017-12-08 2019-07-18 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for mining general tag, server, and medium
CN110147499A (en) * 2019-05-21 2019-08-20 智者四海(北京)技术有限公司 Label method, recommended method and recording medium
CN110222709A (en) * 2019-04-29 2019-09-10 上海暖哇科技有限公司 A kind of multi-tag intelligence marking method and system
CN110825876A (en) * 2019-11-07 2020-02-21 上海德拓信息技术股份有限公司 Movie comment viewpoint emotion tendency analysis method
CN112445897A (en) * 2021-01-28 2021-03-05 京华信息科技股份有限公司 Method, system, device and storage medium for large-scale classification and labeling of text data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116860979A (en) * 2023-09-04 2023-10-10 上海柯林布瑞信息技术有限公司 Medical text labeling method and device based on label knowledge base
CN116860979B (en) * 2023-09-04 2023-12-08 上海柯林布瑞信息技术有限公司 Medical text labeling method and device based on label knowledge base

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination