CN117668625A - Tag set generation method and device and electronic equipment - Google Patents

Info

Publication number
CN117668625A
CN117668625A CN202211008886.5A CN202211008886A CN117668625A CN 117668625 A CN117668625 A CN 117668625A CN 202211008886 A CN202211008886 A CN 202211008886A CN 117668625 A CN117668625 A CN 117668625A
Authority
CN
China
Prior art keywords
label
definition
service data
preset
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211008886.5A
Other languages
Chinese (zh)
Inventor
吕望
范聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202211008886.5A
Publication of CN117668625A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a tag set generation method, a tag set generation device, and electronic equipment, and relates to the field of computer technology. The method comprises the following steps: acquiring a plurality of pieces of first service data in a specific field, together with the tag definition statement corresponding to each tag in a preset tag set, where each tag definition statement represents the definition of its tag; for each piece of first service data, labeling it based on the tag definition statements corresponding to the tags in the preset tag set to obtain a labeling result for that piece of data; and determining the quality of each tag in the preset tag set from the labeling results of the plurality of pieces of first service data, then updating the preset tag set according to that quality to obtain a first target tag set for the specific field. A high-quality tag set for the specific field can thus be generated, laying a foundation for accurately describing or classifying objects.

Description

Tag set generation method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a tag set, and an electronic device.
Background
In many business fields, objects such as products, users, text, etc. need to be described or classified by tags. For example, in the field of mobile phones, mobile phones are usually described or classified by labels such as "good resolution", "high pixel", "large memory", etc.; in the pharmaceutical field, drugs are usually described or classified by labels such as "good efficacy", "poor efficacy", "high price", "strong dependence", and the like.
In each business field, the tags that exist in the field must be known first, so that the tags corresponding to an object can be selected from them and the object can be described or classified accurately. Knowing which tags exist in a business field requires mining the tag set of that field. The quality of the tag set strongly affects the accuracy with which objects are described or classified.
In the related art, business experts typically summarize and analyze a large amount of business data in a specific field to mine the tag set of that field. This approach is limited by the experts' experience and by available manpower, so the mined tag set may contain low-quality tags, and the overall quality of the tag set is low.
Disclosure of Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
The embodiments of the application provide a tag set generation method, a tag set generation device, and electronic equipment, which aim to solve the technical problem in the related art that tag sets obtained by tag mining are of low quality.
An embodiment of a first aspect of the present application provides a method for generating a tag set, including: acquiring a plurality of pieces of first service data in a specific field and the tag definition statement corresponding to each tag in a preset tag set, wherein each tag definition statement represents the definition of the corresponding tag; for each piece of first service data, labeling the first service data based on the tag definition statements corresponding to the tags in the preset tag set to obtain a labeling result of the first service data; and determining the quality of each tag in the preset tag set according to the labeling results of the plurality of pieces of first service data, and updating the preset tag set according to the quality of each tag to obtain a first target tag set in the specific field.
An embodiment of a second aspect of the present application provides a tag set generating device, including: a first acquisition module, configured to acquire a plurality of pieces of first service data in a specific field and the tag definition statement corresponding to each tag in a preset tag set, wherein each tag definition statement represents the definition of the corresponding tag; a first labeling module, configured to label, for each piece of first service data, the first service data based on the tag definition statements corresponding to the tags in the preset tag set to obtain a labeling result of the first service data; and a first updating module, configured to determine the quality of each tag in the preset tag set according to the labeling results of the plurality of pieces of first service data, and to update the preset tag set according to the quality of each tag to obtain a first target tag set in the specific field.
An embodiment of a third aspect of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of generating a set of labels as set forth in the embodiments of the first aspect of the present application.
An embodiment of a fourth aspect of the present application proposes a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute a method for generating a set of labels as proposed by an embodiment of the first aspect of the present application.
An embodiment of a fifth aspect of the present application proposes a computer program product comprising a computer program which, when executed by a processor, implements a method of generating a set of labels as proposed by an embodiment of the first aspect of the present application.
One embodiment of the above invention has the following advantages or benefits:
a plurality of pieces of first service data in a specific field are acquired; each piece of first service data is labeled based on the tag definition statement corresponding to each tag in a preset tag set; the quality of each tag in the preset tag set is determined from the labeling results of the plurality of pieces of first service data; and the preset tag set is updated according to the quality of each tag to obtain a first target tag set in the specific field. A high-quality tag set in the specific field can thus be generated, laying a foundation for accurately describing or classifying objects.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
fig. 1 is a flowchart of a method for generating a tag set according to an embodiment of the present application;
fig. 2 is a flowchart of a method for generating a tag set according to a second embodiment of the present application;
fig. 3 is a flowchart of a method for generating a tag set according to a third embodiment of the present application;
fig. 4 is a schematic structural diagram of a tag set generating device according to a fourth embodiment of the present application;
fig. 5 is a schematic structural diagram of a tag set generating device according to a fifth embodiment of the present application;
fig. 6 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, and use of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.
To address the technical problem in the related art that mined tag sets are of low quality, the embodiments of the application provide a tag set generation method, a device, electronic equipment, a storage medium, and a computer program product. The method comprises the following steps: acquiring a plurality of pieces of first service data in a specific field, together with the tag definition statement corresponding to each tag in a preset tag set, where each tag definition statement represents the definition of its tag; for each piece of first service data, labeling it based on the tag definition statements corresponding to the tags in the preset tag set to obtain a labeling result for that piece of data; and determining the quality of each tag in the preset tag set from the labeling results of the plurality of pieces of first service data, then updating the preset tag set according to that quality to obtain a first target tag set for the specific field. A high-quality tag set for the specific field can thus be generated, laying a foundation for accurately describing or classifying objects.
Methods, apparatuses, electronic devices, storage media, and computer program products for generating tag sets according to embodiments of the present application are described below with reference to the accompanying drawings.
The method for generating the tag set provided in the embodiment of the present application is first described below. The method for generating a tag set according to the embodiment of the present application is performed by a tag set generating device, and the tag set generating device is hereinafter referred to as a generating device. The generating device may be an electronic device or may be configured in the electronic device to generate a high-quality tag set in a specific field.
The electronic device may be a personal computer (Personal Computer, abbreviated as PC), a cloud device, a mobile device, a server, etc., and the mobile device may be any hardware device such as a mobile phone, a tablet computer, a personal digital assistant, a wearable device, a vehicle-mounted device, etc., which is not limited in this application.
Fig. 1 is a flowchart of a method for generating a tag set according to an embodiment of the present application. As shown in fig. 1, the method of generating the tag set may include the following steps 101-103.
Step 101, acquiring a plurality of pieces of first service data in a specific field and the tag definition statement corresponding to each tag in a preset tag set, wherein each tag definition statement represents the definition of the corresponding tag.
The specific field may be any business field such as a mobile phone category field, a computer category field, a medical field, a news field, an academic field, an electronic commerce field, etc., which is not limited in this application.
The service data is data related to the service, which may be content data, index data, or the like, and is not limited in this application. Wherein, the content data is text data, such as an article or a short message. Index data is data in an index form. In the embodiment of the present application, for convenience of distinction, service data in a specific area is referred to as first service data.
For example, when the first service data is content data in the news field, one piece of first service data may be one news article; when the first service data is content data in the medical field, one piece of first service data may include a user's convenience medical service information, online diagnosis and treatment information, and the like; when the first service data is index data in the e-commerce field, one piece of first service data may be a table containing the data for each index of a store, such as business hours, commodity data, and sales.
The preset tag set comprises a plurality of tags, each tag can be customized by a user in advance, or can be a plurality of tags in a specific field which are maintained in advance, and the application is not limited to the tags.
The tag definition statement represents the definition of the corresponding tag. It can be written in advance in a preset syntax and stored, and it can be interpreted by the electronic device. For example, for the tag "smart girl", the corresponding tag definition statement represents the definition of "smart girl", that is, the conditions that service data must meet to be labeled with the "smart girl" tag.
Step 102, for each first service data, labeling the first service data based on a label definition statement corresponding to each label in a preset label set to obtain a labeling result of the first service data.
In an example embodiment, for each piece of first service data, the tag definition statement corresponding to each tag in the preset tag set may be used to determine whether the first service data meets the definition of that tag. If it does, the first service data is labeled with the tag; if it does not, the first service data is not labeled with the tag. The labeling result of the first service data is obtained in this way.
Accordingly, each piece of first service data may be labeled with zero, one, or more tags.
For example, assuming that the preset tag set includes 100 tags, taking tag 1 and tag 2 therein as examples, for each first service data, it may be determined whether the first service data satisfies the definition of tag 1 based on the tag definition statement corresponding to tag 1, and whether the first service data satisfies the definition of tag 2 based on the tag definition statement corresponding to tag 2. Assuming that the first service data satisfies the definition of the tag 1 and does not satisfy the definition of the tag 2, the first service data may be marked with the tag 1.
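The labeling in step 102 can be sketched as follows. This is a minimal illustration, not the patent's implementation: each tag definition is stood in for by a hypothetical Python predicate, and the names `label_record`, `label_all`, and the sample tags and records are assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: each tag maps to a predicate that returns True when
# a piece of service data satisfies that tag's definition.
TagPredicate = Callable[[str], bool]


def label_record(record: str, tag_definitions: Dict[str, TagPredicate]) -> List[str]:
    """Return every tag whose definition the record satisfies (possibly none)."""
    return [tag for tag, is_satisfied in tag_definitions.items() if is_satisfied(record)]


def label_all(records: List[str], tag_definitions: Dict[str, TagPredicate]) -> List[List[str]]:
    """Label each record independently; the result is aligned with `records`."""
    return [label_record(record, tag_definitions) for record in records]


# Illustrative definitions and data (made up for this sketch).
tag_definitions: Dict[str, TagPredicate] = {
    "tag 1": lambda text: "pendant" in text,
    "tag 2": lambda text: "battery" in text,
}
records = ["the pendant is too large", "fast delivery", "pendant and battery both fine"]
labeling_results = label_all(records, tag_definitions)
```

Each record ends up with zero, one, or several tags, which matches the three cases described above.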
Step 103, determining the quality of each label in the preset label set according to labeling results of the plurality of first service data, and updating the preset label set according to the quality of each label in the preset label set to obtain a first target label set in a specific field.
The quality of the label can indicate the value of the label, the value of the label with high quality is high, and the value of the label with low quality is low.
In an example embodiment, for each tag in the preset tag set, statistics such as the tag's coverage across all the first service data, its independence, and its degree of association with other tags may be computed from the labeling results of the plurality of pieces of first service data. The quality of the tag is then determined from these statistics. Lower-quality tags in the preset tag set may then be deleted, or their definitions modified, to update the preset tag set and obtain a high-quality first target tag set in the specific field.
For example, assuming that the total number of first service data is 1000, 10 of which are labeled as tag 1, it may be determined that the coverage of tag 1 is 10/1000. Because the coverage rate of the tag 1 is low, it can be determined that the quality of the tag 1 is low, and then the tag 1 in the preset tag set can be deleted to update the preset tag set.
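The coverage calculation in this example can be sketched as below. It is a rough illustration under stated assumptions: the function names, the second tag, and the `min_coverage` threshold of 0.05 are hypothetical, and only coverage is considered, whereas the text also mentions independence and association with other tags.

```python
from collections import Counter
from typing import Dict, List, Set


def tag_coverage(labeling_results: List[List[str]], tag_set: Set[str]) -> Dict[str, float]:
    """Fraction of records that were labeled with each tag."""
    total = len(labeling_results)
    # set(tags) guards against a tag being counted twice for one record.
    counts = Counter(tag for tags in labeling_results for tag in set(tags))
    return {tag: (counts[tag] / total if total else 0.0) for tag in tag_set}


def prune_low_coverage(tag_set: Set[str], coverage: Dict[str, float],
                       min_coverage: float) -> Set[str]:
    """Keep only tags whose coverage reaches the (assumed) threshold."""
    return {tag for tag in tag_set if coverage[tag] >= min_coverage}


# 1000 records, 10 of them labeled "tag 1" as in the example above;
# the 300 "tag 2" records are made-up filler for contrast.
labeling_results = [["tag 1"]] * 10 + [["tag 2"]] * 300 + [[]] * 690
preset_tag_set = {"tag 1", "tag 2"}
coverage = tag_coverage(labeling_results, preset_tag_set)
updated_tag_set = prune_low_coverage(preset_tag_set, coverage, min_coverage=0.05)
```

With these inputs "tag 1" has a coverage of 10/1000 and is dropped, while "tag 2" is kept.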
It should be noted that the preset tag set may be updated automatically by the generating device; for example, the generating device may automatically delete lower-quality tags according to the quality of each tag in the preset tag set. Alternatively, the update may be performed based on a user instruction; for example, the generating device may display the quality of each tag in the preset tag set, the user may choose which tags to delete or which definitions to modify and then trigger an update instruction, and the generating device may update the preset tag set based on that instruction to obtain the first target tag set in the specific field. The present application is not limited in this regard.
In an example embodiment, after the first target tag set in the specific field is acquired, a single-level or multi-level tag system for the field may be constructed based on the names and definitions of the tags in the first target tag set. In a single-level tag system, no parent-child relationships exist among the tags; in a multi-level tag system, parent-child relationships exist among the tags. A tag system classifies the tags an enterprise needs and defines the attributes of each tag, which makes the tags easier to manage, maintain, apply, and evaluate.
Taking a two-level tag system as an example, when the two-level tag system of the specific field is constructed from the first target tag set, the name of the parent tag shared by a plurality of tags in the first target tag set can be obtained, and the definitions of those tags can be fused to obtain the definition of the parent tag. The two-level tag system is then constructed from the names and definitions of all tags in the first target tag set and the names and definitions of all parent tags.
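One way to picture the two-level construction is the sketch below. The text does not say how child definitions are fused, so joining them with "or" is an assumption of this example, as are the `TagNode` class, the `parent_of` mapping, and the sample tags.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TagNode:
    name: str
    definition: str
    children: List["TagNode"] = field(default_factory=list)


def build_two_level_system(definitions: Dict[str, str],
                           parent_of: Dict[str, str]) -> List[TagNode]:
    """Group child tags under their parent and fuse the child definitions."""
    groups: Dict[str, List[TagNode]] = {}
    for tag, definition in definitions.items():
        groups.setdefault(parent_of[tag], []).append(TagNode(tag, definition))

    system = []
    for parent_name, children in groups.items():
        # Assumed fusion rule: the parent is satisfied when any child is.
        fused = " or ".join(f"({child.definition})" for child in children)
        system.append(TagNode(parent_name, fused, children))
    return system


# Illustrative input (hypothetical tags and definitions).
definitions = {
    "pendant too large": "# rlike 'too large'",
    "pendant too small": "# rlike 'too small'",
}
parent_of = {"pendant too large": "pendant size", "pendant too small": "pendant size"}
tag_system = build_two_level_system(definitions, parent_of)
```

Other fusion rules are equally plausible; the point is only that each parent carries a name, a derived definition, and its child tags.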
It should be noted that, in the embodiment of the present application, after the first target tag set in the specific domain is obtained, the first target tag set may also be used as a preset tag set, and steps 101 to 103 may be repeatedly executed to further update the first target tag set. Therefore, by repeatedly iterating the steps 101-103, the iterative updating of the first target label set can be realized, so that the first target label set in the specific field is more perfect, and the quality of the first target label set in the specific field is further improved.
In summary, according to the method for generating the label set provided by the embodiment of the application, by acquiring the plurality of first service data in the specific field, labeling each first service data based on the label definition statement corresponding to each label in the preset label set, determining the quality of each label in the preset label set according to the labeling result of the plurality of first service data, and updating the preset label set according to the quality of each label to obtain the first target label set in the specific field, so that the high-quality label set in the specific field can be generated, and a foundation is laid for accurately explaining or classifying the object.
To explain clearly how, in the embodiments of the present application, each piece of first service data is labeled based on the tag definition statements corresponding to the tags in the preset tag set to obtain its labeling result, the method for generating the tag set is further described below with reference to fig. 2.
Fig. 2 is a flowchart of a method for generating a tag set according to a second embodiment of the present application. As shown in fig. 2, the method of generating a tag set may include the following steps 201-204.
Step 201, acquiring a plurality of pieces of first service data in a specific field and the tag definition statement corresponding to each tag in a preset tag set, wherein each tag definition statement represents the definition of the corresponding tag.
The specific implementation process and principle of step 201 may refer to the description of the foregoing embodiments, which is not repeated herein.
Step 202, for each piece of first service data and for each tag in the preset tag set, executing the tag definition statement corresponding to the tag to determine whether the first service data meets the definition of the tag.
The tag definition statement may be a semi-regularized statement; that is, the tag is defined by means of a semi-regular expression. Specifically, the tag definition statement may combine the characteristics of regular expressions and of query statements such as SQL (Structured Query Language) database query statements, using keywords, regular expressions, preset logical symbols, and the like to define the tag.
The tag definition statement is built from clauses, which are combined with common logical predicates such as "and", "or", "not", and brackets to form more complex statements. Because these logical predicates are available, each clause only needs to be a simple regular expression. In this embodiment, symbols such as "and", "or", "not", and brackets are referred to as preset logical symbols. A tag definition statement may include one or more clauses.
Taking the first service data as the content data as an example, the clause structure may be in the following form:
#XXXXXX 'regular expression M'
Here, "#" is the service data macro definition and stands for one piece of service data. "XXXXXX" is a placeholder for an operator that determines whether the service data matches the regular expression M, such as rlike in SQL. The clause means: service data that satisfies regular expression M.
For certain first service data, by executing a label definition statement corresponding to a certain label, whether the label definition statement is matched with the first service data or not can be judged, so that whether the first service data is matched with a definition corresponding to the label definition statement or not can be judged, namely whether the first service data meets the definition of the label or not is judged.
When the tag definition statement conforms to the form of a Spark SQL query condition, the SQL engine of Spark SQL can be called to execute it, so no separate execution logic needs to be developed for tag definition statements. Specifically, executing a tag definition statement may call the Spark SQL engine, which uses the open-source Spark library to determine whether the statement holds, returning true if it does and false otherwise. A return value of true indicates that the tag definition statement matches the first service data, that is, the first service data meets the definition of the corresponding tag.
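The idea of delegating evaluation to an existing SQL engine can be sketched without Spark. The example below swaps in Python's built-in `sqlite3` module as a stand-in engine and registers a `REGEXP` function in place of `rlike`; it is not Spark SQL, and the helper names and the sample definition are assumptions made for illustration.

```python
import re
import sqlite3


def _regexp(pattern: str, value: str) -> int:
    # SQLite evaluates `X REGEXP Y` as regexp(Y, X): pattern first, value second.
    return int(value is not None and re.search(pattern, value) is not None)


def make_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, _regexp)
    return conn


def satisfies(conn: sqlite3.Connection, definition: str, service_data: str) -> bool:
    """Evaluate a tag definition statement against one piece of service data.

    The service data macro "#" is turned into a bound parameter, so the data
    itself is never spliced into the SQL text. Caveats of this sketch: a "#"
    inside a regular expression literal would also be replaced, and the
    definition is interpolated into SQL, so it must come from a trusted source.
    """
    condition = definition.replace("#", ":data")
    row = conn.execute(f"SELECT ({condition})", {"data": service_data}).fetchone()
    return bool(row[0])


conn = make_connection()
definition = "# REGEXP 'pendant' AND # REGEXP 'too large|a bit large'"
matched = satisfies(conn, definition, "the pendant is too large")
```

The engine handles `AND`, `OR`, `NOT`, and brackets, so the sketch needs no parser of its own, which is the benefit the paragraph above attributes to reusing an SQL engine.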
In an example embodiment, where the tag definition statement includes a clause, step 202 may be implemented by:
for each first service data, for each label in a preset label set, under the condition that a label definition statement corresponding to the label comprises a clause, replacing a service data macro definition included in the clause in the label definition statement corresponding to the label by using the first service data, and executing the replaced label definition statement to judge whether the first service data meets a regular expression in the clause; determining that the first business data meets the definition of the tag under the condition that the first business data meets the regular expression in the clause; in the event that the first business data does not satisfy the regular expression in the clause, it is determined that the first business data does not satisfy the definition of the tag.
In an example embodiment, where the tag definition statement includes multiple clauses, each clause may be connected by a preset logical symbol, and accordingly, step 202 may be implemented by:
for each piece of first service data and for each tag in the preset tag set, when the tag definition statement corresponding to the tag includes a plurality of clauses, replacing the service data macro definition included in each clause of the tag definition statement with the first service data, and executing the replaced tag definition statement to determine whether the first service data satisfies the regular expression in each clause and the logic of the preset logical symbols connecting the clauses; when the first service data satisfies the regular expressions in the clauses and the logic connecting them, determining that the first service data meets the definition of the tag; when the first service data fails to satisfy a required regular expression and/or the logic connecting the clauses, determining that the first service data does not meet the definition of the tag.
The preset logical symbol may include logical predicates such as "and", "or", "not", and brackets.
For example, the tag definition statement may take the form: clause 1 and clause 2 and clause 3. The statement means: all data that satisfies clause 1, clause 2, and clause 3 at the same time. When a piece of first service data satisfies this meaning, it is considered to satisfy the logic of the preset logical symbols connecting the clauses.
Alternatively, the tag definition statement may take the form: (clause 1 or clause 2) and clause 3. The statement means: all data that satisfies clause 3 and at least one of clause 1 and clause 2. When a piece of first service data satisfies this meaning, it is considered to satisfy the logic of the preset logical symbols connecting the clauses.
Where the tag definition statement includes one or more clauses, the regular expression in a clause may be any regular expression. In one possible implementation, the regular expression in a clause may consist of a plurality of keywords separated by a preset separator, for example "a|b|c|d", where "a", "b", "c", and "d" are keywords and "|" is the preset separator. This regular expression means that service data containing any one of a, b, c, or d is considered to satisfy it.
Tag definition statements containing regular expressions of this form, such as #XXXXXX 'a|b|c|d', are typically used to check whether a piece of service data contains a keyword and/or its synonyms. Expressing tag definitions with regular expressions of this form keeps the definitions robust. In addition, clauses of this form only need to be combined through preset logical symbols such as and, or, not, and brackets. Using these symbols greatly reduces the complexity of each regular expression, so that a single tag definition statement can express very complex semantic logic.
Correspondingly, when the regular expression in a clause of the tag definition statement has this form and the replaced tag definition statement is executed, the first service data is determined to satisfy the regular expression if it contains at least one of the keywords separated by the preset separator, and is determined not to satisfy the regular expression if it contains none of them.
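The keyword-list form of a clause can be illustrated with Python's `re` module. This is only a sketch: the helper names are hypothetical, and escaping each keyword with `re.escape` is a design choice of the example so that keywords are matched literally.

```python
import re
from typing import Iterable


def keyword_pattern(keywords: Iterable[str]) -> str:
    """Join keywords with "|" so that matching any one of them is enough."""
    return "|".join(re.escape(keyword) for keyword in keywords)


def satisfies_keywords(service_data: str, pattern: str) -> bool:
    """True when the service data contains at least one of the keywords."""
    return re.search(pattern, service_data) is not None


pattern = keyword_pattern(["too large", "a bit large"])
hit = satisfies_keywords("the pendant is too large", pattern)
miss = satisfies_keywords("the pendant is lovely", pattern)
```

Adding a synonym to a tag definition then only means adding one more keyword to the list.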
It should be noted that, when the tag definition sentence includes multiple clauses, each clause may include a service data macro definition, or may include only one service data macro definition, which is not limited in this application.
For example, assume that tag 1 is "pendant too large" and that the tag definition statement corresponding to tag 1 includes two clauses, namely:
#XXXXXX 'hanging ornament|pendant' and #XXXXXX 'too large|a bit large|slightly too large'
The tag definition statement means: service data that contains at least one of the keywords "hanging ornament" and "pendant", and also contains at least one of the keywords "too large", "a bit large", and "slightly too large".
Then, for a piece of first service data "the pendant is too large", the first service data can be used to replace the service data macro definition "#" in each clause of the tag definition statement corresponding to tag 1, and the replaced tag definition statement is executed to determine whether the first service data satisfies the regular expression in each clause and the logic of the preset logical symbol connecting the clauses. That is, it is determined whether "the pendant is too large" satisfies the regular expression 'hanging ornament|pendant' in the first clause and the regular expression 'too large|a bit large|slightly too large' in the second clause. Since "the pendant is too large" contains the keywords "pendant" and "too large", the first service data satisfies the regular expression in each clause and the logic of the preset logical symbol connecting them, and it can therefore be determined that the first service data meets the definition of tag 1.
It can be understood that identical segments may appear in the tag definition statements corresponding to different tags. In an example embodiment, frequently occurring segments (i.e., the preset character strings in the embodiments of the present application) may be defined as macro definitions and referenced directly in tag definition statements through those macro definitions; that is, a clause in a tag definition statement may further include a second macro definition of a preset character string. In addition, a tag itself may be defined as a macro definition and referenced in a tag definition statement through the macro definition of the tag; that is, a clause in a tag definition statement may also include a first macro definition of a preset tag.
In the case that a clause includes the first macro definition of a preset tag and/or the second macro definition of a preset character string, the first macro definition and/or the second macro definition needs to be replaced with the content it defines when the tag definition statement is executed. Accordingly, before step 202, the method may further include: replacing the first macro definition with the tag definition statement of the preset tag corresponding to the first macro definition, and/or replacing the second macro definition with the preset character string corresponding to the second macro definition, to obtain a formatting definition statement corresponding to the tag.
Accordingly, step 202 may include:
for each first service data, executing the formatting definition statement corresponding to each tag in the preset tag set, so as to determine whether the first service data satisfies the definition of the tag.
The first macro definition of the preset tag may take the following form: "&tag name&".
For example, taking the second macro definition of the preset character string as an example, assume that the tag definition statement corresponding to the tag "poor resolution" is:
#XXXXXX 'resolution' and #XXXXXX 'bad|not OK|not quite OK|not so good|not too good|not very good|poor|mediocre|garbage'
The tag definition statement corresponding to the tag "poor heat dissipation" is:
#XXXXXX 'heat dissipation' and #XXXXXX 'bad|not OK|not quite OK|not so good|not too good|not very good|poor|mediocre|garbage'
The tag definition statements corresponding to the two tags both contain the segment "#XXXXXX 'bad|not OK|not quite OK|not so good|not too good|not very good|poor|mediocre|garbage'", and this segment can be defined, as a preset character string, as a macro definition named "bad". The macro definition is used in the form "#macro definition name#".
The content defined by the "bad" macro definition, i.e., the preset character string, may be as follows: "#XXXXXX 'bad|not OK|not quite OK|not so good|not too good|not very good|poor|mediocre|garbage'". Accordingly, the tag definition statement corresponding to the tag "poor resolution" may be: "#XXXXXX 'resolution' and #bad#", and the tag definition statement corresponding to the tag "poor heat dissipation" may be: "#XXXXXX 'heat dissipation' and #bad#". Here, #bad# is the second macro definition of the preset character string.
Alternatively, the content defined by the "bad" macro definition, i.e., the preset character string, may be as follows: "bad|not OK|not quite OK|not so good|not too good|not very good|poor|mediocre|garbage". Accordingly, the tag definition statement corresponding to the tag "poor resolution" may be: "#XXXXXX 'resolution' and #XXXXXX '#bad#'", and the tag definition statement corresponding to the tag "poor heat dissipation" may be: "#XXXXXX 'heat dissipation' and #XXXXXX '#bad#'". Here, #bad# is the second macro definition of the preset character string.
In this embodiment of the present application, taking the tag "poor resolution" as an example, before the tag definition statement corresponding to the tag is executed for a certain first service data, "#bad#" in the tag definition statement may be replaced with the content defined by the "bad" macro definition to obtain the formatting definition statement corresponding to the tag "poor resolution"; the formatting definition statement is then executed to determine whether the first service data satisfies the definition of the tag "poor resolution".
Using macro definitions in tag definition statements greatly reduces their complexity, allows previously written semantic rules to be reused, and greatly reduces the cost of writing tag definition statements. In addition, if a new synonym of "bad" is discovered later, only the content defined by the macro definition needs to be modified in order to update the tag definition statements of all tags that use the macro definition; the tag definition statements do not need to be modified tag by tag, so they can be modified at low cost and with high efficiency.
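The macro replacement step can be illustrated with a short Python sketch. It assumes the "#macro definition name#" usage form described above; the helper expand_macros, the MACROS table and the placeholder "DATA ~" (standing in for the service data macro definition) are hypothetical and chosen only for this example.

```python
import re
from typing import Dict

# Hypothetical macro table: macro name -> preset character string.
MACROS: Dict[str, str] = {"bad": "bad|not good|poor|garbage"}


def expand_macros(statement: str, macros: Dict[str, str]) -> str:
    """Replace every #name# whose name is a known macro with its content."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown names are left untouched.
        return macros.get(name, match.group(0))

    return re.sub(r"#(\w+)#", substitute, statement)


resolution = "DATA ~ 'resolution' and DATA ~ '#bad#'"
heat = "DATA ~ 'heat dissipation' and DATA ~ '#bad#'"

print(expand_macros(resolution, MACROS))
print(expand_macros(heat, MACROS))
```

Adding a synonym to MACROS["bad"] changes the expanded form of both statements at once, which is the maintenance benefit the paragraph above describes.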
In addition, in the embodiment of the present application, a preset labeling model that conforms to the interface standard of the generating device may be registered and published as a UDF (User-Defined Function), and the UDF may be called in a tag definition statement, thereby calling the preset labeling model. Accordingly, step 202 may be implemented in the following manner:
for each first service data and for each tag in the preset tag set, when the tag definition statement corresponding to the tag includes a call statement corresponding to a preset labeling model, executing the call statement to call the corresponding preset labeling model, and determining, by using the preset labeling model, whether the first service data satisfies the definition of the tag.
The preset labeling model is a preset model for labeling the service data.
Wherein, the call statement may be in the form of: UDF method name (parameter) =certain value.
For example, assume that a classification model is registered as the UDF gender_classification, which is used to determine the gender corresponding to a piece of text. The classification model is a binary classification model: a return value of 1 indicates that the text corresponds to a male, and a return value of 0 indicates that the text corresponds to a female. The tag definition statement corresponding to the tag "male" may then include the following call statement: gender_classification('#') = 1.
The generating device may call the corresponding classification model by executing the call statement and use the classification model to determine whether the first service data satisfies the definition of the tag "male": when the classification model returns 1, it is determined that the first service data satisfies the definition of the tag "male", and when the classification model returns 0, it is determined that the first service data does not satisfy the definition of the tag "male".
Through the support of the preset labeling models, some preset labeling models can be applied to the label definition sentences corresponding to the labels, so that the label definition sentences have stronger expression capability and higher accuracy.
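One possible way to picture the UDF mechanism is a registry that maps UDF names to callables, sketched below in Python. The registry, the decorator register_udf, the helper call_statement_holds and the keyword-based stand-in for the classification model are all assumptions made for illustration; the application does not specify the interface standard of the generating device.

```python
from typing import Callable, Dict

# Hypothetical registry: UDF method name -> preset labeling model.
UDF_REGISTRY: Dict[str, Callable[[str], int]] = {}


def register_udf(name: str):
    """Register a labeling model under a UDF method name."""

    def decorator(func: Callable[[str], int]) -> Callable[[str], int]:
        UDF_REGISTRY[name] = func
        return func

    return decorator


@register_udf("gender_classification")
def gender_classification(text: str) -> int:
    # Stand-in for a real binary classifier: 1 means male, 0 means female.
    return 1 if "for him" in text.lower() else 0


def call_statement_holds(udf_name: str, service_data: str, expected: int) -> bool:
    """Evaluate a call statement of the form: UDF method name(parameter) = value."""
    return UDF_REGISTRY[udf_name](service_data) == expected


print(call_statement_holds("gender_classification", "A birthday gift for him", 1))  # True
print(call_statement_holds("gender_classification", "A birthday gift for her", 1))  # False
```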
In addition, taking the case where the first service data is index data as an example, a clause in the tag definition statement corresponding to a tag may take, for example, one of the following forms: index value <= index, index <= index value, index = index value.
For example, the tag definition statement corresponding to the tag "young girl" may be in the following form:
15 <= age and age <= 20 and gender = 'female'
By executing the tag definition statement, it can be determined whether the first service data satisfies the definition of the tag "young girl".
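For index data, the same evaluation can be sketched with comparison clauses instead of regular expressions. The representation below (a list of field, operator and value triples joined by "and") and the names OPS, YOUNG_GIRL and satisfies_index_tag are assumptions for illustration only.

```python
import operator
from typing import Any, Dict, List, Tuple

# Comparison operators allowed in a clause (hypothetical subset).
OPS = {"<=": operator.le, ">=": operator.ge, "=": operator.eq}

# 15 <= age and age <= 20 and gender = 'female'
YOUNG_GIRL: List[Tuple[str, str, Any]] = [
    ("age", ">=", 15),
    ("age", "<=", 20),
    ("gender", "=", "female"),
]


def satisfies_index_tag(record: Dict[str, Any], clauses: List[Tuple[str, str, Any]]) -> bool:
    """Return True when the record satisfies every comparison clause."""
    return all(OPS[op](record[field], value) for field, op, value in clauses)


print(satisfies_index_tag({"age": 18, "gender": "female"}, YOUNG_GIRL))  # True
print(satisfies_index_tag({"age": 25, "gender": "female"}, YOUNG_GIRL))  # False
```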
It should be noted that the description in the foregoing embodiment of tag definition statements in the form #XXXXXX 'regular expression M' also applies to tag definition statements in the form described here, and is not repeated here.
Step 203, labeling the first service data with the tag when the first service data satisfies the definition of the tag.
In the embodiment of the present application, for each first service data, when the first service data meets the definition of a certain label, the label may be used to label the first service data.
Therefore, label definition sentences corresponding to the labels in the embodiment of the application can also be used for labeling the service data.
Step 204, determining the quality of each label in the preset label set according to the labeling results of the plurality of first service data, and updating the preset label set according to the quality of each label in the preset label set to obtain a first target label set in a specific field.
The specific implementation and principles of step 204 may refer to the description of the foregoing embodiments, and are not repeated herein.
In summary, the tag set generation method provided by the embodiment of the present application acquires a plurality of first service data in a specific field, labels each first service data based on the tag definition statement corresponding to each tag in the preset tag set, determines the quality of each tag in the preset tag set according to the labeling results of the plurality of first service data, and updates the preset tag set according to the quality of each tag to obtain the first target tag set in the specific field. A high-quality tag set in the specific field can therefore be generated, which lays a foundation for accurately describing or classifying objects. In addition, the tag definition statement corresponding to a tag in the embodiment of the present application describes the definition of the tag precisely in a formalized manner, which avoids ambiguity in understanding the tag; it has strong expressive power, can be interpreted by electronic equipment, is simple enough for people to understand easily, and has strong applicability.
The method for generating the tag set provided in the embodiment of the present application is further described below with reference to fig. 3.
Fig. 3 is a flowchart of a method for generating a tag set according to a third embodiment of the present application. As shown in fig. 3, the method of generating a tag set may include the following steps 301-307.
Step 301, acquiring a plurality of first service data in a specific field and a tag definition statement corresponding to each tag in a preset tag set, wherein the tag definition statement is used for representing the definition of the corresponding tag.
Step 302, for each first service data, labeling the first service data based on the label definition statement corresponding to each label in the preset label set, so as to obtain a labeling result of the first service data.
The specific implementation process and principle of steps 301 to 302 may refer to the description of the foregoing embodiments, and will not be repeated herein.
Step 303, determining a quality value of each label in the preset label set under at least one preset quality index according to labeling results of the plurality of first service data.
The preset quality index may be set as needed and may include, for example, a coverage index indicating how widely the tag covers all the first service data, an independence index indicating how independent the tag is within all the first service data, an association index indicating the degree of association between the tag and other tags, and the like. Correspondingly, the quality value of the tag under the coverage index is the coverage rate of the tag; the quality value of the tag under the independence index is the degree of independence of the tag; and the quality value of the tag under the association index is the degree of association of the tag.
The coverage rate of a certain tag a may be determined based on the number of first service data marked as the tag a in all the first service data, where the greater the number of first service data marked as the tag a in all the first service data, the higher the coverage rate of the tag a, the higher the quality, and thus the higher the value of the tag a.
The degree of independence of a certain tag a can be determined in the manner shown in the following formula (1):
DEP(A)=O(A)/V(A) (1)
where DEP(A) represents the degree of independence of tag A, O(A) represents the number of first service data labeled as tag A but not as any other tag, and V(A) represents the number of all first service data labeled as tag A.
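The coverage rate and formula (1) can be worked through on a small example. In the Python sketch below, each labeling result is modeled as the set of tags assigned to one first service data. Computing the coverage rate as V(A) divided by the total number of first service data is an assumption, since the text only states that coverage is determined from the number of data labeled as tag A; the function names and sample data are hypothetical.

```python
from typing import List, Set


def coverage(results: List[Set[str]], tag: str) -> float:
    """Assumed coverage rate: V(tag) divided by the number of service data."""
    if not results:
        return 0.0
    return sum(tag in tags for tags in results) / len(results)


def independence(results: List[Set[str]], tag: str) -> float:
    """Formula (1): DEP(A) = O(A) / V(A)."""
    v = sum(tag in tags for tags in results)    # V(A): labeled as the tag
    o = sum(tags == {tag} for tags in results)  # O(A): labeled as the tag only
    return o / v if v else 0.0


# Labeling results of four hypothetical first service data.
labeling_results = [{"A"}, {"A", "B"}, {"B"}, set()]

print(coverage(labeling_results, "A"))      # 0.5
print(independence(labeling_results, "A"))  # 0.5
```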
The degree of association of a certain tag a with a tag B can be determined in the manner shown in the following formula (2):
R(A,B)=V(A,B)/(V(A)+V(B)-V(A,B)) (2)
where R(A, B) represents the degree of association between tag A and tag B, V(A) represents the number of all first service data labeled as tag A, V(B) represents the number of all first service data labeled as tag B, and V(A, B) represents the number of all first service data labeled as both tag A and tag B. As can be seen from formula (2), when the same service data is labeled as both tag A and tag B, there is a certain association between tag A and tag B; the higher the proportion of service data labeled as both tag A and tag B, the closer the association between tag A and tag B.
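Formula (2) has the same shape as a Jaccard index over the sets of service data labeled with each tag. The following Python sketch evaluates it on hypothetical labeling results, again modeled as one set of tags per first service data; the function name association and the sample data are illustrative assumptions.

```python
from typing import List, Set


def association(results: List[Set[str]], tag_a: str, tag_b: str) -> float:
    """Formula (2): R(A, B) = V(A, B) / (V(A) + V(B) - V(A, B))."""
    v_a = sum(tag_a in tags for tags in results)
    v_b = sum(tag_b in tags for tags in results)
    v_ab = sum(tag_a in tags and tag_b in tags for tags in results)
    union = v_a + v_b - v_ab
    return v_ab / union if union else 0.0


# V(A) = 3, V(B) = 3, V(A, B) = 2, so R(A, B) = 2 / (3 + 3 - 2) = 0.5
labeling_results = [{"A", "B"}, {"A", "B"}, {"A"}, {"B"}, {"C"}]

print(association(labeling_results, "A", "B"))  # 0.5
```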
The degree of association of two tags shown in table 1 below can be obtained in the manner shown in the above formula (2).
Step 304, screening the tags in the preset tag set according to the quality value of each tag under the at least one preset quality index, and obtaining the first target tag set from the tags that remain after screening.
In an example embodiment, for the coverage index and the independence index, corresponding index thresholds may be preset, so that, among the tags included in the preset tag set, the tags whose coverage rate and degree of independence are lower than the corresponding index thresholds may be deleted to obtain the screened first target tag set. The index thresholds may be set arbitrarily as needed, which is not limited in the present application.
For the association index, because tags with a high degree of association have relatively close definitions, a corresponding index threshold may be preset, so that, for any two tags in the preset tag set whose degree of association is higher than the corresponding index threshold, one of the two tags may be deleted to obtain the screened first target tag set.
For example, if the index threshold corresponding to the association index is 0.8, then of the two near-synonymous "bad softness" tags in the table whose degree of association exceeds 0.8, only one may be retained and the other may be deleted.
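The screening step can be sketched as follows. Two choices in this Python example are assumptions that go beyond the text: a tag is dropped when either its coverage rate or its degree of independence falls below the threshold, and, of two highly associated tags, the one with the lower coverage rate is the one deleted. The function screen_tags, the tag names and all numeric values are hypothetical.

```python
from typing import Dict, Set, Tuple


def screen_tags(
    coverage: Dict[str, float],
    independence: Dict[str, float],
    association: Dict[Tuple[str, str], float],
    min_coverage: float,
    min_independence: float,
    max_association: float,
) -> Set[str]:
    """Return the tags that survive the threshold-based screening."""
    # Assumption: drop a tag if either quality value is below its threshold.
    kept = {
        tag
        for tag in coverage
        if coverage[tag] >= min_coverage and independence[tag] >= min_independence
    }
    # Assumption: of two highly associated tags, drop the lower-coverage one.
    for (tag_a, tag_b), degree in sorted(association.items()):
        if degree > max_association and tag_a in kept and tag_b in kept:
            kept.discard(tag_a if coverage[tag_a] < coverage[tag_b] else tag_b)
    return kept


coverage_values = {"poor softness": 0.30, "not soft enough": 0.25, "rare tag": 0.01}
independence_values = {"poor softness": 0.5, "not soft enough": 0.4, "rare tag": 0.9}
association_values = {("poor softness", "not soft enough"): 0.85}

print(screen_tags(coverage_values, independence_values, association_values, 0.05, 0.1, 0.8))
```

With these values, "rare tag" is removed for low coverage and "not soft enough" is removed because its degree of association with "poor softness" exceeds 0.8.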
It will be appreciated that in some cases it may be necessary to combine the definitions of existing tags to create new tags. For example, it is sometimes desirable to merge some synonymous tags into one tag, it is sometimes desirable to create a parent tag for some tags, etc. In the embodiment of the present application, fusion processing may be performed on at least two tags according to a preset fusion manner, to obtain a fusion tag, or fusion processing may be performed on tag definition statements corresponding to at least two tags according to a preset fusion manner, to obtain a fusion tag definition statement corresponding to a fusion tag.
In an example embodiment, a user may designate at least two tags to be fused in a preset tag set, so that the generating device may determine at least two tags to be fused in the preset tag set based on a user instruction, further perform fusion processing on the at least two tags to be fused according to a preset fusion mode to obtain a fused tag, and perform fusion processing on tag definition sentences corresponding to the at least two tags according to the preset fusion mode to obtain a fused tag definition sentence corresponding to the fused tag. The generating device may add the fusion tag having the corresponding fusion tag definition statement to the preset tag set to update the preset tag set, and further use the updated preset tag set as the preset tag set in steps 302-304 to obtain the first target tag set in the specific domain.
Accordingly, before step 302, the method may further include:
according to a preset fusion mode, carrying out fusion processing on at least two labels to be fused in a preset label set to obtain fusion labels, wherein the at least two labels to be fused are determined based on user instructions; and carrying out fusion processing on the label definition sentences corresponding to the at least two labels according to a preset fusion mode to obtain fusion label definition sentences corresponding to the fusion labels. The preset fusion mode can also be determined based on a user instruction.
The preset fusion mode can comprise intersection fusion, union fusion, custom fusion and the like.
Intersection fusion means that the fusion tag obtained by fusing at least two tags to be fused satisfies the definitions of all of those tags simultaneously. For example, when tag 1, tag 2 and tag 3 are fused by intersection fusion, the definition of the resulting fusion tag is (definition of tag 1) and (definition of tag 2) and (definition of tag 3), i.e., the definitions of tag 1, tag 2 and tag 3 are satisfied simultaneously. For example, the tags "fashionable person" and "young girl" can be fused to obtain the tag "fashionable girl".
Union fusion means that the fusion tag obtained by fusing at least two tags to be fused satisfies at least one of the definitions of those tags. For example, when tag 1, tag 2 and tag 3 are fused by union fusion, the definition of the resulting fusion tag is (definition of tag 1) or (definition of tag 2) or (definition of tag 3), i.e., the definition of at least one of tag 1, tag 2 and tag 3 is satisfied. For example, tags such as "poor resilience", "insufficient resilience", "too little elasticity" and "insufficient elasticity" have essentially the same meaning and can be fused, for example by retaining one of the tag names, to obtain the fusion tag "insufficient elasticity".
Union fusion can be used to merge synonymous tags and can also be used to construct a parent-child hierarchy of tags. For example, a user can select a batch of tags that describe the same type of problem, so that the generating device fuses the batch of tags by union fusion based on the user instruction; the resulting fusion tag serves as the parent tag of the batch, and the original tags serve as its child tags, thereby establishing a parent-child hierarchy among the tags. For example, tags such as "too thick", "too thin", "too long" and "too short" can be fused by union fusion to generate the fusion tag "size problem", which serves as the parent tag of those tags.
The user-defined fusion refers to that the definition of at least two labels to be fused is combined together through preset logical symbols such as an and, an or, a bracket and the like according to user-defined logic, so as to form the definition of the fused label. The user-defined fusion can enable a user to flexibly construct a new label based on the existing label, and personalized definition of the label is achieved.
In an example embodiment, the label definition sentences corresponding to at least two labels to be fused are connected according to preset logic characters corresponding to a preset fusion mode, so that the fusion label definition sentences corresponding to the fusion labels can be obtained quickly.
Taking intersection fusion as the preset fusion mode as an example, assume that the tag definition statement corresponding to the tag "fashionable person" is "#XXXXXX 'regular expression M1'" and that the tag definition statement corresponding to the tag "young girl" is "#XXXXXX 'regular expression M2'". The two tag definition statements can be connected with "and" to obtain the fusion tag definition statement corresponding to the fusion tag "fashionable girl": #XXXXXX 'regular expression M1' and #XXXXXX 'regular expression M2'.
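Connecting tag definition statements with the logical symbol of the chosen fusion mode can be sketched in a few lines of Python. The function fuse_statements, the mode names and the placeholder "DATA ~" (standing in for the service data macro definition) are assumptions for illustration; each statement is wrapped in parentheses so that the fused statement keeps the grouping "(definition of tag 1) and (definition of tag 2)".

```python
from typing import List

# Logical symbol assumed for each preset fusion mode.
CONNECTORS = {"intersection": " and ", "union": " or "}


def fuse_statements(statements: List[str], mode: str) -> str:
    """Join tag definition statements with the connector of the fusion mode."""
    return CONNECTORS[mode].join(f"({statement})" for statement in statements)


fashionable_person = "DATA ~ 'M1'"
young_girl = "DATA ~ 'M2'"

print(fuse_statements([fashionable_person, young_girl], "intersection"))
# (DATA ~ 'M1') and (DATA ~ 'M2')
print(fuse_statements([fashionable_person, young_girl], "union"))
# (DATA ~ 'M1') or (DATA ~ 'M2')
```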
Taking user-defined fusion as the preset fusion mode as an example, a user can specify a fusion statement, and the fusion statement can reference a tag through the macro definition of the tag. When the tag definition statements corresponding to at least two tags are fused, the macro definition of each tag in the fusion statement can be replaced with the tag definition statement corresponding to that tag, so as to obtain the fusion tag definition statement corresponding to the fusion tag. The macro definition of a tag may take the following form: "&tag name&".
For example, assume that the user-specified fusion statement is: (&tag 1& and &tag 2&) or (&tag 3& and &tag 4&). The meaning of the fusion statement is: the definitions of tag 1 and tag 2 are satisfied simultaneously, or the definitions of tag 3 and tag 4 are satisfied simultaneously. The generating device may replace &tag 1& in the fusion statement with the tag definition statement corresponding to tag 1, replace &tag 2& with the tag definition statement corresponding to tag 2, replace &tag 3& with the tag definition statement corresponding to tag 3, and replace &tag 4& with the tag definition statement corresponding to tag 4, to obtain the tag definition statement corresponding to the fusion tag.
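Replacing tag macro definitions in a user-specified fusion statement can be sketched as below, assuming the "&tag name&" form. The function expand_tag_macros and the placeholder definitions D1 to D4 are hypothetical; each substituted definition is wrapped in parentheses so that the user's own grouping is preserved.

```python
import re
from typing import Dict


def expand_tag_macros(fusion_statement: str, definitions: Dict[str, str]) -> str:
    """Replace each &tag name& with the tag's definition statement in parentheses."""

    def substitute(match: re.Match) -> str:
        tag_name = match.group(1).strip()
        return f"({definitions[tag_name]})"

    return re.sub(r"&([^&]+)&", substitute, fusion_statement)


# Placeholder definition statements for four hypothetical tags.
definitions = {"tag 1": "D1", "tag 2": "D2", "tag 3": "D3", "tag 4": "D4"}
fusion_statement = "(&tag 1& and &tag 2&) or (&tag 3& and &tag 4&)"

print(expand_tag_macros(fusion_statement, definitions))
# ((D1) and (D2)) or ((D3) and (D4))
```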
In an example embodiment, the user may further designate at least two tags to be fused in the first target tag set, so that the generating device may determine at least two tags to be fused in the first target tag set based on the user instruction, and further perform fusion processing on the at least two tags to be fused in the first target tag set according to a preset fusion manner, so as to obtain a fused first target tag set. That is, after step 304, it may further include:
according to a preset fusion mode, carrying out fusion processing on at least two labels to be fused in the first target label set to obtain a fused first target label set, wherein the at least two labels to be fused are determined based on a user instruction.
By means of fusion processing of at least two tags to be fused in the first target tag set, the fused first target tag set is obtained, high-quality tags included in the first target tag set can be expanded, and then a tag system with richer tags in a specific field can be constructed according to the first target tag set and the fused first target tag set.
Step 305, acquiring a plurality of second service data in other fields.
The other fields may be any business fields different from the specific fields, such as a medical field, a news field, an academic field, an e-commerce field, and the like, and the application is not limited thereto.
The service data is data related to the service, which may be content data, index data, or the like, and is not limited in this application. In the embodiment of the present application, for convenience of distinction, service data in other fields is referred to as second service data.
Step 306, for each second service data, labeling the second service data based on the tag definition statement corresponding to each tag in the first target tag set, to obtain a labeling result of the second service data.
Step 307, determining the quality of each label in the first target label set according to the labeling results of the plurality of second service data, and updating the first target label set according to the quality of each label in the first target label set to obtain a second target label set in other fields.
The process of steps 306-307 is performed based on a plurality of second service data in other fields, which is similar to the process of steps 302-304 performed based on a plurality of first service data in a specific field, and will not be repeated here.
For example, assume that the specific field is the mobile phone category field and that the first target tag set of this field includes tags such as "poor screen resolution", "slow running speed", "system stuck", "large heat generation", "poor service attitude" and "slow shipping speed"; these tags may also be applicable to the computer category field. In this embodiment of the present application, through steps 305 to 307, each second service data in the computer category field may be labeled based on the tag definition statements corresponding to these tags to obtain a labeling result of the second service data; the quality of each tag in the first target tag set is then determined according to the labeling results of the plurality of second service data, and the first target tag set is updated according to the quality of each tag to obtain a second target tag set in the computer category field. In this way, by exploiting the similarity and correlation among the tag sets of different service fields, the tag set of the mobile phone category field can be applied to the generation of the tag set of the computer category field, realizing tag propagation across fields.
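The cross-field reuse in steps 305 to 307 can be sketched as a relabeling pass followed by screening. In the Python example below, tag definitions are simplified to lists of regular expressions joined by "and", and the only quality index applied is an assumed coverage rate with a hypothetical threshold; the function propagate_tags, the tag definitions and the sample reviews are illustrative and not taken from the application.

```python
import re
from typing import Dict, List, Set


def propagate_tags(
    tag_clauses: Dict[str, List[str]],
    second_data: List[str],
    min_coverage: float,
) -> Set[str]:
    """Relabel second-field data with first-field tags and keep the tags
    whose coverage rate reaches the threshold (the only index used here)."""
    kept: Set[str] = set()
    if not second_data:
        return kept
    for tag, clauses in tag_clauses.items():
        # A service data item hits the tag when every clause regex matches.
        hits = sum(
            all(re.search(pattern, text) for pattern in clauses)
            for text in second_data
        )
        if hits / len(second_data) >= min_coverage:
            kept.add(tag)
    return kept


# Hypothetical first target tag set from the mobile phone category field.
phone_tags = {
    "poor screen resolution": [r"screen|display", r"blurry|low resolution"],
    "slow shipping speed": [r"shipping|delivery", r"slow|late"],
    "weak signal": [r"signal", r"weak|poor"],
}

# Hypothetical second service data from the computer category field.
computer_reviews = [
    "the display is blurry",
    "delivery was late",
    "screen looks blurry at night",
    "great keyboard",
]

print(propagate_tags(phone_tags, computer_reviews, min_coverage=0.2))
```

Here "weak signal" never matches the computer reviews and is dropped, while the other two tags carry over to the second target tag set.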
By reusing the existing tag set of one field in a new service field, the tag set of the new service field can be generated quickly, so that a tag system for the new service field can be constructed quickly. As the fields studied grow in breadth and depth, the tag systems are gradually enriched and refined: more tag systems accumulate over time, and each tag system contains more and more refined tags, so that more and more tags meet the quality requirements and building the tag system of a new service field becomes increasingly efficient.
Fig. 4 is a schematic structural diagram of a tag set generating apparatus according to a fourth embodiment of the present application.
As shown in fig. 4, the tag set generating apparatus 400 may include: a first acquisition module 410, a first annotation module 420, and a first update module 430.
The first obtaining module 410 is configured to obtain a plurality of first service data in a specific field and a tag definition statement corresponding to each tag in a preset tag set, where the tag definition statement is used to represent the definition of the corresponding tag;
the first labeling module 420 is configured to label, for each first service data, the first service data based on a label definition statement corresponding to each label in a preset label set, so as to obtain a labeling result of the first service data;
The first updating module 430 is configured to determine, according to labeling results of the plurality of first service data, a quality of each label in the preset label set, and update the preset label set according to the quality of each label in the preset label set, so as to obtain a first target label set in a specific domain.
It should be noted that the tag set generating apparatus 400 provided in the embodiment of the present application may execute the tag set generation method in the foregoing embodiments. The tag set generating apparatus 400 may be an electronic device, or may be configured in an electronic device, so as to generate a high-quality tag set in a specific field.
The electronic device may be a PC, a cloud device, a mobile device, a server, or the like, and the mobile device may be any hardware device such as a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or a vehicle-mounted device, which is not limited in this application.
It should be noted that the explanation in the foregoing embodiment of the tag set generating method is also applicable to the tag set generating apparatus in this embodiment, and will not be repeated here.
According to the label set generating device, through acquiring the plurality of first service data in the specific field and based on label definition sentences corresponding to the labels in the preset label set, labeling is carried out on the first service data, the quality of the labels in the preset label set is determined according to labeling results of the plurality of first service data, and then the preset label set is updated according to the quality of the labels, so that a first target label set in the specific field is obtained, a high-quality label set in the specific field can be generated, and a foundation is laid for accurately explaining or classifying objects.
Fig. 5 is a schematic structural diagram of a tag set generating apparatus according to a fifth embodiment of the present application.
As shown in fig. 5, the tag set generating apparatus 500 may include a first acquisition module 510, a first annotation module 520, and a first update module 530. The first acquisition module 510, the first annotation module 520 and the first update module 530 in fig. 5 have the same functions and structures as the first acquisition module 410, the first annotation module 420 and the first update module 430 in fig. 4.
In one possible implementation manner of the embodiment of the present application, the first labeling module 520 includes:
the processing unit is used for executing label definition sentences corresponding to the labels in the preset label set aiming at the labels in the preset label set so as to judge whether the first service data meet the definition of the labels or not;
and the labeling unit is used for labeling the first service data by using the label under the condition that the first service data meets the definition of the label.
In another possible implementation manner of the embodiments of the present application, a processing unit includes:
the first processing subunit is configured to replace, when the tag definition sentence corresponding to the tag includes a clause, a service data macro definition included in the clause in the tag definition sentence corresponding to the tag with the first service data, and execute the replaced tag definition sentence to determine whether the first service data satisfies a regular expression in the clause;
A first determining subunit, configured to determine that the first service data satisfies the definition of the tag, in a case where the first service data satisfies the regular expression in the clause;
and the second determining subunit is used for determining that the first service data does not meet the definition of the tag in the case that the first service data does not meet the regular expression in the clause.
In another possible implementation manner of the embodiment of the present application, the processing unit further includes:
the second processing subunit is configured to replace, when the tag definition sentence corresponding to the tag includes a plurality of clauses, a service data macro definition included in each clause in the tag definition sentence corresponding to the tag by using the first service data, and execute the replaced tag definition sentence, so as to determine whether the first service data satisfies a regular expression in each clause and whether logic of a preset logic symbol connected with each clause is satisfied;
a third determining subunit, configured to determine that the first service data satisfies the definition of the tag if the first service data satisfies the regular expression and the logic in each clause;
and a fourth determining subunit, configured to determine that the first service data does not satisfy the definition of the tag, where the first service data does not satisfy the regular expression in the clause and/or does not satisfy the logic.
In another possible implementation of the embodiments of the present application, the regular expression includes a plurality of keywords separated from one another by a preset separator; the processing unit further includes:
a fifth determining subunit, configured to determine that the first service data satisfies the regular expression when the first service data includes at least one target keyword among the plurality of keywords separated by the preset separator.
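A keyword-based regular expression of this kind can be checked as in the following sketch. The default separator `|` and the function name are assumptions for illustration; the application only states that the separator is preset.

```python
import re


def satisfies_keyword_expression(
    expression: str, service_data: str, separator: str = "|"
) -> bool:
    """Return True if the data contains at least one keyword of the expression."""
    # Split the expression on the (assumed) preset separator, dropping empties.
    keywords = [k for k in expression.split(separator) if k]
    if not keywords:
        return False
    # Escape each keyword so it is matched literally, then build an alternation.
    pattern = "|".join(re.escape(k) for k in keywords)
    return re.search(pattern, service_data) is not None
```

Escaping the keywords keeps characters such as `.` from being interpreted as regular expression syntax, which is a design choice of this sketch rather than something the application requires.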
In another possible implementation of the embodiments of the present application, the clause further includes a first macro definition of a preset tag and/or a second macro definition of a preset character string;
the first labeling module 520 further includes:
a replacing unit, configured to replace the first macro definition with the tag definition statement of the preset tag corresponding to the first macro definition, and/or replace the second macro definition with the preset character string corresponding to the second macro definition, to obtain a formatted definition statement corresponding to the tag;
and the processing unit is configured to
execute the formatted definition statement corresponding to the tag to determine whether the first service data satisfies the definition of the tag.
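The replacement of the first and second macro definitions can be sketched as a simple text expansion step. The macro spellings `@tag(name)` and `@str(name)`, the wrapping parentheses, and the nesting limit are all hypothetical choices made for this illustration.

```python
import re

TAG_MACRO = re.compile(r"@tag\((\w+)\)")  # hypothetical first macro: preset tag
STR_MACRO = re.compile(r"@str\((\w+)\)")  # hypothetical second macro: preset string


def format_definition(definition, tag_definitions, preset_strings, _depth=0):
    """Expand tag and string macros to obtain a formatted definition statement."""
    if _depth > 10:
        raise ValueError("macro nesting too deep (possible cycle)")

    def expand_tag(match):
        # Recursively expand the referenced tag's own definition statement and
        # parenthesize it so that it stays a single logical unit.
        inner = tag_definitions[match.group(1)]
        return "(" + format_definition(
            inner, tag_definitions, preset_strings, _depth + 1
        ) + ")"

    expanded = TAG_MACRO.sub(expand_tag, definition)
    return STR_MACRO.sub(lambda m: preset_strings[m.group(1)], expanded)
```

After expansion, the formatted definition statement contains only clauses and logical symbols, so it can be executed in the same way as a statement written without macros.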
In another possible implementation of the embodiments of the present application, the processing unit includes:
a third processing subunit, configured to, when the tag definition statement corresponding to the tag includes a call statement corresponding to a preset labeling model, execute the call statement to call the corresponding preset labeling model, and use the preset labeling model to determine whether the first service data satisfies the definition of the tag.
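The dispatch from a call statement to a preset labeling model might look like the following sketch. The `call <model_name>` syntax and the registry of models as a dictionary of callables are assumptions; any trained classifier that returns a yes/no decision could stand behind each name.

```python
from typing import Callable, Dict

ModelFn = Callable[[str], bool]
CALL_PREFIX = "call "  # hypothetical spelling of the call statement


def evaluate_with_model(
    definition: str, service_data: str, models: Dict[str, ModelFn]
) -> bool:
    """Execute a call statement by invoking the named preset labeling model."""
    if not definition.startswith(CALL_PREFIX):
        raise ValueError("definition does not contain a call statement")
    model_name = definition[len(CALL_PREFIX):].strip()
    # The model decides whether the service data satisfies the tag definition.
    return bool(models[model_name](service_data))
```

In practice the registry would hold real models; a trivial stand-in such as `{"long_text": lambda text: len(text.split()) > 3}` is enough to exercise the dispatch.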
In another possible implementation of the embodiments of the present application, the first updating module 530 includes:
a determining unit, configured to determine, according to the labeling results of the plurality of first service data, a quality value of each tag in the preset tag set under at least one preset quality index; and
a screening unit, configured to screen the tags in the preset tag set according to the quality value of each tag under the at least one preset quality index, and obtain the first target tag set from the tags that pass the screening.
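The determining and screening steps can be sketched with a single quality index. Coverage, meaning the fraction of service data items labeled with a tag, is used here only as an assumed example of a preset quality index, and the thresholds are arbitrary illustration values.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def coverage_per_tag(
    labeling_results: Sequence[Iterable[str]], tags: Sequence[str]
) -> Dict[str, float]:
    """Quality value per tag: share of service data items carrying the tag."""
    counts = Counter(tag for result in labeling_results for tag in set(result))
    total = len(labeling_results)
    return {tag: (counts[tag] / total if total else 0.0) for tag in tags}


def screen_tags(
    labeling_results: Sequence[Iterable[str]],
    tags: Sequence[str],
    min_coverage: float = 0.05,
    max_coverage: float = 0.95,
) -> List[str]:
    """Keep tags whose coverage lies within the assumed acceptable range."""
    quality = coverage_per_tag(labeling_results, tags)
    return [t for t in tags if min_coverage <= quality[t] <= max_coverage]
```

A tag that never fires or that fires on nearly every item carries little information under this index, which is why the sketch filters at both ends; other indices could be combined in the same way.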
In another possible implementation of the embodiments of the present application, the tag set generating apparatus 500 further includes:
a first fusion module, configured to fuse at least two tags to be fused in the preset tag set according to a preset fusion mode to obtain a fused tag, where the at least two tags to be fused are determined based on a user instruction; and
a second fusion module, configured to fuse the tag definition statements corresponding to the at least two tags according to the preset fusion mode to obtain a fused tag definition statement corresponding to the fused tag.
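One way to picture the fusion of tags together with their definition statements is the sketch below. Treating the preset fusion mode as a logical `or` or `and` over the original statements, and removing the original tags once they are fused, are assumptions of this example rather than details given in the application.

```python
from typing import Dict, Sequence


def fuse_tags(
    tag_definitions: Dict[str, str],
    to_fuse: Sequence[str],
    fused_name: str,
    mode: str = "or",
) -> Dict[str, str]:
    """Return a new tag set in which the selected tags are merged into one."""
    if len(to_fuse) < 2:
        raise ValueError("at least two tags are required for fusion")
    if mode not in ("and", "or"):
        raise ValueError("unsupported fusion mode")
    # Fuse the definition statements by joining them with the logical symbol.
    fused_definition = f" {mode} ".join(f"({tag_definitions[t]})" for t in to_fuse)
    # Assumption: the original tags are replaced by the fused tag.
    remaining = {t: d for t, d in tag_definitions.items() if t not in to_fuse}
    remaining[fused_name] = fused_definition
    return remaining
```

The tags passed in `to_fuse` correspond to the user-selected tags to be fused; the fused definition statement can then be executed like any other statement.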
In another possible implementation of the embodiments of the present application, the tag set generating apparatus 500 further includes:
a third fusion module, configured to fuse at least two tags to be fused in the first target tag set according to a preset fusion mode to obtain a fused first target tag set, where the at least two tags to be fused are determined based on a user instruction.
In another possible implementation of the embodiments of the present application, the tag set generating apparatus 500 may further include:
a second obtaining module 540, configured to obtain a plurality of second service data in another field;
a second labeling module 550, configured to label each second service data based on the tag definition statements corresponding to the tags in the first target tag set, to obtain a labeling result of the second service data; and
a second updating module 560, configured to determine the quality of each tag in the first target tag set according to the labeling results of the plurality of second service data, and update the first target tag set according to the quality of each tag in the first target tag set, to obtain a second target tag set in the other field.
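The reuse of the first target tag set as the starting point for another field can be sketched as applying the same update step twice. Here each tag definition is modeled as a plain Python predicate and the quality of a tag as its hit rate on the data; both are simplifying assumptions for illustration.

```python
from typing import Callable, Dict, List

Predicate = Callable[[str], bool]


def update_tag_set(
    tag_set: Dict[str, Predicate], service_data: List[str], min_hit_rate: float
) -> Dict[str, Predicate]:
    """Label the data with every tag and keep tags whose hit rate is high enough."""
    kept: Dict[str, Predicate] = {}
    for name, predicate in tag_set.items():
        hits = sum(1 for item in service_data if predicate(item))
        rate = hits / len(service_data) if service_data else 0.0
        if rate >= min_hit_rate:
            kept[name] = predicate
    return kept


def adapt_to_other_field(
    preset: Dict[str, Predicate],
    first_data: List[str],
    second_data: List[str],
    min_hit_rate: float,
) -> Dict[str, Predicate]:
    """First target tag set from the specific field, then second from the other."""
    first_target = update_tag_set(preset, first_data, min_hit_rate)
    return update_tag_set(first_target, second_data, min_hit_rate)
```

Starting the second pass from the first target tag set, rather than from the full preset tag set, mirrors the order of modules 540 to 560 described above.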
The tag set generating apparatus of the embodiments of the present application acquires a plurality of first service data in a specific field and labels each first service data based on the tag definition statements corresponding to the tags in the preset tag set. It then determines the quality of each tag in the preset tag set according to the labeling results of the plurality of first service data, and updates the preset tag set according to the quality of each tag to obtain a first target tag set in the specific field. In this way, a high-quality tag set for the specific field can be generated, which lays a foundation for accurately describing or classifying objects.
To implement the above embodiments, the present application further provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the tag set generation method set forth in any one of the foregoing embodiments of the present application.
To implement the above embodiments, the present application further provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the tag set generation method set forth in any one of the foregoing embodiments of the present application.
To implement the above embodiments, the present application further provides a computer program product including a computer program which, when executed by a processor, implements the tag set generation method set forth in any one of the foregoing embodiments of the present application.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 6, the electronic device 600 may include a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 may also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the methods and processes described above, for example, the tag set generation method. In some embodiments, the tag set generation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the tag set generation method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the tag set generation method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), the Internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of traditional physical hosts and virtual private server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, which is not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (24)

1. A method for generating a tag set, comprising:
acquiring a plurality of first service data in a specific field, and label definition statements corresponding to the labels in a preset label set, wherein the label definition statements are used for representing the definitions of the corresponding labels;
for each first service data, labeling the first service data based on the label definition statements corresponding to the labels in the preset label set to obtain a labeling result of the first service data;
determining the quality of each label in the preset label set according to the labeling results of the plurality of first service data, and updating the preset label set according to the quality of each label in the preset label set to obtain a first target label set in the specific field.
2. The method of claim 1, wherein the labeling the first service data based on the label definition statement corresponding to each label in the preset label set to obtain a labeling result of the first service data includes:
for each label in the preset label set, executing the label definition statement corresponding to the label to determine whether the first service data meets the definition of the label;
and labeling the first service data with the label in the case that the first service data meets the definition of the label.
3. The method of claim 2, wherein the executing the tag definition statement corresponding to the tag in the preset tag set to determine whether the first service data meets the definition of the tag comprises:
in the case that the label definition statement corresponding to the label comprises one clause, replacing the service data macro definition included in the clause with the first service data, and executing the replaced label definition statement to determine whether the first service data meets the regular expression in the clause;
determining that the first service data meets the definition of the label in the case that the first service data meets the regular expression in the clause;
and determining that the first service data does not meet the definition of the label in the case that the first service data does not meet the regular expression in the clause.
4. The method of claim 3, wherein the method further comprises:
in the case that the label definition statement corresponding to the label comprises a plurality of clauses, replacing the service data macro definition included in each clause with the first service data, and executing the replaced label definition statement to determine whether the first service data meets the regular expression in each clause and whether the logic of the preset logic symbols connecting the clauses is met;
determining that the first service data meets the definition of the label in the case that the first service data meets the regular expression in each clause and the logic;
and determining that the first service data does not meet the definition of the label in the case that the first service data does not meet the regular expression in the clause and/or does not meet the logic.
5. The method of claim 3 or 4, wherein the regular expression includes a plurality of keywords, the keywords being separated by a preset separator; the method further comprises:
determining that the first service data meets the regular expression in the case that the first service data comprises at least one target keyword among the plurality of keywords separated by the preset separator.
6. The method according to claim 3 or 4, wherein the clause further comprises a first macro definition of a preset tag and/or a second macro definition of a preset string;
before executing the label definition statement corresponding to the label in the preset label set to determine whether the first service data meets the definition of the label, the method further includes:
replacing the first macro definition with the label definition statement of the preset label corresponding to the first macro definition, and/or replacing the second macro definition with the preset character string corresponding to the second macro definition, to obtain a formatted definition statement corresponding to the label;
executing the label definition statement corresponding to the label in the preset label set to determine whether the first service data meets the definition of the label, including:
and executing the formatting definition statement corresponding to the label to judge whether the first service data meets the definition of the label.
7. The method of claim 2, wherein the executing the tag definition statement corresponding to the tag in the preset tag set to determine whether the first service data meets the definition of the tag comprises:
and executing the call statement to call the corresponding preset labeling model under the condition that the label definition statement corresponding to the label comprises the call statement corresponding to the preset labeling model, and judging whether the first service data meets the definition of the label or not by using the preset labeling model.
8. The method according to any one of claims 1-4, wherein determining the quality of each of the labels in the preset label set according to labeling results of the plurality of first service data, and updating the preset label set according to the quality of each of the labels in the preset label set to obtain a first target label set in the specific domain includes:
determining a quality value of each label in the preset label set under at least one preset quality index according to the labeling results of the plurality of first service data;
and screening the labels in the preset label set according to the quality value of each label under the at least one preset quality index, and obtaining the first target label set from the screened labels.
9. The method according to any one of claims 1-4, wherein before the labeling of the first service data based on the label definition statement corresponding to each label in the preset label set for each of the first service data to obtain the labeling result of the first service data, the method further includes:
according to a preset fusion mode, carrying out fusion processing on at least two tags to be fused in the preset tag set to obtain fusion tags, wherein the at least two tags to be fused are determined based on user instructions;
and carrying out fusion processing on the label definition sentences corresponding to the at least two labels according to the preset fusion mode to obtain the fusion label definition sentences corresponding to the fusion labels.
10. The method according to any one of claims 1-4, wherein after the updating the preset label set according to the quality of each label in the preset label set to obtain a first target label set in the specific field, the method further comprises:
fusing at least two labels to be fused in the first target label set according to a preset fusion mode to obtain a fused first target label set, wherein the at least two labels to be fused are determined based on a user instruction.
11. The method according to any one of claims 1-4, wherein after the updating the preset label set according to the quality of each label in the preset label set to obtain a first target label set in the specific field, the method further comprises:
acquiring a plurality of second service data in other fields;
for each second service data, labeling the second service data based on label definition sentences corresponding to labels in the first target label set to obtain labeling results of the second service data;
and determining the quality of each label in the first target label set according to labeling results of the second service data, and updating the first target label set according to the quality of each label in the first target label set to obtain a second target label set in other fields.
12. A tag set generating apparatus, comprising:
a first acquisition module, configured to acquire a plurality of first service data in a specific field, and label definition statements corresponding to the labels in a preset label set, wherein the label definition statements are used for representing the definitions of the corresponding labels;
the first labeling module is used for labeling the first service data based on label definition sentences corresponding to the labels in the preset label set for each first service data so as to obtain labeling results of the first service data;
the first updating module is used for determining the quality of each label in the preset label set according to labeling results of a plurality of first service data, and updating the preset label set according to the quality of each label in the preset label set to obtain a first target label set in the specific field.
13. The apparatus of claim 12, wherein the first labeling module comprises:
a processing unit, configured to, for each label in the preset label set, execute the label definition statement corresponding to the label to determine whether the first service data meets the definition of the label;
and a labeling unit, configured to label the first service data with the label in the case that the first service data meets the definition of the label.
14. The apparatus of claim 13, wherein the processing unit comprises:
a first processing subunit, configured to replace, when a tag definition statement corresponding to the tag includes a clause, a service data macro definition included in the clause in the tag definition statement corresponding to the tag with the first service data, and execute the replaced tag definition statement, so as to determine whether the first service data satisfies a regular expression in the clause;
a first determining subunit, configured to determine that the first service data satisfies the definition of the tag, where the first service data satisfies a regular expression in the clause;
and the second determining subunit is used for determining that the first service data does not meet the definition of the tag in the case that the first service data does not meet the regular expression in the clause.
15. The apparatus of claim 14, wherein the processing unit further comprises:
A second processing subunit, configured to replace, in a case where a tag definition sentence corresponding to the tag includes a plurality of clauses, a service data macro definition included in each of the clauses in the tag definition sentence corresponding to the tag with the first service data, and execute the replaced tag definition sentence, so as to determine whether the first service data satisfies a regular expression in each of the clauses, and whether logic that satisfies a preset logic symbol connected to each of the clauses is satisfied;
a third determining subunit, configured to determine that, in a case where the first service data satisfies the regular expression and the logic in each clause, the first service data satisfies the definition of the tag;
and a fourth determining subunit, configured to determine that the first service data does not satisfy the definition of the tag, where the first service data does not satisfy the regular expression in the clause and/or does not satisfy the logic.
16. The apparatus of claim 14 or 15, wherein the regular expression includes a plurality of keywords, each of the keywords being separated by a preset separator; the processing unit further includes:
A fifth determining subunit, configured to determine, in a case where the first service data includes at least one target keyword of the plurality of keywords separated by the preset separator, that the first service data satisfies the regular expression.
17. The apparatus according to claim 14 or 15, wherein the clause further comprises a first macro definition of a preset tag and/or a second macro definition of a preset string;
the first labeling module further comprises:
a replacing unit, configured to replace the first macro definition with a label definition statement of a preset label corresponding to the first macro definition, and/or replace the second macro definition with a preset character string corresponding to the second macro definition, so as to obtain a formatted definition statement corresponding to the label;
the processing unit is used for:
and executing the formatting definition statement corresponding to the label to judge whether the first service data meets the definition of the label.
18. The apparatus of claim 13, wherein the processing unit comprises:
and the third processing subunit is used for executing the calling statement to call the corresponding preset labeling model when the label definition statement corresponding to the label comprises the calling statement corresponding to the preset labeling model, and judging whether the first service data meets the definition of the label or not by using the preset labeling model.
19. The apparatus according to any one of claims 12-15, wherein the first update module comprises:
a determining unit, configured to determine a quality value of each label in the preset label set under at least one preset quality index according to the labeling results of the plurality of first service data;
and a screening unit, configured to screen the labels in the preset label set according to the quality value of each label under the at least one preset quality index, and obtain the first target label set from the screened labels.
20. The apparatus according to any one of claims 12-15, wherein the apparatus further comprises:
the first fusion module is used for carrying out fusion processing on at least two tags to be fused in the preset tag set according to a preset fusion mode to obtain fusion tags, wherein the at least two tags to be fused are determined based on user instructions;
and the second fusion module is used for carrying out fusion processing on the label definition sentences corresponding to the at least two labels according to the preset fusion mode to obtain the fusion label definition sentences corresponding to the fusion labels.
21. The apparatus according to any one of claims 12-15, wherein the apparatus further comprises:
And the third fusion module is used for carrying out fusion processing on at least two tags to be fused in the first target tag set according to a preset fusion mode so as to obtain a fused first target tag set, wherein the at least two tags to be fused are determined based on a user instruction.
22. The apparatus according to any one of claims 12-15, further comprising:
the second acquisition module is used for acquiring a plurality of second service data in other fields;
the second labeling module is used for labeling the second service data based on label definition sentences corresponding to the labels in the first target label set for each second service data so as to obtain a labeling result of the second service data;
and the second updating module is used for determining the quality of each label in the first target label set according to labeling results of a plurality of second service data, and updating the first target label set according to the quality of each label in the first target label set to obtain a second target label set in other fields.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
24. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-11.
CN202211008886.5A 2022-08-22 2022-08-22 Tag set generation method and device and electronic equipment Pending CN117668625A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211008886.5A CN117668625A (en) 2022-08-22 2022-08-22 Tag set generation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN117668625A true CN117668625A (en) 2024-03-08

Family

ID=90075483


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination