CN116383710A - Label determining method, device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116383710A
Authority
CN
China
Prior art keywords
label
defect
preset
safety data
classification model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211641048.1A
Other languages
Chinese (zh)
Inventor
张文学
于帮付
苏萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Percent Technology Group Co ltd
Original Assignee
Beijing Percent Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Percent Technology Group Co ltd
Priority to CN202211641048.1A
Publication of CN116383710A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Time Recorders, Drive Recorders, Access Control (AREA)

Abstract

The application discloses a label determining method, a label determining device, electronic equipment and a storage medium, which belong to the technical field of computers. The method comprises the following steps: acquiring automobile safety data to be processed; determining at least one defect label of the automobile safety data according to a preset defect label dictionary; and, when determining the defect label according to the preset defect label dictionary fails, inputting the automobile safety data into a multi-label classification model to obtain at least one defect label of the automobile safety data, wherein the multi-label classification model is trained based on a plurality of safety data samples. In this way, the defect label of automobile recall complaint data can be determined rapidly and accurately.

Description

Label determining method, device, electronic equipment and storage medium
Technical Field
The application belongs to the technical field of computers, and particularly relates to a tag determining method, a tag determining device, electronic equipment and a storage medium.
Background
As automotive technology advances, safety issues become more and more important, as does complaint data concerning various problems with automobiles. However, complaint data about automobile recall defects is unstructured and semantically unclear, so it cannot be quickly and accurately classified according to the part of the automobile in which the problem occurs. Therefore, how to classify automobile recall complaint data by label, and thereby realize the labeling of automobile recall complaint data, is a technical problem that currently needs to be solved.
Disclosure of Invention
The embodiment of the application provides a tag determining method, a tag determining device, electronic equipment and a storage medium, which can rapidly and accurately determine a defect tag of automobile recall complaint data.
In a first aspect, an embodiment of the present application provides a label determining method, including: acquiring automobile safety data to be processed; determining at least one defect label of the automobile safety data according to a preset defect label dictionary; and, when determining the defect label according to the preset defect label dictionary fails, inputting the automobile safety data into a multi-label classification model to obtain at least one defect label of the automobile safety data, wherein the multi-label classification model is trained based on a plurality of safety data samples.
In a second aspect, an embodiment of the present application provides a label determining apparatus, including: an acquisition module, configured to acquire automobile safety data to be processed; and a determining module, configured to determine at least one defect label of the automobile safety data according to a preset defect label dictionary. The determining module is further configured to, when determining the defect label according to the preset defect label dictionary fails, input the automobile safety data into a multi-label classification model to obtain at least one defect label of the automobile safety data, where the multi-label classification model is trained based on a plurality of safety data samples.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor, a memory and a program or instruction stored on the memory and executable on the processor, the program or instruction implementing the steps of the method according to the first aspect when executed by the processor.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In the label determining method provided by the application, a defect label is first recommended for the automobile safety data through the preset defect label dictionary; when the defect label of the automobile safety data cannot be determined through the preset defect label dictionary, the trained multi-label classification model is used to determine the defect label of the automobile safety data. This solves the problem of quickly and accurately mapping automobile safety data to defect labels and realizes the labeling of automobile safety data.
Drawings
Fig. 1 is a schematic flow chart of a label determining method according to an embodiment of the present application;
Fig. 2 is a flowchart of another label determining method according to an embodiment of the present application;
Fig. 3 is a flowchart of yet another label determining method according to an embodiment of the present application;
Fig. 4 is a schematic flow chart of a specific implementation of a label determining method according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a label determining apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device according to one embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In the analysis of automobile recall complaint data, there is currently no method for multi-label classification of such data, although multi-label text classification in general has been studied. For example, CNN-RNN combines a convolutional neural network with a recurrent neural network for multi-label text classification and captures global and contextual semantic information at the same time; the embedding-based language model ELMo, built on a bidirectional long short-term memory (BiLSTM) learning framework, can also learn contextual semantic information effectively. However, these models suffer from vanishing gradients and an inability to parallelize, or from insufficient text representation capability, or they achieve good results only when trained on massive data. Automobile recall complaint data has strong domain characteristics (for example, the volume of complaint data is not massive, the complaint texts are short, and the text lengths are relatively balanced). A pre-trained language model has strong text representation and semantic understanding capabilities, while TextCNN uses the classification structure of a CNN to classify sentences composed of word vectors and uses windows of different sizes to capture local information in a sentence, which makes it well suited to short-text classification. Therefore, the present application combines the ALBERT pre-trained language model with the deep convolutional neural network TextCNN to implement a multi-label automobile recall complaint text classification model based on the ALBERT-TextCNN deep learning framework.
The technical problems mainly addressed by the present application are as follows. First, a fault standard is formulated, that is, what the defect labels of the primary and secondary assemblies of the automobile are, which keywords describe them, and what the corresponding fault levels are. Second, the content of specific defect complaint data is mapped onto the fault standard. Third, an appropriate multi-label text classification method is selected according to the characteristics of automobile recall complaint data.
The following describes in detail a tag determining method, a tag determining device, an electronic device and a storage medium provided in the embodiments of the present application through specific embodiments and application scenarios thereof with reference to the accompanying drawings.
Fig. 1 illustrates a label determining method provided by an embodiment of the present application. The method may be performed by an electronic device, which may include a server and/or a terminal device. In other words, the method may be performed by software or hardware installed in the electronic device. The method comprises the following steps.
Step 110: Acquire the automobile safety data to be processed.
The automobile safety data may include basic vehicle information, the failure phenomenon and the failure location, for example, "xxx engine has cracks".
Sources of automobile safety data may include: complaint data from official websites, discussion data from automobile media websites, public opinion data from social platforms, technical service bulletin (TSB) data of automobile enterprises, and traffic accident related data.
It should be noted that the automobile safety data may be acquired by data mining, for example through big data analysis or event tracking, or may be filled in by a user through a client.
Step 120: Determine at least one defect label of the automobile safety data according to a preset defect label dictionary.
The preset defect label dictionary is created from historical automobile safety data using expert experience and is used to recommend labels for the automobile safety data to be processed. A preset defect label comprises a fault location and fault features. The fault location comprises a primary assembly and a secondary assembly. The primary assembly is a first-level component of the automobile that is assembled by the host factory; for example, the automobile air conditioner is a primary assembly. The secondary assembly is a second-level component of the automobile that is produced by a supplier's supplier and is a part used to make a primary assembly; for example, the air conditioner evaporator is a part of the automobile air conditioner and is a secondary assembly. The fault features include keywords and a severity level. The keywords are the fault description in the automobile safety data; for example, the keyword of "xxx engine has cracks" is "cracks". The severity level quantifies the degree of the defect described in the automobile safety data and can be divided into 5 levels ranging from high through medium to low.
It should be noted that "at least one defect label" means that one piece of automobile safety data may correspond to a plurality of defect labels. For example, an automobile fault complaint may describe several fault manifestations, such as oil leakage and peculiar smell, in which case the corresponding defect labels may be "oil leakage" and "peculiar smell".
In one implementation, as shown in FIG. 2, step 120 includes:
Step 121: Preprocess the automobile safety data, and determine the primary assembly, the secondary assembly and the target keywords of the automobile safety data.
Preprocessing refers to splitting the automobile safety data by primary assembly and secondary assembly to form preprocessed data indexed by the primary assembly and the secondary assembly. For example, for the safety data "the vehicle body is found to shake frequently", preprocessing yields the vehicle body as the primary assembly, the vehicle frame as the secondary assembly, and "shake" as the target keyword. The preprocessing operation is performed by business personnel or automobile recall experts.
It should be noted that each piece of automobile safety data is indexed by a unique pair of primary assembly and secondary assembly. If one piece of automobile safety data involves several pairs of primary and secondary assemblies, it must be split during preprocessing into separate records, one per pair, each with its corresponding data content.
Step 122: Match the target keywords of the automobile safety data against the keywords of the preset defect label dictionary, and take each matched preset defect label as a candidate label.
Step 123: When the primary assembly and the secondary assembly of a candidate label are the same as those of the automobile safety data, take the candidate label as a defect label of the automobile safety data.
It can be understood that the defect label is determined by first matching the keywords maintained in the preset defect label dictionary against the keywords in the automobile safety data, and then performing a secondary screening that compares the primary assembly and secondary assembly of the automobile safety data with those of the defect label. For example, if the keywords of a piece of automobile safety data include any keyword of label A and any keyword of label B, and the primary assembly and secondary assembly of both label A and label B are consistent with those of the automobile safety data, then label A and label B are recommended for the automobile safety data, that is, label A and label B are determined to be defect labels of the automobile safety data.
Optionally, the determined defect labels can be checked again by other business personnel or experts to ensure that the automobile safety data corresponds to the correct defect labels.
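The two-stage matching of steps 122 and 123 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `DictEntry` structure, the `match_labels` function and the sample dictionary entries are all hypothetical, and a keyword "match" is assumed here to mean that the record and the dictionary entry share at least one keyword.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictEntry:
    """One hypothetical row of the preset defect label dictionary."""
    label: str
    primary: str        # primary assembly
    secondary: str      # secondary assembly
    keywords: tuple     # keywords maintained for this label


def match_labels(primary, secondary, target_keywords, dictionary):
    """Return the defect labels recommended for one preprocessed record."""
    wanted = set(target_keywords)
    # Step 122: an entry sharing any keyword with the record is a candidate.
    candidates = [e for e in dictionary if wanted & set(e.keywords)]
    # Step 123: keep candidates whose assemblies equal those of the record.
    return [e.label for e in candidates
            if e.primary == primary and e.secondary == secondary]


# Hypothetical dictionary content, for illustration only.
DICTIONARY = [
    DictEntry("frame shake", "vehicle body", "frame", ("shake", "vibration")),
    DictEntry("engine crack", "engine", "cylinder block", ("crack",)),
    DictEntry("steering shake", "steering system", "steering wheel", ("shake",)),
]

labels = match_labels("vehicle body", "frame", ["shake"], DICTIONARY)
print(labels)
```

An empty result corresponds to the case in which the dictionary fails to determine a defect label.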
Step 130: When determining the defect label according to the preset defect label dictionary fails, input the automobile safety data into a multi-label classification model to obtain at least one defect label of the automobile safety data.
The multi-label classification model is trained based on a plurality of safety data samples.
Because the preset defect label dictionary is created from historical automobile safety data using expert experience, it cannot quickly and accurately recommend defect labels for automobile safety data that is unstructured or semantically unclear. The multi-label classification model addresses this problem: automobile safety data for which label determination by the preset defect label dictionary has failed is input into the multi-label classification model, which outputs the defect labels of that data.
In this implementation of the application, a defect label is first recommended for the automobile safety data through the preset defect label dictionary; when the defect label of the automobile safety data cannot be determined through the preset defect label dictionary, the trained multi-label classification model is used to determine the defect label. This solves the problem of quickly and accurately mapping automobile safety data to defect labels and realizes the labeling of automobile safety data.
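The dictionary-first, model-second flow of steps 120 and 130 can be sketched as below. The sketch assumes that "failure" of the dictionary means that it returns no label; `determine_labels`, `toy_lookup` and `toy_model` are hypothetical stand-ins, and the toy model is a fixed stub rather than a trained classifier.

```python
def determine_labels(record, dictionary_lookup, model_predict):
    """Try the preset dictionary first; fall back to the model if it finds nothing.

    Returns the labels together with the name of the stage that produced them.
    """
    labels = list(dictionary_lookup(record))
    if labels:                       # step 120 succeeded
        return labels, "dictionary"
    # Step 130: the dictionary failed, so ask the multi-label classifier.
    return list(model_predict(record)), "model"


def toy_lookup(text):
    """Hypothetical dictionary lookup based on one hard-coded keyword."""
    return ["oil leakage"] if "oil leak" in text else []


def toy_model(text):
    """Hypothetical stand-in for the trained multi-label classification model."""
    return ["peculiar smell"]


print(determine_labels("oil leak under the engine", toy_lookup, toy_model))
print(determine_labels("strange odor in the cabin", toy_lookup, toy_model))
```

Returning the stage name alongside the labels is a design choice of this sketch; it makes it easy to route model-produced labels to expert review if desired.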
In one implementation, before determining at least one defect label of the automobile safety data according to the preset defect label dictionary, the method further includes: acquiring an automobile safety data set, wherein the automobile safety data set comprises a plurality of pieces of automobile safety data; analyzing each piece of automobile safety data to obtain the corresponding defect features; and dividing feature grades based on the defect features and determining the preset defect label dictionary.
It can be understood that the preset defect label dictionary needs to be determined before the defect label of the automobile safety data is determined according to the preset defect label dictionary, that is, a plurality of automobile safety data in the automobile safety data set are analyzed to obtain the defect characteristics of each automobile safety data, then the defect label is determined according to the defect characteristics of each automobile safety data, and then the corresponding severity level of the defect label is set based on the defect label. It should be noted that, the determination of the preset defect label dictionary may be performed by a service person or a car recall expert.
In one implementation, the defect features include a fault location and keywords, wherein the fault location includes a primary assembly and a secondary assembly. Dividing feature grades based on the defect features and determining the preset defect label dictionary includes: determining the defect label of each piece of automobile safety data according to the primary assembly, the secondary assembly and the keywords; setting a severity level for the defect label; and determining the preset defect label dictionary based on the primary assembly, the secondary assembly, the keywords, the defect label and the severity level corresponding to the defect label of each piece of automobile safety data. Illustratively, Table 1 shows a record table of a preset defect label dictionary:
[Table 1: record table of the preset defect label dictionary (image BDA0004009068170000071 in the original publication)]
The preset defect label dictionary can be modified, and the modification time and the modifier are recorded, so that a disputed defect label can be traced back.
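One possible shape for a dictionary record with a modification trail is sketched below. The field names, the `update_severity` method and the sample values are assumptions made for illustration; the source only states that the dictionary holds the primary assembly, secondary assembly, keywords, defect label and severity level, and that the modification time and modifier can be recorded.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DefectLabelRecord:
    """Hypothetical record of the preset defect label dictionary."""
    primary_assembly: str
    secondary_assembly: str
    keywords: list
    defect_label: str
    severity: str
    modified_by: str
    modified_at: datetime
    history: list = field(default_factory=list)

    def update_severity(self, new_severity, modifier, when=None):
        """Change the severity and keep the previous state for traceability."""
        self.history.append((self.severity, self.modified_by, self.modified_at))
        self.severity = new_severity
        self.modified_by = modifier
        self.modified_at = when or datetime.now(timezone.utc)


# Illustrative values only.
record = DefectLabelRecord(
    "air conditioner", "evaporator", ["peculiar smell"],
    "evaporator peculiar smell", "medium",
    "expert_a", datetime(2022, 1, 1, tzinfo=timezone.utc),
)
record.update_severity("high", "expert_b", datetime(2022, 6, 1, tzinfo=timezone.utc))
print(record.severity, record.history)
```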
It should be noted that the label determination task is performed in the data warehouse as an offline task, so newly acquired automobile safety data and a newly maintained preset defect label dictionary can be used in the label determination task only after a specified time. The specified time can be set according to the actual situation and is not specifically limited in the present application.
In one implementation, before inputting the automobile safety data into the multi-label classification model to obtain the at least one defect label of the automobile safety data, the method further includes: training a preset multi-label classification model based on a training set and a predetermined loss function until the loss function converges, to obtain a trained preset multi-label classification model; verifying, based on a verification set, that the trained preset multi-label classification model meets a first preset effect; and testing, based on a test set, the preset multi-label classification model that has passed verification, and taking the preset multi-label classification model that meets a second preset effect as the multi-label classification model.
The training set, the verification set and the test set are obtained by randomly sampling high-quality labeled data processed by business personnel or automobile recall experts and splitting it in a ratio of 3:1:1. Optionally, the format of the high-quality labeled data may be [complaint data: label 1 label 2 … label n], where the labels are separated by tab characters, for example, "[automobile purchased from 2011 to 6 years so far, there is always peculiar smell in the automobile, and many methods are used, so that the peculiar smell cannot be eradicated: peculiar smell of the vehicle body frame]".
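Parsing this sample format and producing the 3:1:1 split can be sketched as follows. The function names are hypothetical, and two details are assumptions of this sketch: the text is separated from the labels at the last colon, and any rounding remainder of the split goes to the test set.

```python
import random


def parse_sample(line):
    """Split '[complaint text: label1<TAB>label2]' into (text, labels)."""
    body = line.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    # Assumption: the last colon separates the complaint text from the labels.
    text, _, label_part = body.rpartition(":")
    labels = [t.strip() for t in label_part.split("\t") if t.strip()]
    return text.strip(), labels


def split_3_1_1(samples, seed=0):
    """Shuffle and split samples into training, verification and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_train = len(shuffled) * 3 // 5
    n_val = len(shuffled) // 5
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


sample = parse_sample("[odor in the cabin: vehicle body\tpeculiar smell]")
train, val, test = split_3_1_1(range(10))
print(sample, len(train), len(val), len(test))
```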
In this implementation, constructing the multi-label classification model may include a training process, a verification process and a test process. The training process trains the preset multi-label classification model based on the training set and the predetermined loss function; the verification process verifies, based on the verification set, whether the trained preset multi-label classification model achieves the first preset effect; the test process evaluates, based on the test set, whether the verified preset multi-label classification model can be put into use. The first preset effect means that the preset multi-label classification model does not overfit during verification, and the second preset effect means that the preset multi-label classification model can be deployed as a production model.
It should be noted that if verification or testing fails, the parameters of the preset multi-label classification model need to be tuned and training continued.
In one implementation, the preset multi-label classification model includes a pre-trained language model and a text classification model.
The pre-trained language model may be a lightweight BERT model (A Lite BERT for Self-supervised Learning of Language Representations, ALBERT), and the text classification model may be a text classification convolutional neural network (Text Convolutional Neural Networks, TextCNN).
As shown in Fig. 3, training the preset multi-label classification model based on the training set and the predetermined loss function until the loss function converges, to obtain a trained preset multi-label classification model, includes the following steps:
Step 310: Input the training set into the pre-trained language model, and extract semantic features of the training samples in the training set through the pre-trained language model to obtain text feature vectors of the training samples.
Specifically, the training set is input into the pre-training language model, each training sample in the training set is processed through a processing layer of the pre-training language model, an original word vector, a position vector and a primary text vector of each training sample are generated, the original word vector, the position vector and the primary text vector of each training sample are processed through an encoder of the pre-training language model, and a text feature vector of each training sample is output.
Step 320: Input the text feature vectors into the text classification model, and perform multi-level processing on the text feature vectors through the text classification model to obtain a trained preset multi-label classification model.
In one implementation, the text classification model comprises an input layer, a convolution layer, a pooling layer, a first fully connected layer and a second fully connected layer.
The multi-level processing of the text feature vector comprises: extracting semantic features of the text feature vector through the input layer to obtain a high-level text feature vector of the training sample; performing a convolution operation on the high-level text feature vector through the convolution layer to obtain a convolution feature vector; performing a dimension reduction operation on the convolution feature vector through the pooling layer to obtain a pooled feature vector; performing a dropout operation on the pooled feature vector through the first fully connected layer to obtain a first feature vector; activating the first feature vector through the second fully connected layer to obtain a first prediction probability of the training sample for each defect label, thereby obtaining a plurality of first prediction probabilities; and training the multi-label classification model based on the plurality of first prediction probabilities and the predetermined loss function. In another implementation, the predetermined loss function may be:
[Equation: predetermined loss function (image BDA0004009068170000091 in the original publication)]
where N represents the number of training samples, L represents the number of defect labels, $\hat{y}_{ij} \in [0, 1]$ represents the predicted probability of the j-th defect label for the i-th training sample, and $y_{ij} \in \{0, 1\}$ indicates whether the i-th training sample belongs to the j-th defect label.
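The equation image itself is not available here. The symbol definitions are consistent with a multi-label binary cross-entropy, so the following sketch assumes the common form $-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{L}\left[y_{ij}\log\hat{y}_{ij} + (1-y_{ij})\log(1-\hat{y}_{ij})\right]$. This form, the symbol $\hat{y}_{ij}$ for the predicted probability, the normalisation by N, and the function name `multilabel_bce` are assumptions and may differ from the formula in the original publication.

```python
import math


def multilabel_bce(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy summed over labels and averaged over samples.

    y_true: N lists of L values in {0, 1}
    y_pred: N lists of L predicted probabilities in [0, 1]
    """
    total = 0.0
    for truths, probs in zip(y_true, y_pred):
        for y, p in zip(truths, probs):
            p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
            total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# One sample, two labels, an uninformative prediction of 0.5 for each label.
loss = multilabel_bce([[1, 0]], [[0.5, 0.5]])
print(loss)
```

For the uninformative prediction above, each label contributes log 2 to the loss, so the result is 2·log 2; confident correct predictions drive the loss toward zero.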
In one implementation manner, the testing the preset multi-label classification model after the verification is qualified includes: and evaluating the preset multi-label classification model based on preset evaluation parameters.
The preset evaluation parameters may include precision, recall and the F1 value.
It should be noted that the preset multi-label classification model may be evaluated by dividing the samples in the test set into a positive class and a negative class, so that the model has four kinds of classification results: TP (True Positive): an instance that is positive and is also determined to be positive; FN (False Negative): a miss, an instance that is positive but is determined to be negative; FP (False Positive): a false alarm, an instance that is negative but is determined to be positive; TN (True Negative): an instance that is negative and is also determined to be negative. From these four classification results, the specific values of precision, recall and F1 can be determined. Precision can be determined by the following formula:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall may be determined by the following formula:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
It should be noted that precision and recall affect and constrain each other: pursuing high precision tends to lower recall, and pursuing high recall usually lowers precision. To keep both the precision and the recall of the prediction results high, the F-score can be introduced to balance them, so that the preset multi-label classification model can be evaluated comprehensively. The F-score can be determined by the following formula:
$$F_{\beta} = \frac{(1 + \beta^{2}) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^{2} \cdot \mathrm{Precision} + \mathrm{Recall}}$$
It should be noted that if β is 1, Precision and Recall are equally important; if β is less than 1, Precision is more important than Recall; if β is greater than 1, Recall is more important than Precision. In this implementation β is 1, that is, F1 is the harmonic mean of precision and recall, and a larger F1 score indicates a higher model quality. Therefore, when the precision, recall and F1 values are all greater than 0.8 and the F1 score is at its maximum, the preset multi-label classification model is determined to be the multi-label classification model.
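The evaluation above can be sketched with the standard definitions of precision, recall and the F-score. The function name, the sample counts and the convention of returning 0.0 when a denominator is zero are assumptions of this sketch; the 0.8 threshold is the one stated in the text.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Compute precision, recall and F-beta from TP, FP and FN counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Hypothetical counts: 9 true positives, 1 false positive, 1 false negative.
precision, recall, f1 = precision_recall_fbeta(tp=9, fp=1, fn=1)
# Acceptance check described in the text: all three values above 0.8.
accepted = all(value > 0.8 for value in (precision, recall, f1))
print(precision, recall, f1, accepted)
```

With β = 1 the F-score reduces to the harmonic mean of precision and recall, which is why equal precision and recall give an F1 of the same value.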
In one implementation, after determining the at least one defect label of the automobile safety data, the method further includes: converting the at least one defect label of the automobile safety data into a one-hot variable for storage.
Illustratively, the preset defect label dictionary includes 500 defect labels, and a piece of automobile safety data involves two defect labels: coolant reservoir cracks and ignition coil damage. These two defects are located at the 2nd and 4th positions of the preset defect label dictionary, so the one-hot label code of the automobile safety data is: 0101000 ….
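The encoding in this example can be reproduced with a few lines of code. The function name `encode_labels` is hypothetical; positions are 1-based, as in the text.

```python
def encode_labels(label_positions, dictionary_size):
    """Return a 0/1 vector with a 1 at each 1-based label position."""
    vector = [0] * dictionary_size
    for position in label_positions:
        if not 1 <= position <= dictionary_size:
            raise ValueError(f"position {position} is outside the dictionary")
        vector[position - 1] = 1
    return vector


# Two labels at the 2nd and 4th positions of a 500-label dictionary.
code = encode_labels([2, 4], 500)
prefix = "".join(str(bit) for bit in code[:7])
print(prefix)  # prints 0101000
```

Because a record may carry several labels, the stored vector can contain more than one 1, which is sometimes called a multi-hot encoding.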
Fig. 4 shows a schematic flow chart of a specific embodiment of a tag determination method in the present application, which includes the following steps:
Step 410: Randomly divide the complaint label data annotated using expert experience into three parts: a training set, a verification set and a test set.
Step 420: Train a multi-label classification model on the training set using the ALBERT pre-trained model and the TextCNN model.
Step 430: Confirm the final model according to the precision and recall of the resulting model on the verification set and the test set.
Step 440: Deploy the multi-label classification NLP model for complaint data label recommendation.
In the implementation mode, the text multi-label classification model is trained by utilizing complaint label data confirmed by experts and an ALBERT+TextCNN method, so that when proper defect labels cannot be matched through keywords, the trained NLP classification model can be utilized for defect label recommendation.
The label determining method of the present embodiment is described above in detail with reference to Figs. 1 to 4, and a label determining device of the present embodiment is described below in detail with reference to Fig. 5.
Fig. 5 shows a schematic structural diagram of a label determining apparatus provided in an embodiment of the present disclosure. As shown in Fig. 5, the label determining apparatus 500 may include an acquisition module 510 and a determining module 520.
The acquisition module 510 is configured to acquire automobile safety data to be processed; the determining module 520 is configured to determine at least one defect label of the automobile safety data according to a preset defect label dictionary; the determining module 520 is further configured to, in the case that determining the defect label according to the preset defect label dictionary fails, input the automobile safety data into a multi-label classification model to obtain at least one defect label of the automobile safety data, where the multi-label classification model is trained based on a plurality of safety data samples.
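The dictionary-first, model-second control flow of the determining module can be sketched as follows. The names determine_labels, toy_lookup and toy_classifier, and the sample texts and labels, are hypothetical stand-ins. The sketch also assumes that a failed dictionary lookup is signalled by an empty result, which the text does not specify.

```python
def determine_labels(record, dictionary_lookup, classifier):
    """Try the preset dictionary first and fall back to the model on failure.

    Returns the labels together with the name of the path that produced them.
    """
    labels = dictionary_lookup(record)
    if labels:
        return labels, "dictionary"
    return classifier(record), "model"


def toy_lookup(text):
    # Stand-in for dictionary matching: one hard-coded keyword.
    return ["ignition coil damage"] if "ignition coil" in text else []


def toy_classifier(text):
    # Stand-in for the trained multi-label classification model.
    return ["unclassified engine defect"]


hit = determine_labels("ignition coil burned out", toy_lookup, toy_classifier)
miss = determine_labels("engine stalls at low speed", toy_lookup, toy_classifier)
print(hit)
print(miss)
```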
In one implementation, the determining module 520 is further configured to, before the determining at least one defect label of the automobile safety data according to the preset defect label dictionary, acquire an automobile safety data set, where the automobile safety data set includes a plurality of pieces of the automobile safety data; analyze each piece of automobile safety data to obtain corresponding defect features; and determine the preset defect label dictionary by classifying feature levels based on the defect features.
In one implementation, the defect features include a fault location and keywords, where the fault location includes a primary assembly and a secondary assembly, and the determining module 520 is further configured to determine a defect label of each piece of automobile safety data according to the primary assembly, the secondary assembly, and the keywords; set a severity level of the defect label based on the defect label; and determine the preset defect label dictionary based on the primary assembly, the secondary assembly, the keywords, the defect label, and the severity level corresponding to the defect label of each piece of automobile safety data.
In one implementation, the determining module 520 is further configured to pre-process the automobile safety data to determine a primary assembly, a secondary assembly, and target keywords of the automobile safety data; match the target keywords of the automobile safety data against the keywords of the preset defect label dictionary, and take each matched preset defect label as a standby label; and, in the case that the primary assembly and the secondary assembly of the standby label are determined to be the same as those of the automobile safety data, take the standby label as a defect label of the automobile safety data.
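The two-stage matching described above (keyword match first, then assembly check) can be sketched as follows. The DictionaryEntry fields follow the items the text lists for the preset dictionary. The class and function names, the exact-string keyword comparison, and every sample value (assemblies, keywords, severity numbers) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One row of a hypothetical preset defect label dictionary."""
    label: str
    primary_assembly: str
    secondary_assembly: str
    keywords: tuple
    severity: int


def match_labels(primary, secondary, target_keywords, dictionary):
    """Return labels whose keywords match and whose assemblies agree with the record."""
    targets = set(target_keywords)
    matched = []
    for entry in dictionary:
        # Stage 1: a shared keyword makes the entry a standby (candidate) label.
        if not targets.intersection(entry.keywords):
            continue
        # Stage 2: keep the standby label only if both assemblies are the same.
        if entry.primary_assembly == primary and entry.secondary_assembly == secondary:
            matched.append(entry.label)
    return matched


# Illustrative entries only; real dictionaries come from analysed safety data.
dictionary = [
    DictionaryEntry("cooling kettle crack", "engine", "cooling system", ("crack", "leak"), 2),
    DictionaryEntry("ignition coil damage", "engine", "ignition system", ("damage", "burned"), 3),
    DictionaryEntry("windshield crack", "body", "glass", ("crack",), 1),
]

labels = match_labels("engine", "cooling system", ["crack"], dictionary)
none_found = match_labels("chassis", "brake system", ["noise"], dictionary)
print(labels, none_found)
```

The keyword "crack" also appears in the windshield entry, but that standby label is discarded because its assemblies differ from those of the record. An empty result is the case in which the multi-label classification model would be used instead.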
In one implementation, the determining module 520 is further configured to, before the inputting the automobile safety data into the multi-label classification model to obtain at least one defect label of the automobile safety data, train a preset multi-label classification model based on a training set and a predetermined loss function until the loss function converges, so as to obtain a trained preset multi-label classification model; verify, based on a verification set, that the trained preset multi-label classification model meets a first preset effect; and test, based on a test set, the preset multi-label classification model that has passed verification, and take the preset multi-label classification model meeting a second preset effect as the multi-label classification model.
In one implementation, the preset multi-label classification model includes a pre-training language model and a text classification model, and the determining module 520 is further configured to input the training set into the pre-training language model, and extract semantic features of the training samples in the training set through the pre-training language model to obtain text feature vectors of the training samples; and input the text feature vectors into the text classification model, and perform multi-level processing on the text feature vectors through the text classification model to obtain the trained preset multi-label classification model.
In one implementation, the text classification model includes an input layer, a convolution layer, a pooling layer, a first fully-connected layer, and a second fully-connected layer, and the determining module 520 is further configured to perform semantic feature extraction on the text feature vector through the input layer to obtain a high-level text feature vector of the training sample; perform a convolution operation on the high-level text feature vector through the convolution layer to obtain a convolution feature vector; perform a dimension reduction operation on the convolution feature vector through the pooling layer to obtain a pooled feature vector; perform a dropout operation on the pooled feature vector through the first fully-connected layer to obtain a first feature vector; activate the first feature vector through the second fully-connected layer to obtain a first prediction probability of the training sample for each defect label, thereby obtaining a plurality of first prediction probabilities; and train the preset multi-label classification model based on the plurality of first prediction probabilities and the predetermined loss function.
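The convolution, pooling and activation stages above can be illustrated with a small pure-Python forward pass. This is a sketch under several assumptions that the text does not state: the convolution output passes through a ReLU, pooling is max-over-time, the final activation is a per-label sigmoid, and dropout is treated as the identity because the sketch runs in inference mode. All weights and inputs are toy values, and the function names are hypothetical.

```python
import math


def conv1d_relu(seq, kernel, bias):
    """Slide a kernel of len(kernel) token vectors along the sequence, then apply ReLU.

    Assumes the sequence is at least as long as the kernel.
    """
    k = len(kernel)
    outputs = []
    for t in range(len(seq) - k + 1):
        s = bias
        for i in range(k):
            s += sum(w * x for w, x in zip(kernel[i], seq[t + i]))
        outputs.append(max(0.0, s))
    return outputs


def forward(seq, filters, dense_w, dense_b):
    """Convolution -> max pooling -> (dropout omitted) -> dense layer -> sigmoid."""
    # Max-over-time pooling reduces each filter's output to a single value.
    pooled = [max(conv1d_relu(seq, kernel, bias)) for kernel, bias in filters]
    # Dropout only acts during training, so it is the identity here.
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(dense_w, dense_b)]
    # One independent probability per defect label.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Three token vectors of width 2 stand in for the text feature vectors.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),
    ([[0.0, 1.0], [1.0, 0.0]], -1.0),
]
dense_w = [[1.0, -2.0], [0.5, 0.5]]
dense_b = [0.0, 0.0]

probs = forward(seq, filters, dense_w, dense_b)
print(probs)
```

Each output is an independent probability, so several defect labels can exceed a decision threshold for the same sample.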
In one implementation, the predetermined loss function used by the determining module 520 may be:
Figure BDA0004009068170000131
wherein N represents the number of training samples, L represents the number of defect labels corresponding to the training samples, the quantity shown in Figure BDA0004009068170000132 lies in the interval [0,1] and represents the prediction probability of a defect label, and y_ij, with a value in [0,1], indicates whether the i-th training sample belongs to the j-th defect label.
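The formula itself is only available as an image. The symbols defined above (N samples, L labels, a predicted probability and a 0/1 membership indicator per sample and label) are consistent with a multi-label binary cross-entropy, so the sketch below computes that loss as an assumption, not as the formula actually claimed. The averaging over N only, the clipping constant eps, and the function name are likewise assumptions.

```python
import math


def multilabel_bce(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy summed over labels and averaged over samples.

    y_true[i][j] is 1 if sample i carries label j, else 0.
    y_pred[i][j] is the predicted probability for that pair.
    """
    n = len(y_true)
    total = 0.0
    for truth_row, pred_row in zip(y_true, y_pred):
        for y, p in zip(truth_row, pred_row):
            # Clip to avoid log(0) for saturated predictions.
            p = min(max(p, eps), 1.0 - eps)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / n


# One sample, two labels, both predicted at 0.5: the loss is 2 * ln 2.
loss = multilabel_bce([[1, 0]], [[0.5, 0.5]])
print(loss)
```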
In one implementation, the determining module 520 is further configured to evaluate the preset multi-label classification model based on preset evaluation parameters.
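The text does not list the preset evaluation parameters. As one possible reading, the sketch below computes micro-averaged precision and recall over all sample and label pairs, in line with the accuracy rate and recall rate mentioned for the flow of fig. 4. The choice of micro-averaging and the function name are assumptions.

```python
def micro_precision_recall(y_true, y_pred):
    """Micro-averaged precision and recall for 0/1 multi-label matrices."""
    tp = fp = fn = 0
    for truth_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(truth_row, pred_row):
            if p and t:
                tp += 1      # label predicted and present
            elif p and not t:
                fp += 1      # label predicted but absent
            elif t and not p:
                fn += 1      # label present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Two samples, three labels: two correct labels, one spurious, one missed.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 0], [0, 1, 0]]
precision, recall = micro_precision_recall(y_true, y_pred)
print(precision, recall)
```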
In one implementation, the determining module 520 is further configured to, after the determining the at least one defect label of the automobile safety data, convert the at least one defect label of the automobile safety data into a one-hot variable for storage.
The tag determining apparatus in the embodiments of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in an electronic device. The embodiments of the present application are not particularly limited.
The label determining apparatus in an embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The label determining apparatus provided in this embodiment of the present application can implement each process implemented in the method embodiment of fig. 1, and in order to avoid repetition, a description is omitted here.
Optionally, as shown in fig. 6, the embodiment of the present application further provides an electronic device, including a processor 610, a memory 620, and a program or an instruction stored in the memory 620 and capable of running on the processor 610, where the program or the instruction implements each process of the embodiment of the method when executed by the processor 610, and the process can achieve the same technical effect, and for avoiding repetition, a detailed description is omitted herein.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored, and when the program or the instruction is executed by a processor, the processes of the embodiment of the tag determination method are implemented, and the same technical effects can be achieved, so that repetition is avoided, and no further description is given here.
Wherein the processor is a processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium such as a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
The embodiment of the application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled with the processor, and the processor is configured to run a program or an instruction, so as to implement each process of the above-mentioned embodiment of the tag determination method, and achieve the same technical effect, so that repetition is avoided, and no further description is provided herein.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-level chips, chip systems, or system-on-chip chips, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are also within the protection of the present application.

Claims (13)

1. A tag determination method, comprising:
acquiring automobile safety data to be processed;
determining at least one defect label of the automobile safety data according to a preset defect label dictionary;
and, in the case that determining the defect label according to the preset defect label dictionary fails, inputting the automobile safety data into a multi-label classification model to obtain at least one defect label of the automobile safety data, wherein the multi-label classification model is trained based on a plurality of safety data samples.
2. The method of claim 1, further comprising, prior to said determining at least one defect label of the automobile safety data according to a preset defect label dictionary:
acquiring an automobile safety data set, wherein the automobile safety data set comprises a plurality of automobile safety data;
analyzing each piece of automobile safety data to obtain corresponding defect characteristics;
and determining the preset defect label dictionary by classifying feature levels based on the defect features.
3. The method of claim 2, wherein the defect features comprise a fault location and keywords, wherein the fault location comprises a primary assembly and a secondary assembly;
the determining the preset defect label dictionary by classifying feature levels based on the defect features comprises:
determining a defect label of each piece of automobile safety data according to the primary assembly, the secondary assembly and the keywords;
setting a severity level of the defect label based on the defect label;
and determining the preset defect label dictionary based on the primary assembly, the secondary assembly, the keywords, the defect labels and the severity level corresponding to the defect labels of the automobile safety data.
4. The method of claim 1, wherein the determining at least one defect label of the automobile safety data according to a preset defect label dictionary comprises:
preprocessing the automobile safety data, and determining a primary assembly, a secondary assembly and a target keyword of the automobile safety data;
matching the target keywords of the automobile safety data with the keywords of a preset defect label dictionary, and taking the matched preset defect label as a standby label;
and under the condition that the standby label is determined to be the same as the primary assembly and the secondary assembly of the automobile safety data, taking the standby label as a defect label of the automobile safety data.
5. The method of claim 1, further comprising, prior to said inputting the automobile safety data into a multi-label classification model to obtain at least one defect label of the automobile safety data:
training a preset multi-label classification model based on a training set and a preset loss function until the loss function converges, and obtaining the trained preset multi-label classification model;
based on the verification set, verifying that the trained preset multi-label classification model accords with a first preset effect;
based on a test set, testing the preset multi-label classification model after the verification is qualified, and taking the preset multi-label classification model meeting a second preset effect as the multi-label classification model.
6. The method of claim 5, wherein the preset multi-label classification model comprises: a pre-training language model and a text classification model;
the training a preset multi-label classification model based on a training set and a preset loss function until the loss function converges, and obtaining the trained preset multi-label classification model, comprises:
inputting the training set into the pre-training language model, and extracting semantic features of the training samples in the training set through the pre-training language model to obtain text feature vectors of the training samples;
inputting the text feature vector into the text classification model, and performing multi-level processing on the text feature vector through the text classification model to obtain the trained preset multi-label classification model.
7. The method of claim 6, wherein the text classification model comprises an input layer, a convolution layer, a pooling layer, a first fully-connected layer, a second fully-connected layer;
the multi-level processing of the text feature vector comprises the following steps:
extracting semantic features of the text feature vectors through the input layer to obtain high-level text feature vectors of the training samples;
performing convolution operation on the high-level text feature vector through the convolution layer to obtain a convolution feature vector;
performing dimension reduction operation on the convolution feature vector through the pooling layer to obtain a pooling feature vector;
carrying out dropout operation on the pooled feature vector through the first full connection layer to obtain a first feature vector;
activating the first feature vector through the second full-connection layer to obtain first prediction probability of the training sample for each defect label, and obtaining a plurality of first prediction probabilities;
training the preset multi-label classification model based on a plurality of the first predictive probabilities and the predetermined loss function.
8. The method of claim 5, wherein the predetermined loss function is:
Figure FDA0004009068160000032
wherein N represents the number of training samples, L represents the number of defect labels corresponding to the training samples, the quantity shown in Figure FDA0004009068160000033 lies in the interval [0,1] and represents the prediction probability of the defect label, and y_ij, with a value in [0,1], indicates whether the i-th training sample belongs to the j-th defect label.
9. The method of claim 5, wherein the testing the pre-set multi-label classification model after the verification is passed comprises:
and evaluating the preset multi-label classification model based on preset evaluation parameters.
10. The method of claim 1, further comprising, after said determining at least one defect label of the automobile safety data:
and converting at least one defect label of the automobile safety data into a one-hot variable for storage.
11. A tag determination apparatus, comprising:
the acquisition module is used for acquiring the automobile safety data to be processed;
the determining module is used for determining at least one defect label of the automobile safety data according to a preset defect label dictionary;
the determining module is further configured to, in the case that determining the defect label according to the preset defect label dictionary fails, input the automobile safety data into a multi-label classification model to obtain at least one defect label of the automobile safety data, wherein the multi-label classification model is trained based on a plurality of safety data samples.
12. An electronic device comprising a processor, a memory and a program or instruction stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the tag determination method of any of claims 1-10.
13. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the tag determination method according to any of claims 1-10.
CN202211641048.1A 2022-12-20 2022-12-20 Label determining method, device, electronic equipment and storage medium Pending CN116383710A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211641048.1A CN116383710A (en) 2022-12-20 2022-12-20 Label determining method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211641048.1A CN116383710A (en) 2022-12-20 2022-12-20 Label determining method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116383710A true CN116383710A (en) 2023-07-04

Family

ID=86964410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211641048.1A Pending CN116383710A (en) 2022-12-20 2022-12-20 Label determining method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116383710A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688137A (en) * 2024-01-31 2024-03-12 成都航空职业技术学院 Data analysis method and system applied to automobile marketing management system software
CN117688137B (en) * 2024-01-31 2024-04-12 成都航空职业技术学院 Data analysis method and system applied to automobile marketing management system software

Similar Documents

Publication Publication Date Title
CN111984779B (en) Dialogue text analysis method, device, equipment and readable medium
CN112416778B (en) Test case recommendation method and device and electronic equipment
CN111210842B (en) Voice quality inspection method, device, terminal and computer readable storage medium
CN109284374B (en) Method, apparatus, device and computer readable storage medium for determining entity class
CN112256849B (en) Model training method, text detection method, device, equipment and storage medium
CN113778894B (en) Method, device, equipment and storage medium for constructing test cases
CN111177390A (en) Accident vehicle identification method and device based on hybrid model
CN107545505B (en) Method and system for identifying insurance financing product information
CN111338692A (en) Vulnerability classification method and device based on vulnerability codes and electronic equipment
CN118013963B (en) Method and device for identifying and replacing sensitive words
CN115240145A (en) Method and system for detecting illegal operation behaviors based on scene recognition
CN116383710A (en) Label determining method, device, electronic equipment and storage medium
CN117409419A (en) Image detection method, device and storage medium
CN117707922A (en) Method and device for generating test case, terminal equipment and readable storage medium
CN113778875B (en) System test defect classification method, device, equipment and storage medium
CN110866172A (en) Data analysis method for block chain system
CN112884018A (en) Power grid line fault recognition model training method and power grid line inspection method
CN117349434A (en) Voice classification method, device and storage medium
CN115438153A (en) Training method and device for intention matching degree analysis model
CN114254588A (en) Data tag processing method and device
CN114626798A (en) Task flow determination method and device, computer readable storage medium and terminal
CN113408263A (en) Criminal period prediction method and device, storage medium and electronic device
CN109739950B (en) Method and device for screening applicable legal provision
CN113449506A (en) Data detection method, device and equipment and readable storage medium
CN113011162A (en) Reference resolution method, device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination