WO2022134591A1 - Data classification method, apparatus, device and storage medium for staged quality inspection - Google Patents
- Publication number: WO2022134591A1
- Application number: PCT/CN2021/109696
- Authority: WO, WIPO (PCT)
- Prior art keywords: data, violation, stage, model, classification
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
Definitions
- The present application relates to the technical field of neural networks, and in particular to a data classification method, apparatus, device and storage medium for staged quality inspection.
- Sales agencies in various industries usually collect audio-visual material and electronic data by technical means such as audio and video recording, in order to record and preserve the key steps of the sales process, so that sales behavior can be replayed, important information can be queried, and responsibility for problems can be confirmed.
- The present application provides a data classification method, apparatus, device and storage medium for staged quality inspection, which improve the flexibility of optimizing the recognition model and the accuracy of converting other tasks into text.
- A first aspect of the present application provides a data classification method for staged quality inspection, including: acquiring data to be inspected, the data to be inspected being text data; inputting the data to be inspected into a first-stage violation data identification model to generate binary classification data, the first-stage violation data identification model being a binary classification model; according to the binary classification data, adding a head identifier and a tail identifier to the data to be inspected, inputting the result into a second-stage violation data classification model, and classifying the violation data in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model; and transmitting the violation type data to a target terminal, the target terminal being the terminal that sent the data to be inspected.
- A second aspect of the present application provides a data classification device for staged quality inspection, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps: acquiring data to be inspected, the data to be inspected being text data; inputting the data to be inspected into a first-stage violation data identification model to generate binary classification data, the first-stage violation data identification model being a binary classification model; according to the binary classification data, adding a head identifier and a tail identifier to the data to be inspected, inputting the result into a second-stage violation data classification model, and classifying the violation data in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model; and transmitting the violation type data to a target terminal, the target terminal being the terminal that sent the data to be inspected.
- A third aspect of the present application provides a computer-readable storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the following steps: acquiring data to be inspected, the data to be inspected being text data; inputting the data to be inspected into a first-stage violation data identification model to generate binary classification data, the first-stage violation data identification model being a binary classification model; according to the binary classification data, adding a head identifier and a tail identifier to the data to be inspected, inputting the result into a second-stage violation data classification model, and classifying the violation data in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model; and transmitting the violation type data to a target terminal, the target terminal being the terminal that sent the data to be inspected.
- A fourth aspect of the present application provides a data classification apparatus for staged quality inspection, comprising: an acquisition module configured to acquire data to be inspected, the data to be inspected being text data; a violation data identification module configured to input the data to be inspected into a first-stage violation data identification model to generate binary classification data, the first-stage violation data identification model being a binary classification model; a violation data classification module configured to add, according to the binary classification data, a head identifier and a tail identifier to the data to be inspected, input the result into a second-stage violation data classification model, and classify the violation data in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model; and a transmission module configured to transmit the violation type data to a target terminal, the target terminal being the terminal that sent the data to be inspected.
- In the technical solution provided by the present application, the data to be inspected, which is text data, is acquired; the data to be inspected is input into the first-stage violation data identification model, which is a binary classification model, to generate binary classification data; according to the binary classification data, a head identifier and a tail identifier are added to the data to be inspected, the result is input into the second-stage violation data classification model, which is a BERT model, and the violation data is classified in combination with the attention mechanism to generate violation type data; and the violation type data is transmitted to the target terminal, which is the terminal that sent the data to be inspected.
- In the embodiments of the present application, the first-stage quality inspection model generates the binary classification data, and when the binary classification data indicates a violation, the second-stage quality inspection model classifies the violation and generates the violation type data. With two stages, violation data is identified first and the corresponding violation type is identified afterwards, so that the second-stage model can focus on classifying violation data. This addresses the imbalance of quality inspection data and improves inspection accuracy.
- FIG. 1 is a schematic diagram of an embodiment of the data classification method for staged quality inspection in an embodiment of the present application;
- FIG. 2 is a schematic diagram of another embodiment of the data classification method for staged quality inspection in an embodiment of the present application;
- FIG. 3 is a schematic diagram of an embodiment of the data classification apparatus for staged quality inspection in an embodiment of the present application;
- FIG. 4 is a schematic diagram of another embodiment of the data classification apparatus for staged quality inspection in an embodiment of the present application;
- FIG. 5 is a schematic diagram of an embodiment of the data classification device for staged quality inspection in an embodiment of the present application.
- The embodiments of the present application provide a data classification method, apparatus, device and storage medium for staged quality inspection. The terms "first", "second", "third", "fourth", etc., if present, are used to distinguish similar objects and are not necessarily used to describe a particular order or precedence. It is to be understood that data so used may be interchanged under appropriate circumstances, so that the embodiments described herein can be practiced in sequences other than those illustrated or described herein.
- The terms "comprising" or "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such processes, methods, products or devices.
- An embodiment of the data classification method for staged quality inspection in the embodiments of the present application includes:
- The server acquires the data to be inspected, which is text data. To further protect the privacy and security of the data to be inspected, it may also be stored in a node of a blockchain.
- The data to be inspected is an important basis for reviewing sales behavior, querying important information and confirming responsibility for problems.
- The data to be inspected may come from a securities scenario or from an insurance scenario. This embodiment mainly uses data to be inspected from the insurance scenario.
- Specifically, the data to be inspected may be text data such as "You can enjoy the 18% income directly while saving money. You can rest assured that this does not require you to spend an extra cent. You can enjoy the money with confidence." and "OK, let me tell you the last thing here: because you are a partner of accumulating high-end annuity, in the future we will have an economic review of the identity of an annuity customer for the first time."
- The execution subject of the present application may be a data classification apparatus for staged quality inspection, or a terminal or a server, which is not specifically limited here.
- The embodiments of the present application are described with the server as the execution subject.
- The server inputs the data to be inspected into the first-stage violation data identification model, which is a binary classification model, to identify violation data and generate binary classification data.
- The data to be inspected is inspected in two stages.
- In the first stage, the server inputs the data to be inspected into the violation data identification model, which distinguishes violating data from non-violating data and generates binary classification data. The binary classification data is label data indicating whether the data to be inspected violates the rules.
- In one embodiment, the server inputs the data to be inspected into the violation data identification model and generates the binary classification data "violation data". In another embodiment, assume that the data to be inspected is "Then let me give you a final comment: because you are a high-end accumulated annuity cooperative customer, in the future, for the first time, we will have an economic audit of the identity of an annuity customer."
- The server inputs this data to be inspected into the violation data identification model and generates the binary classification data "no violation data".
- According to the binary classification data, the server adds a head identifier and a tail identifier to the data to be inspected, inputs the result into the second-stage violation data classification model, which is a BERT model, and classifies the violation data in combination with the attention mechanism to generate violation type data.
- The full name of the BERT model is Bidirectional Encoder Representations from Transformers.
- The server adds to the data to be inspected a head identifier used for classification, that is, the head token, and a tail identifier used for separating sentences, that is, the tail token. It then inputs the data to be inspected, with the head identifier and the tail identifier added, into the second-stage violation data classification model, and classifies the violation data in combination with the attention mechanism, the head identifier and the tail identifier to generate the violation type data.
- After obtaining the violation type data, the server transmits it to the target terminal that sent the data to be inspected.
- The terminals belong to different customers, and the target terminal is the client terminal that sent the data to be inspected.
- This quality inspection method removes time and geographical restrictions, making it possible to communicate with customers and to review the behavior of sales personnel.
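- The two-stage flow described in this embodiment can be sketched as follows. This is only an illustration under stated assumptions: the function names and the keyword rules are hypothetical stand-ins for the trained first-stage and second-stage models, and the `[CLS]` and `[SEP]` strings stand for the head and tail identifiers.

```python
from typing import Callable, Optional


# Hypothetical stand-ins for the two trained models; the keyword rules
# below are placeholders, not the recurrent or BERT models of the text.
def stage_one_is_violation(text: str) -> bool:
    """First stage: binary decision (violation / no violation)."""
    return "extra cent" in text or "18%" in text


def stage_two_violation_type(marked_text: str) -> str:
    """Second stage: multi-class violation type for flagged text."""
    if "extra cent" in marked_text:
        return "misleading without spending money"
    return "confusing return period"


def inspect(text: str,
            stage_one: Callable[[str], bool] = stage_one_is_violation,
            stage_two: Callable[[str], str] = stage_two_violation_type,
            ) -> Optional[str]:
    """Run stage two only when stage one flags the text."""
    if not stage_one(text):
        return None  # no violation: nothing to classify
    marked = f"[CLS] {text} [SEP]"  # add head and tail identifiers
    return stage_two(marked)
```

- Because the second stage only ever sees text that the first stage flagged, it never has to learn a "no violation" class, which is one way to read the claim about unbalanced inspection data.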
- Referring to FIG. 2, another embodiment of the data classification method for staged quality inspection in the embodiments of the present application includes:
- Acquiring first-stage training data and second-stage training data, where the first-stage training data is binary label data and the second-stage training data is multi-class label data;
- The server obtains the first-stage training data and the second-stage training data used to train the models. The first-stage training data is binary label data, that is, "violation" label data and "non-violation" label data. The second-stage training data is multi-class label data, which may comprise about 20 kinds of label data such as "misleading without spending money", "misleading money can be used at any time" and "confusing return period".
- The server uses the first-stage training data to train the first-stage violation data identification model, and the second-stage training data to train the second-stage violation data classification model.
- The server trains the binary classification model with the "violation" label data and the "non-violation" label data to generate the first-stage violation data identification model, which can identify whether the data to be inspected is violation data.
- The server trains the second-stage model with the roughly 20 kinds of label data, such as "misleading without spending money", "misleading money can be used at any time" and "confusing return period", to generate the second-stage violation data classification model, which can identify which specific type of violation the "violation data" to be inspected belongs to.
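- A minimal sketch of how the two training sets relate, assuming (the text does not say this) that both are derived from one annotated corpus in which each sample carries either a violation type or no type. The function name `split_training_data` and the sample texts are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical annotated samples: (text, violation type or None).
Sample = Tuple[str, Optional[str]]


def split_training_data(samples: List[Sample]):
    """Build first-stage binary labels and second-stage multi-class labels."""
    stage_one = [(text, "violation" if vtype else "non-violation")
                 for text, vtype in samples]
    # Only violating samples carry a violation type for stage two.
    stage_two = [(text, vtype) for text, vtype in samples if vtype]
    return stage_one, stage_two


samples = [
    ("no extra cent needed", "misleading without spending money"),
    ("we will review your identity", None),
    ("withdraw whenever you like", "misleading money can be used at any time"),
]
stage_one_data, stage_two_data = split_training_data(samples)
```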
- The server acquires the data to be inspected, which is text data. To further protect the privacy and security of the data to be inspected, it may also be stored in a node of a blockchain.
- The data to be inspected is an important basis for reviewing sales behavior, querying important information and confirming responsibility for problems.
- The data to be inspected may come from a securities scenario or from an insurance scenario. This embodiment mainly uses data to be inspected from the insurance scenario.
- Specifically, the data to be inspected may be text data such as "You can enjoy the 18% income directly while saving money. You can rest assured that this does not require you to spend an extra cent. You can enjoy the money with confidence." and "OK, let me tell you the last thing here: because you are a partner of accumulating high-end annuity, in the future we will have an economic review of the identity of an annuity customer for the first time."
- The server inputs the data to be inspected into the first-stage violation data identification model, which is a binary classification model, to identify violation data and generate binary classification data.
- The data to be inspected is inspected in two stages.
- In the first stage, the server inputs the data to be inspected into the violation data identification model, which distinguishes violating data from non-violating data and generates binary classification data. The binary classification data is label data indicating whether the data to be inspected violates the rules.
- In one embodiment, the server inputs the data to be inspected into the violation data identification model and generates the binary classification data "violation data". In another embodiment, assume that the data to be inspected is "Then let me give you a final comment: because you are a high-end accumulated annuity cooperative customer, in the future, for the first time, we will have an economic audit of the identity of an annuity customer."
- The server inputs this data to be inspected into the violation data identification model and generates the binary classification data "no violation data".
- Specifically, the server inputs the data to be inspected into the first-stage violation data identification model, which is a binary classification model. It first performs feature extraction on the data to be inspected in a recurrent neural network to generate a first text feature vector, then inputs the first text feature vector into the fully connected layer of the model and processes it in combination with an activation function to generate the binary classification data.
- For example, assume that the data to be inspected is "You can enjoy 18% of the income directly while saving money. You can rest assured that this does not require you to spend an extra penny, and you can enjoy it with confidence."
- The server inputs the data to be inspected into the recurrent neural network for feature extraction and generates the first text feature vector $[y_1, y_2, y_3, \ldots, y_m]$. The server then inputs the first text feature vector into the fully connected layer of the model and, in combination with the activation function, generates the binary classification data "violation data".
- The step in which the server inputs the data to be inspected into the first-stage violation data identification model and performs feature extraction in the recurrent neural network to generate the first text feature vector includes:
- The server inputs the data to be inspected into the first-stage violation data identification model and generates a text vector matrix in combination with a preset vector space model. The server then inputs the text vector matrix into the recurrent neural network and performs feature extraction in combination with an activation function to generate the first text feature vector.
- For example, the server inputs the data to be inspected into the first-stage violation data identification model and generates a text vector matrix [matrix not reproduced in the source text].
- The server inputs the text vector matrix into the recurrent neural network and, in combination with the activation function, generates the first text feature vector $[y_1, y_2, y_3, \ldots, y_m]$. Note that while the recurrent neural network and the activation function turn the text vector matrix into a text feature vector, redundant text features are deleted, so $m$ in the first text feature vector is smaller than $k$ in the text vector matrix.
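- The feature extraction step can be illustrated with a plain Elman-style recurrent network written in standard-library Python. This is a sketch under assumptions: the text does not specify the recurrent architecture, the random weights stand in for trained parameters, `tanh` is assumed as the activation function, and reading $k$ as the width of each word vector and $m$ as the hidden size is one possible interpretation.

```python
import math
import random
from typing import List

Vector = List[float]


def rnn_features(matrix: List[Vector], hidden_size: int, seed: int = 0) -> Vector:
    """Run a plain Elman RNN over the rows and return the last hidden state."""
    rng = random.Random(seed)
    k = len(matrix[0])
    # Random weights stand in for trained parameters.
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(k)]
            for _ in range(hidden_size)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    h = [0.0] * hidden_size
    for x in matrix:  # one row (word vector) per time step
        h = [math.tanh(sum(w * xi for w, xi in zip(w_in[j], x))
                       + sum(w * hi for w, hi in zip(w_rec[j], h)))
             for j in range(hidden_size)]
    return h


# Hypothetical 3-word text, each word embedded in k = 4 dimensions.
text_matrix = [[0.1, 0.3, -0.2, 0.5],
               [0.0, -0.1, 0.4, 0.2],
               [0.7, 0.2, 0.1, -0.3]]
features = rnn_features(text_matrix, hidden_size=2)  # m = 2 < k = 4
```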
- The step in which the server inputs the first text feature vector into the fully connected layer of the model and processes it in combination with the activation function to generate the binary classification data includes:
- The server inputs the first text feature vector into the fully connected layer for feature weighting to generate a text classification score. The server then applies the activation function to the text classification score to generate a target classification probability, and determines the binary classification data based on the target classification probability.
- For example, the server inputs the first text feature vector into the fully connected layer and performs weighting according to a preset weighting formula [formula not reproduced in the source text].
- After this calculation the server obtains a text classification score, and then applies the activation function to the text classification score.
- The activation function is a softmax function [formula not reproduced in the source text], which generates the target classification probability. Since the probabilities produced by the softmax function sum to 1, in binary classification only one classification probability $P_i$ needs to be calculated; the other classification probability is $1 - P_i$. The server finally determines the binary classification data based on the target classification probability.
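- The weighting and softmax steps can be sketched as below. The weighting formula in the source is not available, so a plain weighted sum plus bias is assumed here, and the feature values and weights are hypothetical rather than trained. The softmax is the standard one, $P_i = e^{z_i} / \sum_j e^{z_j}$, where $z_i$ (a symbol introduced only for this sketch) is the score of class $i$.

```python
import math
from typing import List


def fully_connected(features: List[float], weights: List[List[float]],
                    bias: List[float]) -> List[float]:
    """One score per class: weighted sum of the features plus a bias."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(scores: List[float]) -> List[float]:
    """Turn scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical feature vector and weights (not trained values).
features = [0.4, -0.2]
weights = [[1.5, -0.5],   # class 0: "violation data"
           [-1.0, 0.8]]   # class 1: "no violation data"
bias = [0.1, 0.0]

probs = softmax(fully_connected(features, weights, bias))
labels = ["violation data", "no violation data"]
decision = labels[probs.index(max(probs))]
```

- With two classes, `probs[1]` equals `1 - probs[0]` up to rounding, which is the shortcut the text describes.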
- According to the binary classification data, the server adds a head identifier and a tail identifier to the data to be inspected, inputs the result into the second-stage violation data classification model, which is a BERT model, and classifies the violation data in combination with the attention mechanism to generate violation type data.
- The full name of the BERT model is Bidirectional Encoder Representations from Transformers.
- The server adds to the data to be inspected a head identifier used for classification, that is, the head token, and a tail identifier used for separating sentences, that is, the tail token. It then inputs the data to be inspected, with the head identifier and the tail identifier added, into the second-stage violation data classification model, and classifies the violation data in combination with the attention mechanism, the head identifier and the tail identifier to generate the violation type data.
- Specifically, the server first determines whether the binary classification data indicates violation data. If it does, the server adds a head identifier and a tail identifier to the data to be inspected, inputs the processed data to be inspected into the second-stage violation data classification model, which is a BERT model, and classifies the violation data in combination with the attention mechanism, the head identifier and the tail identifier to generate the violation type data.
- The step in which the server adds the head identifier and the tail identifier, inputs the processed data to be inspected into the second-stage violation data classification model, and classifies the violation data to generate the violation type data includes:
- The server adds a head identifier and a tail identifier to the data to be inspected to generate processed data to be inspected. The server inputs the processed data to be inspected into the second-stage violation data classification model to generate a second text feature vector, which includes multiple word feature vectors. The server reads the vector distance between every two adjacent word feature vectors to obtain multiple vector distances. In combination with the attention mechanism, the server converts each of the vector distances to 1, and classifies the violation data on the second text feature vector in combination with the head identifier and the tail identifier to generate the violation type data.
- For example, the server adds a head identifier at the start of the data to be inspected and a tail identifier at its end, so that the processed data to be inspected is "[CLS] While enjoying the 18% income, you can enjoy it directly, you can rest assured that this does not require you to spend an extra penny, you can enjoy it with confidence. [SEP]".
- The head identifier [CLS] stands for classification and serves as a temporary marker used for classification.
- The tail identifier [SEP] stands for separation and serves as a temporary marker used to separate different sentences.
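- Adding the two identifiers amounts to wrapping the text, as in this small sketch. The helper name `add_identifiers` is hypothetical; in practice a BERT tokenizer usually inserts these special tokens itself, so the explicit string handling here is only for illustration.

```python
CLS_TOKEN = "[CLS]"  # head identifier: marks the position used for classification
SEP_TOKEN = "[SEP]"  # tail identifier: marks the end of a sentence


def add_identifiers(text: str) -> str:
    """Wrap the text with the head and tail identifiers."""
    return f"{CLS_TOKEN} {text.strip()} {SEP_TOKEN}"


processed = add_identifiers("you can enjoy it with confidence.")
```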
- The server inputs the processed data to be inspected into the violation data classification model and generates a second text feature vector.
- The server reads the vector distance between every two adjacent word feature vectors to obtain multiple vector distances and, in combination with the attention mechanism, converts each vector distance to 1, so that classification can draw on both the left and the right context in all layers of the model.
- The second text feature vector generated from the data to be inspected together with the [CLS] identifier carries weights; the larger a weight, the more attention the attention mechanism assigns to the corresponding feature.
- In combination with the attention mechanism, the server classifies the second text feature vector and generates the violation type data "misleading without spending money".
- In other embodiments, the violation type data may also be "misleading money can be used at any time", "confusing return period", and the like.
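- The idea that attention puts every word at distance 1 from every other word can be illustrated with scaled dot-product self-attention. This is a simplified sketch, not the BERT model itself: queries, keys and values are all the raw word vectors (BERT uses learned projections and multiple heads), and the word vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Turn scores into attention weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(words: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with queries = keys = values."""
    scale = math.sqrt(len(words[0]))
    output = []
    for query in words:
        # Every word scores every other word directly, near or far.
        weights = softmax([sum(q * k for q, k in zip(query, key)) / scale
                           for key in words])
        output.append([sum(w * value[d] for w, value in zip(weights, words))
                       for d in range(len(query))])
    return output


# Hypothetical word feature vectors; row 0 plays the role of [CLS].
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(word_vectors)
cls_vector = attended[0]  # summary vector a classifier head would read
```

- A larger attention weight means the corresponding word contributes more to the output vector, which matches the remark that larger weights receive more attention.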
- After obtaining the violation type data, the server transmits it to the target terminal that sent the data to be inspected.
- The terminals belong to different customers, and the target terminal is the client terminal that sent the data to be inspected.
- This quality inspection method removes time and geographical restrictions, making it possible to communicate with customers and to review the behavior of sales personnel.
- An embodiment of the data classification apparatus for staged quality inspection in the embodiments of the present application includes:
- a quality inspection data acquisition module 301, configured to acquire data to be inspected, wherein the data to be inspected is text data;
- a violation data identification module 302, configured to input the data to be inspected into the first-stage violation data identification model to generate binary classification data, the first-stage violation data identification model being a binary classification model;
- a violation data classification module 303, configured to add a head identifier and a tail identifier to the data to be inspected according to the binary classification data, input the result into the second-stage violation data classification model, and classify the violation data in combination with the attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model;
- a transmission module 304, configured to transmit the violation type data to a target terminal, where the target terminal is the terminal that sent the data to be inspected.
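- The module structure above can be pictured as four callables wired together. This is a hypothetical sketch: the class name, the field names and the lambda stand-ins are assumptions made for illustration, and it assumes that nothing is transmitted when no violation is found, which the text does not state.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QualityInspectionApparatus:
    """Four cooperating modules, numbered as in the description."""
    acquire: Callable[[], str]        # module 301
    identify: Callable[[str], bool]   # module 302
    classify: Callable[[str], str]    # module 303
    transmit: Callable[[str], None]   # module 304

    def run(self) -> Optional[str]:
        text = self.acquire()
        if not self.identify(text):
            return None  # assumed: nothing is transmitted without a violation
        violation_type = self.classify(f"[CLS] {text} [SEP]")
        self.transmit(violation_type)
        return violation_type


sent = []  # stands in for the target terminal
apparatus = QualityInspectionApparatus(
    acquire=lambda: "this does not require you to spend an extra cent",
    identify=lambda text: "extra cent" in text,
    classify=lambda marked: "misleading without spending money",
    transmit=sent.append,
)
result = apparatus.run()
```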
- Another embodiment of the data classification apparatus for staged quality inspection in the embodiments of the present application includes:
- a quality inspection data acquisition module 301, configured to acquire data to be inspected, where the data to be inspected is text data;
- a violation data identification module 302, configured to input the data to be inspected into a first-stage violation data identification model to generate binary classification data, where the first-stage violation data identification model is a binary classification model;
- a violation data classification module 303, configured to add a head identifier and a tail identifier to the data to be inspected according to the binary classification data, input the result into a second-stage violation data classification model, and classify the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model;
- a transmission module 304, configured to transmit the violation type data to a target terminal, where the target terminal is the terminal that sent the data to be inspected.
- Optionally, the violation data identification module 302 includes:
- a feature extraction unit 3021, configured to input the data to be inspected into the first-stage violation data identification model and perform feature extraction on the data to be inspected in a recurrent neural network to generate a first text feature vector, where the first-stage violation data identification model is a binary classification model;
- a binary classification data generation unit 3022, configured to input the first text feature vector into a fully connected layer and, in combination with an activation function, generate binary classification data.
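- Optionally, the feature extraction unit 3021 may be specifically configured to:
- input the data to be inspected into the first-stage violation data identification model and generate a text vector matrix in combination with a preset vector space model;
- input the text vector matrix into a recurrent neural network and perform feature extraction in combination with an activation function to generate a first text feature vector.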
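- Optionally, the binary classification data generation unit 3022 may be specifically configured to:
- input the first text feature vector into the fully connected layer for feature weighting to generate a text classification score;
- compute on the text classification score in combination with an activation function to generate a target classification probability, and determine the binary classification data based on the target classification probability.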
- Optionally, the violation data classification module 303 includes:
- a judging unit 3031, configured to judge whether the binary classification data indicates violation data;
- a violation data classification unit 3032, configured to, if the binary classification data indicates violation data, add a head identifier and a tail identifier to the data to be inspected, input the result into the second-stage violation data classification model, and classify the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model.
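- Optionally, the violation data classification unit 3032 may be specifically configured to:
- if the binary classification data indicates violation data, add a head identifier and a tail identifier to the data to be inspected to generate processed data to be inspected;
- input the processed data to be inspected into the second-stage violation data classification model to generate a second text feature vector, where the second text feature vector includes a plurality of word feature vectors;
- read the vector distance between every two adjacent word feature vectors to obtain a plurality of vector distances;
- convert the plurality of vector distances to 1 in combination with the attention mechanism, and classify the violation data on the second text feature vector in combination with the head identifier and the tail identifier to generate violation type data.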
- Optionally, the data classification apparatus for staged quality inspection further includes:
- a training data acquisition module 305, configured to acquire first-stage training data and second-stage training data, where the first-stage training data is two-class label data and the second-stage training data is multi-class label data;
- a model training module 306, configured to train a model on the first-stage training data to generate the first-stage violation data identification model, and to train a model on the second-stage training data to generate the second-stage violation data classification model.
- In the embodiments of the present application, the first-stage quality inspection model generates binary classification data; when the binary classification data indicates violation data, the second-stage quality inspection model classifies the violation and generates violation type data. Using a two-stage quality inspection model that first identifies violation data and then identifies the corresponding violation type lets the second-stage model focus on classifying violation data, which addresses the imbalance in quality inspection data and improves quality inspection accuracy.
- FIGS. 3 and 4 above describe in detail the data classification apparatus for staged quality inspection in the embodiments of the present application from the perspective of modular functional entities.
- The following describes in detail the data classification device for staged quality inspection in the embodiments of the present application from the perspective of hardware processing.
- FIG. 5 is a schematic structural diagram of a data classification device for staged quality inspection provided by an embodiment of the present application.
- The data classification device 500 for staged quality inspection may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 510 (for example, one or more processors), a memory 520, and one or more storage media 530 (for example, one or more mass storage devices) that store applications 533 or data 532.
- The memory 520 and the storage medium 530 may provide transient or persistent storage.
- The program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the data classification device 500 for staged quality inspection.
- The processor 510 may be configured to communicate with the storage medium 530 and to execute, on the data classification device 500 for staged quality inspection, the series of instruction operations in the storage medium 530.
- The data classification device 500 for staged quality inspection may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD.
- The present application also provides a data classification device for staged quality inspection, including a memory and at least one processor, where instructions are stored in the memory, and the memory and the at least one processor are interconnected through a line; the at least one processor invokes the instructions in the memory to cause the data classification device for staged quality inspection to perform the steps of the above data classification method for staged quality inspection.
- The present application also provides a computer-readable storage medium, which may be a non-volatile or a volatile computer-readable storage medium.
- The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the following steps:
- acquiring data to be inspected, where the data to be inspected is text data;
- inputting the data to be inspected into a first-stage violation data identification model to generate binary classification data, where the first-stage violation data identification model is a binary classification model;
- adding a head identifier and a tail identifier to the data to be inspected according to the binary classification data, inputting the result into a second-stage violation data classification model, and classifying the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model;
- transmitting the violation type data to a target terminal, where the target terminal is the terminal that sent the data to be inspected.
- The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms.
- A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
- The blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
- If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
- Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
- The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Human Resources & Organizations (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Development Economics (AREA)
- General Health & Medical Sciences (AREA)
- Educational Administration (AREA)
- Artificial Intelligence (AREA)
- Game Theory and Decision Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
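A data classification method, apparatus, device (500), and storage medium (530) for staged quality inspection, relating to the field of artificial intelligence and to blockchain technology, are used to solve the problem of imbalanced quality inspection data and thereby improve quality inspection accuracy; the data to be inspected may be stored in a blockchain. The data classification method for staged quality inspection includes: acquiring data to be inspected, the data to be inspected being text data (101); inputting the data to be inspected into a first-stage violation data identification model to generate binary classification data, the first-stage violation data identification model being a binary classification model (102); adding a head identifier and a tail identifier to the data to be inspected according to the binary classification data, inputting the result into a second-stage violation data classification model, and classifying the violation data in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model (103); and transmitting the violation type data to a target terminal, the target terminal being the terminal that sent the data to be inspected (104).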
Description
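This application claims priority to Chinese patent application No. 202011538857.0, filed with the China Patent Office on December 23, 2020 and entitled "Data classification method, apparatus, device, and storage medium for staged quality inspection", the entire contents of which are incorporated herein by reference.
This application relates to the field of neural network technology, and in particular to a data classification method, apparatus, device, and storage medium for staged quality inspection.
Sales organizations in various industries usually collect audiovisual material and electronic data by technical means such as audio and video recording, so as to record and preserve the key steps of the sales process for their products, making sales behavior replayable, important information searchable, and responsibility for problems confirmable. Quality inspection of such recorded sales processes is slow and costly, so many companies have introduced computer technology to inspect behavior in the sales process.
Many companies have introduced artificial intelligence technology to achieve real-time integration of business data, real-time automatic assembly of sales scripts, real-time intelligent quality inspection, and free configuration of script templates in the back end, which helps standardize transactions and improve operating efficiency across industries. The inventors realized that when an artificial intelligence model is used for real-time intelligent quality inspection, the imbalance between violation data and non-violation data leads to low quality inspection accuracy.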
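Summary of the Invention
This application provides a data classification method, apparatus, device, and storage medium for staged quality inspection, which improve the flexibility of identifying the optimized model and improve the accuracy of converting other tasks into text.
To achieve the above objective, a first aspect of this application provides a data classification method for staged quality inspection, including: acquiring data to be inspected, where the data to be inspected is text data; inputting the data to be inspected into a first-stage violation data identification model to generate binary classification data, where the first-stage violation data identification model is a binary classification model; adding a head identifier and a tail identifier to the data to be inspected according to the binary classification data, inputting the result into a second-stage violation data classification model, and classifying the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model; and transmitting the violation type data to a target terminal, where the target terminal is the terminal that sent the data to be inspected.
A second aspect of this application provides a data classification device for staged quality inspection, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor, when executing the computer-readable instructions, implements the following steps: acquiring data to be inspected, where the data to be inspected is text data; inputting the data to be inspected into a first-stage violation data identification model to generate binary classification data, where the first-stage violation data identification model is a binary classification model; adding a head identifier and a tail identifier to the data to be inspected according to the binary classification data, inputting the result into a second-stage violation data classification model, and classifying the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model; and transmitting the violation type data to a target terminal, where the target terminal is the terminal that sent the data to be inspected.
A third aspect of this application provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the following steps: acquiring data to be inspected, where the data to be inspected is text data; inputting the data to be inspected into a first-stage violation data identification model to generate binary classification data, where the first-stage violation data identification model is a binary classification model; adding a head identifier and a tail identifier to the data to be inspected according to the binary classification data, inputting the result into a second-stage violation data classification model, and classifying the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model; and transmitting the violation type data to a target terminal, where the target terminal is the terminal that sent the data to be inspected.
A fourth aspect of this application provides a data classification apparatus for staged quality inspection, including: an acquisition module, configured to acquire data to be inspected, where the data to be inspected is text data; a violation data identification module, configured to input the data to be inspected into a first-stage violation data identification model to generate binary classification data, where the first-stage violation data identification model is a binary classification model; a violation data classification module, configured to add a head identifier and a tail identifier to the data to be inspected according to the binary classification data, input the result into a second-stage violation data classification model, and classify the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model; and a transmission module, configured to transmit the violation type data to a target terminal, where the target terminal is the terminal that sent the data to be inspected.
In the technical solution provided by this application, data to be inspected is acquired, where the data to be inspected is text data; the data to be inspected is input into a first-stage violation data identification model to generate binary classification data, where the first-stage violation data identification model is a binary classification model; a head identifier and a tail identifier are added to the data to be inspected according to the binary classification data, the result is input into a second-stage violation data classification model, and the violation data is classified in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model; and the violation type data is transmitted to a target terminal, where the target terminal is the terminal that sent the data to be inspected. In the embodiments of the present application, the first-stage quality inspection model generates binary classification data; when the binary classification data indicates violation data, the second-stage quality inspection model classifies the violation and generates violation type data. Using a two-stage quality inspection model that first identifies violation data and then identifies the corresponding violation type lets the second-stage model focus on classifying violation data, which addresses the imbalance in quality inspection data and improves quality inspection accuracy.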
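FIG. 1 is a schematic diagram of one embodiment of the data classification method for staged quality inspection in the embodiments of the present application;
FIG. 2 is a schematic diagram of another embodiment of the data classification method for staged quality inspection in the embodiments of the present application;
FIG. 3 is a schematic diagram of one embodiment of the data classification apparatus for staged quality inspection in the embodiments of the present application;
FIG. 4 is a schematic diagram of another embodiment of the data classification apparatus for staged quality inspection in the embodiments of the present application;
FIG. 5 is a schematic diagram of one embodiment of the data classification device for staged quality inspection in the embodiments of the present application.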
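The embodiments of the present application provide a data classification method, apparatus, device, and storage medium for staged quality inspection. The terms "first", "second", "third", "fourth", and so on (if any) in the specification, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments described here can be implemented in an order other than the one illustrated or described. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
For ease of understanding, the specific flow of the embodiments of the present application is described below. Referring to FIG. 1, one embodiment of the data classification method for staged quality inspection in the embodiments of the present application includes:
101. Acquire data to be inspected, where the data to be inspected is text data;
The server acquires the data to be inspected, which is text data. It should be emphasized that, to further ensure the privacy and security of the data to be inspected, the data may also be stored in a node of a blockchain.
The data to be inspected is an important basis for reviewing sales behavior, looking up important information, and confirming responsibility for problems. It may be data from a securities scenario or from an insurance scenario, among others; this embodiment mainly uses data from an insurance scenario for illustration. The data to be inspected may be text data such as "While you are saving money and enjoying an 18% return, you can enjoy this right away. Rest assured, it will not cost you a single extra cent, so you can enjoy it with peace of mind." and "All right, one last thing from our side: because you are a high-end accumulation annuity partner customer, going forward we will first carry out a financial review of your annuity customer status."
It can be understood that the execution subject of this application may be the data classification apparatus for staged quality inspection, or a terminal or a server, which is not limited here. The embodiments of the present application are described with a server as the execution subject.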
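102. Input the data to be inspected into the first-stage violation data identification model to generate binary classification data, where the first-stage violation data identification model is a binary classification model;
The server inputs the data to be inspected into the first-stage violation data identification model, which is a binary classification model, performs violation data identification, and generates binary classification data.
Quality inspection of the data to be inspected is carried out mainly in two stages. In the first stage, the violation data identification model is used: the server first inputs the data to be inspected into the violation data identification model to distinguish violation data from non-violation data, thereby generating binary classification data. The binary classification data is label data that indicates whether the data to be inspected is a violation.
In one embodiment, assume that the data to be inspected is "While you are saving money and enjoying an 18% return, you can enjoy this right away. Rest assured, it will not cost you a single extra cent, so you can enjoy it with peace of mind." In the first stage, the server inputs this data into the violation data identification model for violation data identification, and the resulting binary classification data is "violation data". In another embodiment, assume that the data to be inspected is "All right, one last thing from our side: because you are a high-end accumulation annuity partner customer, going forward we will first carry out a financial review of your annuity customer status." The server inputs this data into the violation data identification model, and the resulting binary classification data is "non-violation data".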
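103. Add a head identifier and a tail identifier to the data to be inspected according to the binary classification data, input the result into the second-stage violation data classification model, and classify the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model;
According to the binary classification data, the server adds a head identifier and a tail identifier to the data to be inspected, inputs the marked data into the second-stage violation data classification model, which is a BERT model, and classifies the violation label in combination with an attention mechanism to generate violation type data.
BERT stands for Bidirectional Encoder Representations from Transformers. When the binary classification data meets the criterion for second-stage quality inspection, the server adds to the data to be inspected a head identifier used for classification (the head token) and a tail identifier used for sentence classification (the tail token). It then inputs the marked data into the second-stage violation data classification model and classifies the violation data in combination with the attention mechanism, the head identifier, and the tail identifier to generate violation type data.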
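104. Transmit the violation type data to a target terminal, where the target terminal is the terminal that sent the data to be inspected.
After obtaining the violation type data, the server transmits it to the target terminal that sent the data to be inspected.
The server transmits the violation type data to the target terminal. Terminals belong to different customers, and the target terminal is the customer terminal that sent the data to be inspected. This form of quality inspection removes constraints of time and place, making it possible to communicate with customers and review the behavior of sales staff.
In the embodiments of the present application, the first-stage quality inspection model generates binary classification data; when the binary classification data indicates violation data, the second-stage quality inspection model classifies the violation and generates violation type data. Using a two-stage quality inspection model that first identifies violation data and then identifies the corresponding violation type lets the second-stage model focus on classifying violation data, which addresses the imbalance in quality inspection data and improves quality inspection accuracy.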
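The control flow of steps 101 to 104 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the names `inspect`, `InspectionResult`, `toy_stage1`, and `toy_stage2` are hypothetical, and the two toy functions are keyword stand-ins for the trained recurrent-network and BERT models.

```python
from dataclasses import dataclass
from typing import Callable, Optional

VIOLATION = "violation"
NO_VIOLATION = "no violation"


@dataclass
class InspectionResult:
    text: str
    binary_label: str              # output of the first stage
    violation_type: Optional[str]  # output of the second stage, if it ran


def inspect(text: str,
            stage1: Callable[[str], str],
            stage2: Callable[[str], str]) -> InspectionResult:
    """Run stage 1 on every text and stage 2 only on flagged texts."""
    binary_label = stage1(text)
    if binary_label != VIOLATION:
        # Non-violation data never reaches the multi-class model.
        return InspectionResult(text, binary_label, None)
    # Add the head and tail identifiers before the second stage.
    marked = f"[CLS]{text}[SEP]"
    return InspectionResult(text, binary_label, stage2(marked))


# Hypothetical stand-ins for the trained models (assumptions for the demo).
def toy_stage1(text: str) -> str:
    return VIOLATION if "extra cent" in text else NO_VIOLATION


def toy_stage2(marked_text: str) -> str:
    return "misleading: no cost"


flagged = inspect("It will not cost you a single extra cent.", toy_stage1, toy_stage2)
clean = inspect("We will first carry out a financial review.", toy_stage1, toy_stage2)
print(flagged.violation_type, clean.violation_type)
```

Because the second stage only ever sees texts flagged by the first, the many non-violation samples do not compete with the violation classes inside the multi-class model.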
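Referring to FIG. 2, another embodiment of the data classification method for staged quality inspection in the embodiments of the present application includes:
201. Acquire first-stage training data and second-stage training data, where the first-stage training data is two-class label data and the second-stage training data is multi-class label data;
The server acquires the first-stage training data and the second-stage training data used to train the models. The first-stage training data is two-class label data, namely "violation" label data and "no violation" label data. The second-stage training data is multi-class label data, which may comprise about 20 kinds of labels such as "misleading: no cost", "misleading: money can be withdrawn at any time", and "confusing the payback period".
202. Train a model on the first-stage training data to generate the first-stage violation data identification model, and train a model on the second-stage training data to generate the second-stage violation data classification model;
The server trains the first-stage violation data identification model on the first-stage training data and the second-stage violation data classification model on the second-stage training data.
The server trains a binary classification model on the "violation" and "no violation" label data to generate the first-stage violation data identification model, which can identify whether data to be inspected is violation data. The server trains the second-stage model on the roughly 20 kinds of label data such as "misleading: no cost", "misleading: money can be withdrawn at any time", and "confusing the payback period" to generate the second-stage violation data classification model, which can identify the specific violation type of data to be inspected that has been identified as "violation data".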
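One way to prepare the two training sets is sketched below. It assumes, beyond what the text states, that both sets are derived from a single annotated corpus in which each sample carries either a violation type or `None`; the function name `split_training_data` and the sample texts are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Sample = Tuple[str, Optional[str]]  # (text, violation type or None)


def split_training_data(samples: List[Sample]):
    """Derive two-class data for stage 1 and multi-class data for stage 2."""
    # Stage 1 keeps every sample and collapses all violation types into one label.
    stage1 = [(text, "violation" if vtype is not None else "no violation")
              for text, vtype in samples]
    # Stage 2 keeps only violations (an assumption) with their specific type.
    stage2 = [(text, vtype) for text, vtype in samples if vtype is not None]
    return stage1, stage2


samples = [
    ("sample a", None),
    ("sample b", "misleading: no cost"),
    ("sample c", None),
    ("sample d", "confusing the payback period"),
    ("sample e", None),
]
stage1, stage2 = split_training_data(samples)
print(Counter(label for _, label in stage1))
print(Counter(label for _, label in stage2))
```

Under this assumption the non-violation majority affects only the two-class set, while the second-stage set contains violation types alone.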
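203. Acquire data to be inspected, where the data to be inspected is text data;
The server acquires the data to be inspected, which is text data. It should be emphasized that, to further ensure the privacy and security of the data to be inspected, the data may also be stored in a node of a blockchain.
The data to be inspected is an important basis for reviewing sales behavior, looking up important information, and confirming responsibility for problems. It may be data from a securities scenario or from an insurance scenario, among others; this embodiment mainly uses data from an insurance scenario for illustration. The data to be inspected may be text data such as "While you are saving money and enjoying an 18% return, you can enjoy this right away. Rest assured, it will not cost you a single extra cent, so you can enjoy it with peace of mind." and "All right, one last thing from our side: because you are a high-end accumulation annuity partner customer, going forward we will first carry out a financial review of your annuity customer status."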
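204. Input the data to be inspected into the first-stage violation data identification model to generate binary classification data, where the first-stage violation data identification model is a binary classification model;
The server inputs the data to be inspected into the first-stage violation data identification model, which is a binary classification model, performs violation data identification, and generates binary classification data.
Quality inspection of the data to be inspected is carried out mainly in two stages. In the first stage, the violation data identification model is used: the server first inputs the data to be inspected into the violation data identification model to distinguish violation data from non-violation data, thereby generating binary classification data. The binary classification data is label data that indicates whether the data to be inspected is a violation.
In one embodiment, assume that the data to be inspected is "While you are saving money and enjoying an 18% return, you can enjoy this right away. Rest assured, it will not cost you a single extra cent, so you can enjoy it with peace of mind." In the first stage, the server inputs this data into the violation data identification model for violation data identification, and the resulting binary classification data is "violation data". In another embodiment, assume that the data to be inspected is "All right, one last thing from our side: because you are a high-end accumulation annuity partner customer, going forward we will first carry out a financial review of your annuity customer status." The server inputs this data into the violation data identification model, and the resulting binary classification data is "non-violation data".
Specifically, the server inputs the data to be inspected into the first-stage violation data identification model, which is a binary classification model. It first performs feature extraction on the data to be inspected in a recurrent neural network to generate a first text feature vector; it then inputs the first text feature vector into the fully connected layer of the model and processes it in combination with an activation function to generate binary classification data.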
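For example, suppose the data to be inspected is "While you are saving money and enjoying an 18% return, you can enjoy this right away. Rest assured, it will not cost you a single extra cent, so you can enjoy it with peace of mind." The server first inputs this data into the recurrent neural network for feature extraction and generates the first text feature vector $[y_1\ y_2\ y_3\ \dots\ y_m]$. The server then inputs this first text feature vector into the fully connected layer of the model and, in combination with the activation function, generates the binary classification data "violation data".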
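The step in which the server inputs the data to be inspected into the first-stage violation data identification model, which is a binary classification model, and first performs feature extraction on the data in a recurrent neural network to generate the first text feature vector includes:
The server inputs the data to be inspected into the first-stage violation data identification model, which is a binary classification model, and generates a text vector matrix in combination with a preset vector space model; the server then inputs the text vector matrix into the recurrent neural network and performs feature extraction in combination with an activation function to generate the first text feature vector.
Assume that the data to be inspected is "While you are saving money and enjoying an 18% return, you can enjoy this right away. Rest assured, it will not cost you a single extra cent, so you can enjoy it with peace of mind." The server inputs this data into the first-stage violation data identification model and generates a text vector matrix (matrix not reproduced here).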
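The server inputs the text vector matrix into the recurrent neural network and, in combination with the activation function, generates the first text feature vector $[y_1\ y_2\ y_3\ \dots\ y_m]$. Note that when the recurrent neural network and activation function turn the text vector matrix into a text feature vector, redundant text features are removed, so $m$ in the first text feature vector is smaller than $k$ in the text vector matrix.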
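The reduction from a text vector matrix to a fixed-length feature vector can be illustrated with a plain Elman-style recurrent cell. The text does not specify the recurrent architecture, the activation function, or the dimensions, so everything below is an assumption: `tanh` is used as the activation, the weights are random rather than trained, and `rnn_features` is a hypothetical name.

```python
import math
import random
from typing import List

Vector = List[float]


def rnn_features(matrix: List[Vector], w_in: List[Vector],
                 w_rec: List[Vector], bias: Vector) -> Vector:
    """Fold the rows of a text vector matrix into one hidden state of size m."""
    m = len(bias)
    hidden = [0.0] * m
    for row in matrix:  # one row per token
        # h_t = tanh(W_in x_t + W_rec h_{t-1} + b); the comprehension reads the
        # previous hidden state before the name is rebound to the new one.
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[j], row))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(m)
        ]
    return hidden


random.seed(0)
k, d, m = 6, 4, 3  # k tokens, d input dimensions, m hidden units (m < k)
matrix = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(k)]
w_in = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(m)]
w_rec = [[random.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(m)]
bias = [0.0] * m

features = rnn_features(matrix, w_in, w_rec, bias)
print(len(features))
```

The final hidden state plays the role of the first text feature vector: its length is fixed by the number of hidden units, not by the number of tokens in the input.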
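The step in which the server then inputs the first text feature vector into the fully connected layer of the model and processes it in combination with an activation function to generate binary classification data includes:
The server inputs the first text feature vector into the fully connected layer for feature weighting to generate a text classification score; the server then computes on the text classification score in combination with the activation function to generate a target classification probability, and determines the binary classification data based on the target classification probability.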
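In this embodiment, assume that the first text feature vector is $y = [y_1\ y_2\ y_3\ \dots\ y_m]$. The server inputs this first text feature vector into the fully connected layer and weights it according to a preset weighting formula:

$$S_i = w_i \cdot y + b_i,$$

where $i = 0$ or $i = 1$, $w_i$ is a preset weight of the fully connected layer, and $b_i$ is a preset bias of the fully connected layer. This calculation yields the text classification scores, which the server then passes through an activation function. In this embodiment the activation function is the softmax function, which for two classes takes the standard form:

$$P_i = \frac{e^{S_i}}{e^{S_0} + e^{S_1}}.$$

The activation function yields the target classification probability. Because the softmax probabilities sum to 1, in binary classification only one class probability needs to be computed; the other is $1 - P_i$. Finally, the server determines the binary classification data based on the target classification probability.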
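The weighting step and the softmax step can be worked through numerically. The sketch below uses made-up weights, biases, and feature values, and it assumes that index 1 denotes "violation", which the text does not state; `class_scores` and `softmax` are illustrative names.

```python
import math
from typing import List


def class_scores(y: List[float], weights: List[List[float]],
                 biases: List[float]) -> List[float]:
    """Fully connected layer: S_i = w_i . y + b_i for each class i."""
    return [sum(w_j * y_j for w_j, y_j in zip(w, y)) + b
            for w, b in zip(weights, biases)]


def softmax(scores: List[float]) -> List[float]:
    """Turn scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


y = [0.2, -0.4, 0.9]                              # first text feature vector
weights = [[0.5, 0.1, -0.3], [-0.2, 0.4, 0.8]]    # w_0 and w_1 (made up)
biases = [0.0, 0.1]                               # b_0 and b_1 (made up)

scores = class_scores(y, weights, biases)
probs = softmax(scores)
# Assumption: class 1 means "violation" and class 0 means "no violation".
label = "violation" if probs[1] >= probs[0] else "no violation"
print(scores, probs, label)
```

With two classes, `probs[0]` equals `1 - probs[1]`, so a single probability and a threshold of 0.5 would give the same decision.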
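205. Add a head identifier and a tail identifier to the data to be inspected according to the binary classification data, input the result into the second-stage violation data classification model, and classify the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model;
According to the binary classification data, the server adds a head identifier and a tail identifier to the data to be inspected, inputs the marked data into the second-stage violation data classification model, which is a BERT model, and classifies the violation label in combination with an attention mechanism to generate violation type data.
BERT stands for Bidirectional Encoder Representations from Transformers. When the binary classification data meets the criterion for second-stage quality inspection, the server adds to the data to be inspected a head identifier used for classification (the head token) and a tail identifier used for sentence classification (the tail token). It then inputs the marked data into the second-stage violation data classification model and classifies the violation data in combination with the attention mechanism, the head identifier, and the tail identifier to generate violation type data.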
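Specifically, the server first judges whether the binary classification data indicates violation data. If so, the server adds a head identifier and a tail identifier to the data to be inspected, inputs the processed data into the second-stage violation data classification model, which is a BERT model, and classifies the violation data in combination with the attention mechanism, the head identifier, and the tail identifier to generate violation type data.
The step in which, if the server determines that the binary classification data indicates violation data, the server adds a head identifier and a tail identifier to the data to be inspected, inputs the processed data into the second-stage violation data classification model, which is a BERT model, and classifies the violation data in combination with the attention mechanism, the head identifier, and the tail identifier to generate violation type data includes:
If the binary classification data indicates violation data, the server adds a head identifier and a tail identifier to the data to be inspected to generate processed data to be inspected. The server inputs the processed data into the second-stage violation data classification model to generate a second text feature vector, which includes a plurality of word feature vectors. The server reads the vector distance between every two adjacent word feature vectors to obtain a plurality of vector distances. The server then converts the plurality of vector distances to 1 in combination with the attention mechanism, and classifies the violation data on the second text feature vector in combination with the head identifier and the tail identifier to generate violation type data.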
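Assume that the binary classification data obtained for the data to be inspected "While you are saving money and enjoying an 18% return, you can enjoy this right away. Rest assured, it will not cost you a single extra cent, so you can enjoy it with peace of mind." indicates violation data. The server adds a head identifier at the head of the data and a tail identifier at its tail, generating the processed data "[CLS]While you are saving money and enjoying an 18% return, you can enjoy this right away. Rest assured, it will not cost you a single extra cent, so you can enjoy it with peace of mind.[SEP]". The head identifier [CLS] stands for "for classification" and is a temporary marker used for classification; the tail identifier [SEP] stands for "for segmentation" and is a temporary marker used to separate sentences. The server inputs the processed data into the violation data classification model to generate the second text feature vector, reads the vector distance between every two adjacent word feature vectors to obtain a plurality of vector distances, and then, in combination with the attention mechanism, converts each vector distance to 1, so that classification can draw on the left and right context in all layers of the model. The second text feature vector generated from the data marked with [CLS] includes weights: the larger the weight, the more attention the attention mechanism assigns. For example, the word feature vector for "这个" ("this") has a small weight and therefore receives little attention, so that feature vector can be ignored in the subsequent classification. Finally, the server classifies the second text feature vector in combination with the attention mechanism and generates the violation type data "misleading: no cost". In other embodiments, the violation type data may instead be "misleading: money can be withdrawn at any time", "confusing the payback period", and so on.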
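The marker insertion and the role of attention weights can be illustrated with a toy example. This is not BERT: it is a single scaled dot-product attention step that uses the `[CLS]` vector as the query, the two-dimensional token vectors are invented for the demo, and the names `add_markers` and `attention_pool` are hypothetical. It also does not attempt the "convert each vector distance to 1" step, which the text does not define in detail.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def add_markers(tokens: Sequence[str]) -> List[str]:
    """Wrap a token sequence with the head and tail identifiers."""
    return ["[CLS]"] + list(tokens) + ["[SEP]"]


def attention_pool(query: Vector, keys: List[Vector],
                   values: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention of one query over all tokens."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # attention weights sum to 1
    dim = len(values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, pooled


tokens = add_markers(["this", "costs", "nothing"])
toy_vectors = {                      # invented embeddings, for illustration only
    "[CLS]": [1.0, 0.0],
    "this": [0.1, 0.9],
    "costs": [0.9, 0.2],
    "nothing": [0.8, 0.1],
    "[SEP]": [0.0, 0.0],
}
embedded = [toy_vectors[token] for token in tokens]
weights, pooled = attention_pool(embedded[0], embedded, embedded)
for token, weight in zip(tokens, weights):
    print(f"{token:8s} {weight:.3f}")
```

In this toy setup the filler word "this" receives a smaller weight than "costs" or "nothing", so it contributes less to the pooled vector that a classifier would read, which mirrors the low-weight word in the example above.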
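206. Transmit the violation type data to a target terminal, where the target terminal is the terminal that sent the data to be inspected.
After obtaining the violation type data, the server transmits it to the target terminal that sent the data to be inspected.
The server transmits the violation type data to the target terminal. Terminals belong to different customers, and the target terminal is the customer terminal that sent the data to be inspected. This form of quality inspection removes constraints of time and place, making it possible to communicate with customers and review the behavior of sales staff.
In the embodiments of the present application, the first-stage quality inspection model generates binary classification data; when the binary classification data indicates violation data, the second-stage quality inspection model classifies the violation and generates violation type data. Using a two-stage quality inspection model that first identifies violation data and then identifies the corresponding violation type lets the second-stage model focus on classifying violation data, which addresses the imbalance in quality inspection data and improves quality inspection accuracy.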
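The data classification method for staged quality inspection in the embodiments of the present application is described above; the data classification apparatus for staged quality inspection is described below. Referring to FIG. 3, one embodiment of the data classification apparatus for staged quality inspection in the embodiments of the present application includes:
a quality inspection data acquisition module 301, configured to acquire data to be inspected, where the data to be inspected is text data;
a violation data identification module 302, configured to input the data to be inspected into a first-stage violation data identification model to generate binary classification data, where the first-stage violation data identification model is a binary classification model;
a violation data classification module 303, configured to add a head identifier and a tail identifier to the data to be inspected according to the binary classification data, input the result into a second-stage violation data classification model, and classify the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model;
a transmission module 304, configured to transmit the violation type data to a target terminal, where the target terminal is the terminal that sent the data to be inspected.
In the embodiments of the present application, the first-stage quality inspection model generates binary classification data; when the binary classification data indicates violation data, the second-stage quality inspection model classifies the violation and generates violation type data. Using a two-stage quality inspection model that first identifies violation data and then identifies the corresponding violation type lets the second-stage model focus on classifying violation data, which addresses the imbalance in quality inspection data and improves quality inspection accuracy.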
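Referring to FIG. 4, another embodiment of the data classification apparatus for staged quality inspection in the embodiments of the present application includes:
a quality inspection data acquisition module 301, configured to acquire data to be inspected, where the data to be inspected is text data;
a violation data identification module 302, configured to input the data to be inspected into a first-stage violation data identification model to generate binary classification data, where the first-stage violation data identification model is a binary classification model;
a violation data classification module 303, configured to add a head identifier and a tail identifier to the data to be inspected according to the binary classification data, input the result into a second-stage violation data classification model, and classify the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model;
a transmission module 304, configured to transmit the violation type data to a target terminal, where the target terminal is the terminal that sent the data to be inspected.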
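Optionally, the violation data identification module 302 includes:
a feature extraction unit 3021, configured to input the data to be inspected into the first-stage violation data identification model and perform feature extraction on the data to be inspected in a recurrent neural network to generate a first text feature vector, where the first-stage violation data identification model is a binary classification model;
a binary classification data generation unit 3022, configured to input the first text feature vector into a fully connected layer and, in combination with an activation function, generate binary classification data.
Optionally, the feature extraction unit 3021 may be specifically configured to:
input the data to be inspected into the first-stage violation data identification model and generate a text vector matrix in combination with a preset vector space model, where the first-stage violation data identification model is a binary classification model;
input the text vector matrix into a recurrent neural network and perform feature extraction in combination with an activation function to generate a first text feature vector.
Optionally, the binary classification data generation unit 3022 may be specifically configured to:
input the first text feature vector into the fully connected layer for feature weighting to generate a text classification score;
compute on the text classification score in combination with an activation function to generate a target classification probability, and determine the binary classification data based on the target classification probability.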
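Optionally, the violation data classification module 303 includes:
a judging unit 3031, configured to judge whether the binary classification data indicates violation data;
a violation data classification unit 3032, configured to, if the binary classification data indicates violation data, add a head identifier and a tail identifier to the data to be inspected, input the result into the second-stage violation data classification model, and classify the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model.
Optionally, the violation data classification unit 3032 may be specifically configured to:
if the binary classification data indicates violation data, add a head identifier and a tail identifier to the data to be inspected to generate processed data to be inspected;
input the processed data to be inspected into the second-stage violation data classification model to generate a second text feature vector, where the second text feature vector includes a plurality of word feature vectors;
read the vector distance between every two adjacent word feature vectors to obtain a plurality of vector distances;
convert the plurality of vector distances to 1 in combination with the attention mechanism, and classify the violation data on the second text feature vector in combination with the head identifier and the tail identifier to generate violation type data.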
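Optionally, the data classification apparatus for staged quality inspection further includes:
a training data acquisition module 305, configured to acquire first-stage training data and second-stage training data, where the first-stage training data is two-class label data and the second-stage training data is multi-class label data;
a model training module 306, configured to train a model on the first-stage training data to generate the first-stage violation data identification model, and to train a model on the second-stage training data to generate the second-stage violation data classification model.
In the embodiments of the present application, the first-stage quality inspection model generates binary classification data; when the binary classification data indicates violation data, the second-stage quality inspection model classifies the violation and generates violation type data. Using a two-stage quality inspection model that first identifies violation data and then identifies the corresponding violation type lets the second-stage model focus on classifying violation data, which addresses the imbalance in quality inspection data and improves quality inspection accuracy.
FIGS. 3 and 4 above describe in detail the data classification apparatus for staged quality inspection in the embodiments of the present application from the perspective of modular functional entities; the following describes in detail the data classification device for staged quality inspection from the perspective of hardware processing.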
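FIG. 5 is a schematic structural diagram of a data classification device for staged quality inspection provided by an embodiment of the present application. The data classification device 500 for staged quality inspection may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 510 (for example, one or more processors), a memory 520, and one or more storage media 530 (for example, one or more mass storage devices) that store applications 533 or data 532. The memory 520 and the storage medium 530 may provide transient or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the data classification device 500 for staged quality inspection. Further, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the data classification device 500 for staged quality inspection, the series of instruction operations in the storage medium 530.
The data classification device 500 for staged quality inspection may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD. Those skilled in the art will understand that the device structure shown in FIG. 5 does not limit the data classification device for staged quality inspection, which may include more or fewer components than shown, combine certain components, or arrange the components differently.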
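The present application also provides a data classification device for staged quality inspection, including a memory and at least one processor, where instructions are stored in the memory, and the memory and the at least one processor are interconnected through a line; the at least one processor invokes the instructions in the memory to cause the data classification device for staged quality inspection to perform the steps of the above data classification method for staged quality inspection.
The present application also provides a computer-readable storage medium, which may be a non-volatile or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the following steps:
acquiring data to be inspected, where the data to be inspected is text data;
inputting the data to be inspected into a first-stage violation data identification model to generate binary classification data, where the first-stage violation data identification model is a binary classification model;
adding a head identifier and a tail identifier to the data to be inspected according to the binary classification data, inputting the result into a second-stage violation data classification model, and classifying the violation data in combination with an attention mechanism to generate violation type data, where the second-stage violation data classification model is a BERT model;
transmitting the violation type data to a target terminal, where the target terminal is the terminal that sent the data to be inspected.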
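Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.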
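The idea of data blocks linked by cryptographic methods can be shown with a minimal hash chain. This sketch only illustrates how each block commits to its predecessor so that stored records become tamper-evident; it has no peer-to-peer network or consensus mechanism, and the block layout and the names `make_block` and `verify_chain` are assumptions rather than anything specified in the application.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64


def make_block(records: List[str], prev_hash: str) -> Dict:
    """Create a block whose hash covers its records and the previous hash."""
    payload = json.dumps({"records": records, "prev_hash": prev_hash},
                         sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"records": records, "prev_hash": prev_hash, "hash": digest}


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every hash and check that each block points to its predecessor."""
    prev_hash = GENESIS_HASH
    for block in chain:
        expected = make_block(block["records"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


first = make_block(["text to be inspected, item 1"], GENESIS_HASH)
second = make_block(["text to be inspected, item 2"], first["hash"])
chain = [first, second]
valid_before = verify_chain(chain)

# Changing a stored record invalidates the chain.
chain[0]["records"][0] = "tampered text"
valid_after = verify_chain(chain)
print(valid_before, valid_after)
```

Storing the data to be inspected in such a structure means that any later change to a stored text is detectable by re-verifying the hashes.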
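If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, and some of their technical features may be replaced by equivalents, without such modifications or replacements causing the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.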
Claims (20)
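- A data classification method for staged quality inspection, comprising: acquiring data to be inspected, the data to be inspected being text data; inputting the data to be inspected into a first-stage violation data identification model to generate binary classification data, the first-stage violation data identification model being a binary classification model; adding a head identifier and a tail identifier to the data to be inspected according to the binary classification data, inputting the result into a second-stage violation data classification model, and performing violation data classification in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model; and transmitting the violation type data to a target terminal, the target terminal being the terminal that sent the data to be inspected.
- The data classification method for staged quality inspection according to claim 1, wherein inputting the data to be inspected into the first-stage violation data identification model to generate binary classification data, the first-stage violation data identification model being a binary classification model, comprises: inputting the data to be inspected into the first-stage violation data identification model and performing feature extraction on the data to be inspected in a recurrent neural network to generate a first text feature vector, the first-stage violation data identification model being a binary classification model; and inputting the first text feature vector into a fully connected layer and, in combination with an activation function, generating binary classification data.
- The data classification method for staged quality inspection according to claim 2, wherein inputting the data to be inspected into the first-stage violation data identification model and performing feature extraction on the data to be inspected in a recurrent neural network to generate a first text feature vector, the first-stage violation data identification model being a binary classification model, comprises: inputting the data to be inspected into the first-stage violation data identification model and generating a text vector matrix in combination with a preset vector space model, the first-stage violation data identification model being a binary classification model; and inputting the text vector matrix into a recurrent neural network and performing feature extraction in combination with an activation function to generate the first text feature vector.
- The data classification method for staged quality inspection according to claim 2, wherein inputting the first text feature vector into the fully connected layer and, in combination with an activation function, generating binary classification data comprises: inputting the first text feature vector into the fully connected layer for feature weighting to generate a text classification score; and computing on the text classification score in combination with an activation function to generate a target classification probability, and determining the binary classification data based on the target classification probability.
- The data classification method for staged quality inspection according to claim 1, wherein adding a head identifier and a tail identifier to the data to be inspected according to the binary classification data, inputting the result into the second-stage violation data classification model, and performing violation data classification in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model, comprises: judging whether the binary classification data indicates violation data; and if the binary classification data indicates violation data, adding a head identifier and a tail identifier to the data to be inspected, inputting the result into the second-stage violation data classification model, and performing violation data classification in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model.
- The data classification method for staged quality inspection according to claim 5, wherein, if the binary classification data indicates violation data, adding a head identifier and a tail identifier to the data to be inspected, inputting the result into the second-stage violation data classification model, and performing violation data classification in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model, comprises: if the binary classification data indicates violation data, adding a head identifier and a tail identifier to the data to be inspected to generate processed data to be inspected; inputting the processed data to be inspected into the second-stage violation data classification model to generate a second text feature vector, the second text feature vector including a plurality of word feature vectors; reading the vector distance between every two adjacent word feature vectors to obtain a plurality of vector distances; and converting the plurality of vector distances to 1 in combination with the attention mechanism, and performing violation data classification on the second text feature vector in combination with the head identifier and the tail identifier to generate violation type data.
- The data classification method for staged quality inspection according to any one of claims 1 to 5, wherein, before acquiring the data to be inspected, the data to be inspected being text data, the method further comprises: acquiring first-stage training data and second-stage training data, the first-stage training data being two-class label data and the second-stage training data being multi-class label data; and training a model on the first-stage training data to generate the first-stage violation data identification model, and training a model on the second-stage training data to generate the second-stage violation data classification model.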
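- A data classification device for staged quality inspection, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps: acquiring data to be inspected, the data to be inspected being text data; inputting the data to be inspected into a first-stage violation data identification model to generate binary classification data, the first-stage violation data identification model being a binary classification model; adding a head identifier and a tail identifier to the data to be inspected according to the binary classification data, inputting the result into a second-stage violation data classification model, and performing violation data classification in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model; and transmitting the violation type data to a target terminal, the target terminal being the terminal that sent the data to be inspected.
- The data classification device for staged quality inspection according to claim 8, wherein the processor, when executing the computer program, further implements the following steps: inputting the data to be inspected into the first-stage violation data identification model and performing feature extraction on the data to be inspected in a recurrent neural network to generate a first text feature vector, the first-stage violation data identification model being a binary classification model; and inputting the first text feature vector into a fully connected layer and, in combination with an activation function, generating binary classification data.
- The data classification device for staged quality inspection according to claim 9, wherein the processor, when executing the computer program, further implements the following steps: inputting the data to be inspected into the first-stage violation data identification model and generating a text vector matrix in combination with a preset vector space model, the first-stage violation data identification model being a binary classification model; and inputting the text vector matrix into a recurrent neural network and performing feature extraction in combination with an activation function to generate the first text feature vector.
- The data classification device for staged quality inspection according to claim 9, wherein the processor, when executing the computer program, further implements the following steps: inputting the first text feature vector into the fully connected layer for feature weighting to generate a text classification score; and computing on the text classification score in combination with an activation function to generate a target classification probability, and determining the binary classification data based on the target classification probability.
- The data classification device for staged quality inspection according to claim 8, wherein the processor, when executing the computer program, further implements the following steps: judging whether the binary classification data indicates violation data; and if the binary classification data indicates violation data, adding a head identifier and a tail identifier to the data to be inspected, inputting the result into the second-stage violation data classification model, and performing violation data classification in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model.
- The data classification device for staged quality inspection according to claim 12, wherein the processor, when executing the computer program, further implements the following steps: if the binary classification data indicates violation data, adding a head identifier and a tail identifier to the data to be inspected to generate processed data to be inspected; inputting the processed data to be inspected into the second-stage violation data classification model to generate a second text feature vector, the second text feature vector including a plurality of word feature vectors; reading the vector distance between every two adjacent word feature vectors to obtain a plurality of vector distances; and converting the plurality of vector distances to 1 in combination with the attention mechanism, and performing violation data classification on the second text feature vector in combination with the head identifier and the tail identifier to generate violation type data.
- The data classification device for staged quality inspection according to any one of claims 8 to 13, wherein the processor, when executing the computer program, further implements the following steps: acquiring first-stage training data and second-stage training data, the first-stage training data being two-class label data and the second-stage training data being multi-class label data; and training a model on the first-stage training data to generate the first-stage violation data identification model, and training a model on the second-stage training data to generate the second-stage violation data classification model.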
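- A computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the following steps: acquiring data to be inspected, the data to be inspected being text data; inputting the data to be inspected into a first-stage violation data identification model to generate binary classification data, the first-stage violation data identification model being a binary classification model; adding a head identifier and a tail identifier to the data to be inspected according to the binary classification data, inputting the result into a second-stage violation data classification model, and performing violation data classification in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model; and transmitting the violation type data to a target terminal, the target terminal being the terminal that sent the data to be inspected.
- The computer-readable storage medium according to claim 15, wherein the computer instructions, when run on a computer, cause the computer to further perform the following steps: inputting the data to be inspected into the first-stage violation data identification model and performing feature extraction on the data to be inspected in a recurrent neural network to generate a first text feature vector, the first-stage violation data identification model being a binary classification model; and inputting the first text feature vector into a fully connected layer and, in combination with an activation function, generating binary classification data.
- The computer-readable storage medium according to claim 16, wherein the computer instructions, when run on a computer, cause the computer to further perform the following steps: inputting the data to be inspected into the first-stage violation data identification model and generating a text vector matrix in combination with a preset vector space model, the first-stage violation data identification model being a binary classification model; and inputting the text vector matrix into a recurrent neural network and performing feature extraction in combination with an activation function to generate the first text feature vector.
- The computer-readable storage medium according to claim 16, wherein the computer instructions, when run on a computer, cause the computer to further perform the following steps: inputting the first text feature vector into the fully connected layer for feature weighting to generate a text classification score; and computing on the text classification score in combination with an activation function to generate a target classification probability, and determining the binary classification data based on the target classification probability.
- The computer-readable storage medium according to claim 15, wherein the computer instructions, when run on a computer, cause the computer to further perform the following steps: judging whether the binary classification data indicates violation data; and if the binary classification data indicates violation data, adding a head identifier and a tail identifier to the data to be inspected, inputting the result into the second-stage violation data classification model, and performing violation data classification in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model.
- A data classification apparatus for staged quality inspection, comprising: an acquisition module, configured to acquire data to be inspected, the data to be inspected being text data; a violation data identification module, configured to input the data to be inspected into a first-stage violation data identification model to generate binary classification data, the first-stage violation data identification model being a binary classification model; a violation data classification module, configured to add a head identifier and a tail identifier to the data to be inspected according to the binary classification data, input the result into a second-stage violation data classification model, and perform violation data classification in combination with an attention mechanism to generate violation type data, the second-stage violation data classification model being a BERT model; and a transmission module, configured to transmit the violation type data to a target terminal, the target terminal being the terminal that sent the data to be inspected.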
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011538857.0A CN112668857A (zh) | 2020-12-23 | 2020-12-23 | 分阶段质检的数据分类方法、装置、设备及存储介质 |
CN202011538857.0 | 2020-12-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022134591A1 true WO2022134591A1 (zh) | 2022-06-30 |
Family
ID=75408697
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/109696 WO2022134591A1 (zh) | 2020-12-23 | 2021-07-30 | 分阶段质检的数据分类方法、装置、设备及存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112668857A (zh) |
WO (1) | WO2022134591A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117391515A (zh) * | 2023-10-24 | 2024-01-12 | 科讯嘉联信息技术有限公司 | 一种基于通用大语言模型的服务质量管理方法与系统 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112668857A (zh) * | 2020-12-23 | 2021-04-16 | 深圳壹账通智能科技有限公司 | 分阶段质检的数据分类方法、装置、设备及存储介质 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108419091A (zh) * | 2018-03-02 | 2018-08-17 | 北京未来媒体科技股份有限公司 | 一种基于机器学习的视频内容审核方法及装置 |
CN111178410A (zh) * | 2019-12-19 | 2020-05-19 | 腾讯科技(深圳)有限公司 | 违规图片的识别方法及装置 |
CN111225234A (zh) * | 2019-12-23 | 2020-06-02 | 广州市百果园信息技术有限公司 | 视频审核方法、视频审核装置、设备和存储介质 |
CN111738011A (zh) * | 2020-05-09 | 2020-10-02 | 完美世界(北京)软件科技发展有限公司 | 违规文本的识别方法及装置、存储介质、电子装置 |
CN111860377A (zh) * | 2020-07-24 | 2020-10-30 | 中国平安人寿保险股份有限公司 | 基于人工智能的直播方法、装置、电子设备及存储介质 |
CN111883115A (zh) * | 2020-06-17 | 2020-11-03 | 马上消费金融股份有限公司 | 语音流程质检的方法及装置 |
US10833960B1 (en) * | 2019-09-04 | 2020-11-10 | International Business Machines Corporation | SLA management in composite cloud solutions using blockchain |
CN112668857A (zh) * | 2020-12-23 | 2021-04-16 | 深圳壹账通智能科技有限公司 | 分阶段质检的数据分类方法、装置、设备及存储介质 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107705807B (zh) * | 2017-08-24 | 2019-08-27 | 平安科技(深圳)有限公司 | 基于情绪识别的语音质检方法、装置、设备及存储介质 |
CN109815487B (zh) * | 2018-12-25 | 2023-04-18 | 平安科技(深圳)有限公司 | 文本质检方法、电子装置、计算机设备及存储介质 |
CN110288192A (zh) * | 2019-05-23 | 2019-09-27 | 平安科技(深圳)有限公司 | 基于多个质检模型的质检方法、装置、设备及存储介质 |
CN111241287A (zh) * | 2020-01-16 | 2020-06-05 | 支付宝(杭州)信息技术有限公司 | 用于生成对抗文本的生成模型的训练方法及装置 |
CN111444340B (zh) * | 2020-03-10 | 2023-08-11 | 腾讯科技(深圳)有限公司 | 文本分类方法、装置、设备及存储介质 |
CN111460162B (zh) * | 2020-04-11 | 2021-11-02 | 科技日报社 | 一种文本分类方法、装置、终端设备及计算机可读存储介质 |
CN111538809B (zh) * | 2020-04-20 | 2021-03-16 | 马上消费金融股份有限公司 | 一种语音服务质量检测方法、模型训练方法及装置 |
CN111553488B (zh) * | 2020-07-10 | 2020-10-20 | 支付宝(杭州)信息技术有限公司 | 一种针对用户行为的风险识别模型训练方法及系统 |
CN112069313A (zh) * | 2020-08-12 | 2020-12-11 | 北京工业大学 | 一种基于bert与双向lstm、注意力机制融合的灾难信息博文分类方法 |
CN112084764B (zh) * | 2020-09-02 | 2022-06-17 | 北京字节跳动网络技术有限公司 | 数据检测方法、装置、存储介质及设备 |
CN112085012B (zh) * | 2020-09-04 | 2024-03-08 | 泰康保险集团股份有限公司 | 项目名称和类别识别方法及装置 |
-
2020
- 2020-12-23 CN CN202011538857.0A patent/CN112668857A/zh active Pending
-
2021
- 2021-07-30 WO PCT/CN2021/109696 patent/WO2022134591A1/zh active Application Filing
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108419091A (zh) * | 2018-03-02 | 2018-08-17 | 北京未来媒体科技股份有限公司 | 一种基于机器学习的视频内容审核方法及装置 |
US10833960B1 (en) * | 2019-09-04 | 2020-11-10 | International Business Machines Corporation | SLA management in composite cloud solutions using blockchain |
CN111178410A (zh) * | 2019-12-19 | 2020-05-19 | 腾讯科技(深圳)有限公司 | 违规图片的识别方法及装置 |
CN111225234A (zh) * | 2019-12-23 | 2020-06-02 | 广州市百果园信息技术有限公司 | 视频审核方法、视频审核装置、设备和存储介质 |
CN111738011A (zh) * | 2020-05-09 | 2020-10-02 | 完美世界(北京)软件科技发展有限公司 | 违规文本的识别方法及装置、存储介质、电子装置 |
CN111883115A (zh) * | 2020-06-17 | 2020-11-03 | 马上消费金融股份有限公司 | 语音流程质检的方法及装置 |
CN111860377A (zh) * | 2020-07-24 | 2020-10-30 | 中国平安人寿保险股份有限公司 | 基于人工智能的直播方法、装置、电子设备及存储介质 |
CN112668857A (zh) * | 2020-12-23 | 2021-04-16 | 深圳壹账通智能科技有限公司 | 分阶段质检的数据分类方法、装置、设备及存储介质 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117391515A (zh) * | 2023-10-24 | 2024-01-12 | 科讯嘉联信息技术有限公司 | 一种基于通用大语言模型的服务质量管理方法与系统 |
CN117391515B (zh) * | 2023-10-24 | 2024-06-07 | 科讯嘉联信息技术有限公司 | 一种基于通用大语言模型的服务质量管理方法与系统 |
Also Published As
Publication number | Publication date |
---|---|
CN112668857A (zh) | 2021-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022134591A1 (zh) | 分阶段质检的数据分类方法、装置、设备及存储介质 | |
CN110163478A (zh) | 一种合同条款的风险审查方法及装置 | |
CN112434535B (zh) | 基于多模型的要素抽取方法、装置、设备及存储介质 | |
CN105518656A (zh) | 用于多传感器数据融合的认知神经语言学行为辨识系统 | |
US20220230089A1 (en) | Classifier assistance using domain-trained embedding | |
US20220108318A1 (en) | Quantum computing based real-time verification system | |
CN113949582A (zh) | 一种网络资产的识别方法、装置、电子设备及存储介质 | |
Madireddy | Content Based Image Classification Using Support Vector Machine Algorithm | |
CN113705909A (zh) | 基于预测模型的风险等级预测方法、装置与存储介质 | |
Hanshal et al. | RETRACTED ARTICLE: Hybrid deep learning model for automatic fake news detection | |
Ding et al. | Jel: applying end-to-end neural entity linking in jpmorgan chase | |
CN110097258B (zh) | 一种用户关系网络建立方法、装置及计算机可读存储介质 | |
CN117313138A (zh) | 基于nlp的社交网络隐私感知系统及方法 | |
Torres-Berru et al. | Data and text mining for the detection of fraud in public contracts: A case study of Ecuador’s official public procurement system | |
CN116821455A (zh) | 基于社交工具的区域数据回溯分析方法及系统 | |
CN116186298A (zh) | 信息检索方法和装置 | |
CN113656466B (zh) | 保单数据查询方法、装置、设备及存储介质 | |
CN113312481B (zh) | 基于区块链的文本分类方法、装置、设备以及存储介质 | |
CN113191777A (zh) | 风险识别方法和装置 | |
CN117221839B (zh) | 5g信令识别方法及其系统 | |
Balakrishna et al. | Identifying spammer groups in consumer reviews using meta-data via bipartite graph approach | |
US11892986B2 (en) | Activated neural pathways in graph-structured data models | |
CN112270179B (zh) | 一种实体识别方法、装置及电子设备 | |
CN116542251B (zh) | 一种基于智慧校园的网络监管方法及系统 | |
CN116719942B (zh) | 数据资产分类方法、装置、计算机设备和计算机存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21908596 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.10.2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21908596 Country of ref document: EP Kind code of ref document: A1 |