CN116384512A - Method, model training method, medium and device suitable for screening specific users - Google Patents

Method, model training method, medium and device suitable for screening specific users

Info

Publication number
CN116384512A
CN116384512A (Application CN202310618428.1A)
Authority
CN
China
Prior art keywords
model
training
sample data
chat
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310618428.1A
Other languages
Chinese (zh)
Other versions
CN116384512B (en
Inventor
陈鹏鹄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Hongchuang Technology Information Co ltd
Original Assignee
Fujian Hongchuang Technology Information Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Hongchuang Technology Information Co ltd filed Critical Fujian Hongchuang Technology Information Co ltd
Priority to CN202310618428.1A priority Critical patent/CN116384512B/en
Publication of CN116384512A publication Critical patent/CN116384512A/en
Application granted granted Critical
Publication of CN116384512B publication Critical patent/CN116384512B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method, a model training method, a storage medium and an electronic device suitable for screening specific users, wherein the method comprises the following steps: obtaining chat sample data, and respectively inputting the chat sample data into a training model, wherein the chat sample data comprises chat message sample data and user identifications for transmitting chat information; training the training model by adopting the chat sample data, judging the probability that the user identification of the chat message sample data is a specific user identification, and outputting the probability; and calculating the probabilities output by all the models by adopting a preset strategy to obtain a final calculation result, wherein the final calculation result comprises the final probability that each user identifier is a specific user identifier. In this way, the probabilities output by the multiple models are combined through the preset strategy to obtain the final calculation result; compared with using a single model, the deviation introduced by any single model's result is effectively reduced, so that the output result is more accurate.

Description

Method, model training method, medium and device suitable for screening specific users
Technical Field
The invention relates to the field of artificial intelligence, in particular to a method, a model training method, a storage medium and electronic equipment suitable for screening specific users.
Background
The chat record contains a great amount of information, and some attribute information of a user can be mined from the chat messages, so as to judge whether the user is a specific user with a certain type of target. At present, specific users are identified by manually checking chat records, which is time-consuming, labor-intensive and very costly. With the current rapid advance of artificial intelligence, particularly the tremendous progress in natural language processing, using computer automation to discover specific users has become of practical significance.
Text classification is an important task in the field of artificial intelligence; it refers to a computer automatically classifying and labeling text information according to a certain classification standard. With the development of the internet, the amount of information has exploded, and manually annotating data has become time-consuming, low in quality and easily affected by the annotator's subjectivity. Therefore, automating the task has become practical: handing the repetitive and tedious text labeling work over to a computer effectively overcomes these problems, while the labeled data gains characteristics such as consistency and high quality. Text classification has many application scenarios, including part-of-speech tagging, sentiment analysis, intention recognition, topic classification, question-answering tasks, natural language inference and the like.
An existing single training model, once trained, cannot classify specific users well based on chat data and suffers from low recognition accuracy.
Disclosure of Invention
In view of the above problems, the invention provides a method, a model training method, a storage medium and electronic equipment suitable for screening specific users, which solve the problem that existing deep learning models have low recognition accuracy when classifying specific users based on chat data.
To achieve the above object, in a first aspect, the present invention provides a model training method suitable for screening specific users, the method comprising the steps of:
the method comprises the steps of obtaining chat sample data, respectively inputting the chat sample data into a training model, wherein the chat sample data comprises chat message sample data and user identifications for transmitting chat information, the training model comprises a primary learner, and the primary learner comprises at least two of a BERT model, a ROBERTA model, an ERNIE model, an ELECTRA model and an ALBERT model;
training the training model by adopting the chat sample data, judging the probability that the user identification of the chat message sample data is a specific user identification, and outputting the probability;
And calculating the probabilities output by all the models by adopting a preset strategy to obtain a final calculation result, wherein the final calculation result comprises the final probability that each user identifier is a specific user identifier.
As an optional embodiment, the calculating the probabilities of all model outputs by using a predetermined strategy, and outputting the final calculation result includes:
acquiring weight influence factors of each training model;
and carrying out weighted operation on the probability output by all the models based on the weight influence factors of the training models to obtain a final calculation result.
As an alternative embodiment, the training model further includes a secondary learner, and the chat sample data is divided into positive sample data and negative sample data according to user identification types;
the training of the training model using the chat sample data further comprises:
acquiring a part of the positive sample data and a part of the negative sample data to obtain a first training data set, and inputting the first training data set to each primary learner for training to obtain a plurality of predicted positive sample data;
the calculating of the probability of all the model outputs by adopting the preset strategy, and the obtaining of the final calculation result comprises the following steps:
Combining the predicted positive sample data with the first training data set to obtain a second training data set, and inputting the second training data set into the secondary learner for training until the proportion of negative sample data that the secondary learner can recognize exceeds a preset proportion, so as to obtain a prediction result;
and calculating the probabilities output by all the models based on the model weights corresponding to the secondary learner after the prediction result is obtained, so as to obtain a final calculation result.
As an alternative embodiment, the primary learners are a BERT model, a ROBERTA model, an ERNIE model and an ELECTRA model, and the secondary learner is an MLP model.
As an optional embodiment, the calculating the probabilities of all model outputs by using a predetermined strategy, and outputting the final calculation result includes:
and sequentially judging whether the probability output by all the models exceeds a preset probability threshold, if the number of the models with the output probability exceeding the preset probability threshold is larger than or equal to the number of the models with the output probability not exceeding the preset probability threshold, setting the final calculation result as 1, otherwise, setting the final calculation result as 0.
As an alternative embodiment, the chat sample data is further processed and input into the training model according to the following manner:
Judging whether the chat message sample data corresponding to the user identifier in certain chat sample data is smaller than the preset data amount, if so, acquiring other chat message sample data corresponding to the current user identifier, and splicing the other chat message sample data with the chat message sample data in the current chat sample data until the spliced chat message sample data amount reaches the preset data amount.
In a second aspect, the present invention provides a method for screening a specific user, the method comprising the steps of:
obtaining chat record information of a user to be tested, and inputting the chat record information into a training model after training, wherein the training model is trained according to the model training method according to the first aspect of the invention;
and outputting the probability that the current user is a specific user.
As an alternative embodiment, the chat log information includes a plurality of chat logs, the method comprising:
the training model after the training is completed calculates a score for each chat record, and the score corresponds to the probability that the current user is a specific user;
the probability that the current user is a specific user is calculated based on the scores and weights of all chat records, and the weights of the chat records are determined according to the distance between the speaking time of the chat record and the current time stamp.
In a third aspect, the present invention also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of the first or second aspect.
In a fourth aspect, the present invention also provides an electronic device comprising a memory and a processor, the memory for storing one or more computer program instructions, wherein the one or more computer program instructions are executable by the processor to implement the method as described in the first or second aspect.
Unlike the prior art, the above technical solution relates to a method for screening specific users, a model training method, a storage medium and an electronic device, where the method includes: obtaining chat sample data, and respectively inputting the chat sample data into a training model, wherein the chat sample data comprises chat message sample data and user identifications for transmitting chat information; training the training model by adopting the chat sample data, judging the probability that the user identification of the chat message sample data is a specific user identification, and outputting the probability; and calculating the probabilities output by all the models by adopting a preset strategy to obtain a final calculation result, wherein the final calculation result comprises the final probability that each user identifier is a specific user identifier. In this way, the probabilities output by the multiple models are combined through the preset strategy to obtain the final calculation result; compared with using a single model, the deviation introduced by any single model's result is effectively reduced, so that the output result is more accurate.
The foregoing summary is merely an overview of the technical solution of the present invention. To enable a person skilled in the art to understand and implement the invention according to the specification and the accompanying drawings, and to make the above and other objects, features and advantages of the present invention easier to understand, the following description is given with reference to specific embodiments and the accompanying drawings of the present invention.
Drawings
The drawings are only for purposes of illustrating the principles, implementations, applications, features, and effects of the present invention and are not to be construed as limiting the invention.
In the drawings of the specification:
FIG. 1 is a flow chart of a model training method suitable for screening a particular user according to a first exemplary embodiment of the present invention;
FIG. 2 is a flow chart of a model training method suitable for screening specific users in accordance with a second exemplary embodiment of the present invention;
FIG. 3 is a flow chart of a model training method suitable for screening specific users in accordance with a third exemplary embodiment of the present invention;
FIG. 4 is a flow chart of a method for screening a particular user according to an exemplary embodiment of the present invention;
FIG. 5 is a flow chart of a method for screening a particular user according to another exemplary embodiment of the present invention;
FIG. 6 is a schematic diagram of a BERT model according to an exemplary embodiment of the invention;
FIG. 7 is a schematic diagram of an ELECTRA model according to an exemplary embodiment of the present invention;
FIG. 8 is a schematic diagram of an ERNIE model according to an exemplary embodiment of the invention;
FIG. 9 is a schematic diagram of an ALBERT model according to an exemplary embodiment of the present invention;
FIG. 10 is a schematic diagram of a ROBERTA model according to an exemplary embodiment of the present invention;
fig. 11 is a schematic diagram of an electronic device according to an exemplary embodiment of the present invention.
Reference numerals referred to in the above drawings are explained as follows:
1. an electronic device;
11. a memory;
12. a processor.
Description of the embodiments
In order to describe the possible application scenarios, technical principles, practical embodiments, and the like of the present invention in detail, the following description is made with reference to the specific embodiments and the accompanying drawings. The embodiments described herein are only for more clearly illustrating the technical aspects of the present invention, and thus are only exemplary and not intended to limit the scope of the present invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of the phrase "an embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are they necessarily independent of or mutually exclusive with other embodiments. In principle, as long as there is no technical contradiction or conflict, the technical features mentioned in each embodiment of the present invention may be combined in any manner to form a corresponding implementable technical solution.
Unless defined otherwise, technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention pertains; the use of related terms herein is for the purpose of describing particular embodiments only and is not intended to limit the invention.
In the description of the present invention, the term "and/or" describes a logical relationship between objects and means that three relationships may exist; for example, A and/or B represents: A alone, B alone, and both A and B. In addition, the character "/" herein generally indicates an "or" logical relationship between the associated objects.
In the present invention, terms such as "first" and "second" are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual number, order, or sequence of such entities or operations.
Without further limitation, the use of the terms "comprising", "including", "having" or other similar open-ended terms in this application is intended to cover a non-exclusive inclusion, such that a process, method or article that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method or article.
Consistent with the understanding in the Patent Examination Guidelines, expressions such as "greater than", "less than" and "exceeding" are understood in the present invention to exclude the stated number, while expressions such as "above", "below" and "within" are understood to include it. Furthermore, in the description of embodiments of the present invention, "a plurality of" means two or more (including two), unless specifically defined otherwise.
Referring to fig. 1, a flowchart of a model training method for screening a specific user according to a first exemplary embodiment of the present invention includes the following steps:
firstly, entering a step S1 to obtain chat sample data, and respectively inputting the chat sample data into a training model, wherein the chat sample data comprises chat message sample data and user identifications for transmitting chat messages;
step S2 is then carried out to train the training model by adopting the chat sample data, and the probability that the user identification of the chat message sample data is a specific user identification is judged and output;
and then, step S3 is carried out to calculate the probability output by all the models by adopting a preset strategy, and a final calculation result is obtained, wherein the final calculation result comprises the final probability that each user identifier is a specific user identifier.
In this embodiment, the training model includes a primary learner, where the primary learner includes at least two of a BERT model, a ROBERTA model, an ERNIE model, an ELECTRA model and an ALBERT model. The above 5 models are described one by one with reference to fig. 6 to 10 of the present application:
the first model is the BERT model, which uses a bi-directional transducer structure for feature extraction, as shown in fig. 6, which greatly improves the performance of the model. The encoder of the model consists of a stack of N identical layers, each layer has two sublayers, the first sublayer is a multi-head self-care mechanism layer, the second sublayer is a fully connected feedforward neural network processed according to the position, and residual connection is adopted in the two sublayers, and then layer normalization processing is carried out.
In the model training process, the pre-trained word embedding model BERT-base is first used to produce the word vectors fed into the model: after training on the user's utterance text, a word-embedding representation of that text is obtained which contains its context information, word-order information, grammatical and semantic information and deep model structure information. Temporal features and locally correlated features of the text are then fully extracted and fused by the neural network, and finally the chat text of a specific user (for example, a user posing a potential fraud threat) is classified through a Softmax layer.
In fig. 6, the pre-training phase refers to training the model in a case where a problem to be solved is relatively simple but the data amount is large, and the fine tuning phase refers to migrating the model to other tasks based on the pre-training phase. Preferably, the application migrates the model with the pre-training stage completed to the discovery task of the specific user, and continuously adjusts the weight of the model based on the parameters obtained in the pre-training stage.
In fig. 6, caption t represents Token (which may be a character or a word), and the subscript represents its position in the sentence.
S represents Segment (for indicating which sentence the Token is in), and the subscript represents a sentence identification, i.e. indicates which sentence the Token belongs to.
E represents Embedding (which is a vector we get by word Embedding), and the subscript has a meaning similar to the letter t mentioned earlier, indicating its position in the sentence.
H represents hidden (hidden layer, meaning the vector of the model after multi-layer processing), and the subscript has a meaning similar to the letter t mentioned above, indicating its position in the sentence.
A represents a vector formed by summing the token (character) embedding, segment embedding and position embedding.
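The patent text describes the fine-tuning and classification procedure only at a high level; purely as an illustration, fine-tuning a pre-trained BERT-base checkpoint with a Softmax classification head could be sketched as follows (the use of the HuggingFace Transformers library, the checkpoint name, the placeholder texts and the hyper-parameters are assumptions, not part of the disclosure):

```python
# Minimal sketch (assumptions: HuggingFace Transformers, a Chinese BERT-base
# checkpoint, and a binary label set {0: non-specific user, 1: specific user}).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)          # Softmax over 2 classes at the head

texts = ["示例聊天消息一", "示例聊天消息二"]       # placeholder chat message sample data
labels = torch.tensor([1, 0])                    # 1 = specific-user sample

batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=128, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)          # cross-entropy loss computed inside
outputs.loss.backward()                          # fine-tune the pre-trained weights
optimizer.step()

# At inference time, the per-class probability comes from a softmax over the logits.
probs = torch.softmax(outputs.logits, dim=-1)[:, 1]   # P(specific user)
```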
The second model is the ELECTRA model, which, as shown in fig. 7, consists of a generator G and a discriminator D. The generator G is a small masked language model (MLM) that predicts the original words at the [MASK] positions and is used to replace part of the words in the input text. The discriminator D is used to determine whether each word in the input sentence has been replaced, i.e., the Replaced Token Detection (RTD) pre-training task is used instead of BERT's original Masked Language Model (MLM) task, and the Next Sentence Prediction (NSP) task is not used. The model structure of the ROBERTA model is similar to BERT: it is also a Transformer-based encoder structure consisting of a stack of N identical encoding layers, each comprising a multi-head self-attention mechanism layer, a fully connected feed-forward neural network, and residual connection and normalization processing. After the pre-training phase ends, only the discriminator D is used as the base model for the classification task. During training, text classification is performed on the ELECTRA model using the user's utterance text, a classification model is obtained through learning, and finally specific users and non-specific users are classified on the test data set.
As shown in fig. 7, [MASK] is a mask token: when predicting a word, the model is not allowed to see that word, so the word is replaced with the mask. The ELECTRA model generally consists of two parts, a generator and a discriminator, both of which adopt the encoder structure of the Transformer model.
In fig. 7, a model (i.e., the generator) is first trained by partially masking the input sentence and having it predict the masked characters, which produces corrupted tokens. For example, in fig. 7 the generator randomly masks two tokens, "that" and "cooked"; the generator then predicts the corresponding tokens, where "that" is predicted successfully while "cooked" becomes "ate", i.e., a corrupted character.
Next, the input of the discriminator is the generator's corrupted output, e.g. "that chef ate the meal" in fig. 7. The role of the discriminator is to determine, for each input character, whether it is original or replaced; here the discriminator judges that the replaced character was indeed replaced.
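As an illustration of the replaced-token-detection objective just described, the discriminator's training labels can be derived by comparing the original tokens with the generator's corrupted output; the toy sketch below uses assumed example tokens rather than the patent's data:

```python
# Toy sketch of Replaced Token Detection (RTD) labels (assumed example tokens).
original  = ["那位", "厨师", "煮", "了", "饭"]      # "That chef cooked the meal"
corrupted = ["那位", "厨师", "吃", "了", "饭"]      # generator replaced "煮" with "吃"

# Discriminator target: 1 = token was replaced by the generator, 0 = original.
rtd_labels = [int(o != c) for o, c in zip(original, corrupted)]
print(rtd_labels)  # [0, 0, 1, 0, 0]
```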
The third model is the ERNIE model, a method that enhances the BERT language representation model with a knowledge graph, as shown in fig. 8. The model does not directly take external knowledge information as input; instead, it implicitly learns knowledge such as entity relationships and entity attributes by changing the masking policy. The masking policy of the underlying BERT model is character-based, and such a policy is detrimental to learning knowledge information, especially for Chinese language models. In ERNIE, the model may mask consecutive tokens, so that in addition to character-level relationships it can also learn word-level and entity-level relationships.
ERNIE uses a multi-layer Transformer as the basic encoder, which captures the context information of each token in a sentence through the self-attention mechanism and generates a sequence of contextual embeddings. For a Chinese corpus, a space is added around each character in the CJK Unicode range, and WordPiece is used to tokenize the Chinese sentences. For a given token, its input representation is constructed by summing the corresponding token, segment and position embeddings. The first token of each sequence is a special classification token.
A significant improvement of the ERNIE model over the traditional BERT model is the change of masking strategy. Take the input sentence in fig. 8, "Harry Potter is a series of wonderful novels, written by J.K. Rowling", as an example: the masking strategy of the traditional BERT model may mask individual words such as "series", "wonderful" or "Rowling", whereas the ERNIE model greatly enhances the representation of general semantics by uniformly modeling the lexical structure, syntactic structure and semantic information in the training data.
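To make the difference in masking strategies concrete, the toy sketch below contrasts character-level masking with entity-level masking on the example sentence; the masked positions and entity spans are assumptions for illustration only:

```python
# Toy contrast between character-level masking (BERT style) and entity-level
# masking (ERNIE style) on the example sentence; spans below are assumptions.
sentence = list("哈利波特是一系列小说，作者是J.K.罗琳")

# BERT-style: mask isolated characters, e.g. "波" at index 2 and "罗" at index 18.
bert_masked = ["[MASK]" if i in (2, 18) else ch for i, ch in enumerate(sentence)]

# ERNIE-style: mask whole entity spans, e.g. "哈利波特" (0..3) and "罗琳" (last two).
entity_spans = [(0, 4), (len(sentence) - 2, len(sentence))]
ernie_masked = sentence[:]
for start, end in entity_spans:
    ernie_masked[start:end] = ["[MASK]"] * (end - start)
```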
The fourth model is the ALBERT model, shown in fig. 9. To solve the problems of the huge parameter count and overly long training time of the previous models, the ALBERT model improves the traditional BERT architecture through factorized embedding parameterization and cross-layer parameter sharing.
In the original BERT, the embedding size is the same as the hidden size. Compared with traditional BERT, ALBERT unbinds the embedding size from the hidden size: it keeps the hidden size unchanged but adds a dimension transformation matrix after the embedding. In addition, ALBERT uses parameter sharing, so that the multiple encoder layers become repeated applications of one shared layer. The use of the GELU activation function promotes model stability and helps stabilize the network parameters.
A natural language model converts an input sentence into vectors through an embedding layer; the ALBERT model used here converts each word in the sentence into a one-hot vector. A one-hot vector encodes N states using N bits that are 0 or 1: each state has its own representation, in which exactly one bit is 1 and the other bits are 0.
From a model perspective, token embedding learns a context-independent representation (dimension E), i.e., context-independent character embedding. The Hidden Layer learns about the context-dependent representation (dimension H), i.e. the Hidden Layer embedded representation is context-dependent.
The ALBERT model decomposes the embedding matrix into smaller matrices using factorized embedding parameterization, separating the token embedding from the hidden-layer embedding. For example, if the vocabulary size is V, the embedding dimension is E and the hidden size is H, the original embedding parameter size is V×H; ALBERT first maps the original embedding to a V×E matrix (the context-independent representation) and then projects it into the hidden space H through an E×H matrix (the context-dependent representation).
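A small numeric illustration of the factorization (the vocabulary size, embedding dimension and hidden size below are assumed example values, not figures from the patent):

```python
# Parameter count of the embedding table before/after factorization
# (V, H, E values are illustrative assumptions).
V, H, E = 30000, 768, 128          # vocabulary, hidden size, embedding size

bert_params   = V * H              # original BERT: one V x H embedding table
albert_params = V * E + E * H      # ALBERT: V x E table plus E x H projection

print(bert_params)                 # 23,040,000
print(albert_params)               # 3,938,304
```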
The fifth model is the ROBERTA model, shown in fig. 10. Its model structure is similar to BERT, and it modifies the pre-training strategy by removing the next sentence prediction task in BERT, changing the static random mask to a dynamic mask, and using byte-pair encoding. The pre-training process is depicted in fig. 10: part of the input words or characters are masked so that the model has to predict what the masked part originally was.
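A toy sketch of the dynamic-masking idea (mask rate and tokens are assumptions): unlike a static mask fixed once at preprocessing time, the masked positions are re-sampled each time a sentence is served to the model.

```python
# Toy sketch of dynamic masking: masked positions are re-sampled per epoch.
import random

def dynamic_mask(tokens, mask_rate=0.15):
    out = tokens[:]
    for i in range(len(out)):
        if random.random() < mask_rate:
            out[i] = "[MASK]"
    return out

tokens = ["今天", "天气", "很", "好"]
epoch1 = dynamic_mask(tokens)   # masked positions differ...
epoch2 = dynamic_mask(tokens)   # ...between epochs
```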
According to this scheme, multiple models (any two to five of the above 5 models) are trained on the processed corpus information (i.e., the chat sample data) and its labels. Taking all 5 models as primary learners as an example, the 5 trained models predict, from the chat corpus, whether a user identifier belongs to a specific user; the outputs of the models are then combined through ensemble learning to obtain the final calculation result, and the specific user is thereby predicted. Compared with training and predicting with a single model, this scheme can effectively improve the accuracy of specific-user prediction.
Ensemble learning trains multiple individual learners on the training set data, completes the learning task through a certain combination strategy, and finally forms a strong learner. There are mainly two ways to construct an ensemble: (1) sequential methods: the learners are generated serially, there are strong dependencies among the individual learners, and the learners may be heterogeneous; (2) parallel methods: multiple independent learners are built with no strong dependency between them, so a series of individual learners can be generated in parallel; these are usually homogeneous weak learners.
As shown in fig. 2, in some embodiments, calculating the probabilities of all model outputs using the predetermined strategy and outputting the final calculation result includes: step S201, obtaining the weight influence factor of each training model; step S202, performing a weighted operation on the probabilities output by all models based on the weight influence factors of the training models to obtain the final calculation result. For example, if the weight influence factors of the 5 models are 0.1, 0.2, 0.3, 0.2 and 0.2, and the probabilities output independently by the 5 trained models that the user identifier is a specific user identifier are 0.7, 0.6, 0.5, 0.9 and 0.8 respectively, the final probability that the current user is a specific user (i.e., the final calculation result) is obtained through the weighted operation: 0.1×0.7+0.2×0.6+0.3×0.5+0.2×0.9+0.2×0.8=0.68.
Further, the final calculation result obtained by calculation through the method shown in fig. 2 is further used for comparing with a preset probability threshold, if the final calculation result is greater than the preset probability threshold, the current user can be identified as a specific user, otherwise, the current user is identified as an unspecific user.
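A minimal sketch of this weighted soft-voting step, using the illustrative weights and probabilities from the example above and an assumed threshold of 0.5:

```python
# Minimal sketch of the weighted soft-voting step (weights, probabilities and
# the 0.5 threshold are illustrative values, not fixed by the patent).
weights = [0.1, 0.2, 0.3, 0.2, 0.2]          # weight influence factor per model
probs   = [0.7, 0.6, 0.5, 0.9, 0.8]          # each model's P(specific user)

final_prob = sum(w * p for w, p in zip(weights, probs))   # 0.68

threshold = 0.5                              # preset probability threshold
is_specific_user = final_prob > threshold    # True -> identify as specific user
```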
In other embodiments, the calculating the probabilities of all model outputs using the predetermined strategy, and outputting the final calculation result includes: and sequentially judging whether the probability output by all the models exceeds a preset probability threshold, if the number of the models with the output probability exceeding the preset probability threshold is larger than or equal to the number of the models with the output probability not exceeding the preset probability threshold, setting the final calculation result as 1, otherwise, setting the final calculation result as 0.
In short, for the same input sample data, each individual learner gives its own output result (between 0 and 1). The outputs are then examined in turn: if an output is smaller than a preset probability threshold (e.g., 0.5), the learner is marked 0, otherwise it is marked 1. If the number of learners marked 1 is greater than or equal to half of the number of individual learners, the final decision result is 1 (true), i.e., the current user identifier is judged to be a specific user identifier; otherwise the decision result is 0 (false), i.e., it is judged to be a non-specific user identifier.
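A minimal sketch of this majority-vote strategy (the per-model probabilities are placeholders; 0.5 is the example threshold mentioned above):

```python
# Minimal sketch of the majority-vote (hard voting) strategy described above.
probs = [0.7, 0.6, 0.4, 0.9, 0.3]            # per-model output probabilities
threshold = 0.5                              # example preset probability threshold

votes = [1 if p >= threshold else 0 for p in probs]
# Result is 1 (specific user) when at least half of the learners vote 1.
final_result = 1 if sum(votes) >= len(votes) / 2 else 0
```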
As shown in fig. 3, the training model further includes a secondary learner, and the chat sample data is divided into positive sample data and negative sample data according to user identification types;
the training of the training model using the chat sample data further comprises:
firstly, entering step S301 to acquire a part of the positive sample data and a part of the negative sample data to obtain a first training data set, and inputting the first training data set to each primary learner for training to obtain a plurality of predicted positive sample data;
the calculating of the probability of all the model outputs by adopting the preset strategy, and the obtaining of the final calculation result comprises the following steps:
step S302: combining the predicted positive sample data with the first training data set to obtain a second training data set, and inputting the second training data set into the secondary learner for training until the proportion of negative sample data that the secondary learner can recognize exceeds a preset proportion, so as to obtain a prediction result;
step S303: calculating the probabilities output by all the models based on the model weights corresponding to the secondary learner after the prediction result is obtained, so as to obtain a final calculation result.
In this embodiment, the positive sample data refers to the part of the sample data associated with specific users, i.e., by learning from and judging the positive sample data it can be inferred that the user currently sending a message is a specific user; conversely, the negative sample data refers to the part of the sample data associated with non-specific users, i.e., by learning from and judging the negative sample data it can be inferred that the user currently sending a message is a non-specific user.
In this embodiment, the preset proportion may be set according to actual needs, and in order to further improve accuracy of model training, the preset proportion may be set to be more than 90%.
In this embodiment, the secondary learner is an MLP (multi-layer perceptron) model. The network architecture of the secondary learner is a multi-layer feedforward neural network, each layer of neurons are fully interconnected with the next layer of neurons, no same-layer connection exists between the neurons, and no cross-layer connection exists. The learning process of the neural network is to adjust the connection weight between the neurons and the threshold value of each functional neuron according to the training data.
In this embodiment, the weights refer to the parameters in the model, including trainable and non-trainable ones. For example, the MLP model used as the secondary learner contains four layers in total: when data is input to the first layer of the MLP model, the weight w1 obtained after training is passed to the second layer; similarly, the second layer yields weight w2 after training, which is then passed to the third layer for training, and so on until a final preferred set of weights is obtained when training ends. Because the features of the secondary learner come from learning on the first training set, a more accurate prediction result can be obtained, and all models are calculated with the model weights corresponding to this prediction result, so that the final calculation result is output.
In the training phase, the training set input to the secondary learner (i.e., the secondary training set) is further generated from the training set input to the primary learners (i.e., the primary training set), but it differs from the latter, because generating the secondary training set directly from the primary learners' training set carries a relatively large risk of overfitting. The secondary learner is a model that predicts based on the prediction results of the primary learners, so the secondary training set is formed by taking the primary learners' predictions on the initial training set and attaching the labels of the initial training set. Through training on the secondary training set, the MLP obtains a weight distribution over the primary learners' prediction results, so that a more accurate result is obtained in the test stage. In the test stage, the primary learners first predict on the test set to form a new test set, i.e., the input samples of the secondary learner (MLP); the trained MLP then performs a secondary prediction on the primary learners' predictions for the test set, and the resulting secondary prediction is more accurate and more robust.
Assuming that 4 primary learners (a ROBERTA model, an ERNIE model, an ELECTRA model and a BERT model) are selected as weak learners, the application integrates these 4 primary learners into a strong learner through ensemble learning; bagging, boosting, stacking and the like can be adopted as the ensemble method.
Preferably, in this embodiment, learning is performed by the stacking method. In general, the first-layer models in stacking are models with a high fitting capacity, in pursuit of fully learning the training data. Because the different models learn the training set from different principles, the first-layer models can be regarded as automatically extracting valid features from the raw data. Since the first-layer models use complex nonlinear transformations to extract features, stacking is prone to overfitting, as the features of the second layer come from learning on the first-layer data. To reduce the risk of overfitting, the second-layer model tends to be a simple model, and the features of the second-layer data do not include the original features. From the above analysis, the key to successful stacking is that the first-layer models produce outputs with good predictive ability and diversity (low correlation) on the original training data; after further learning by the second-layer model, the first-layer models can then complement each other's weaknesses, improving the accuracy and stability of prediction. Therefore, the present application adopts a stacking method with the ROBERTA model, ERNIE model, ELECTRA model and BERT model as weak learners and the MLP as the secondary learner.
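The patent names the stacking combination but gives no implementation; a compressed sketch in which the primary learners' output probabilities form the feature vector of the secondary MLP learner might look as follows (the use of scikit-learn, the probability values, the MLP layout and the shapes are assumptions for illustration only):

```python
# Compressed stacking sketch: the four primary learners' output probabilities
# become the feature vector of the secondary MLP learner. All values, shapes
# and the scikit-learn MLP configuration are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Probabilities produced by the four fine-tuned primary learners on the first
# training data set (placeholders), plus the corresponding labels.
p_bert    = np.array([0.90, 0.20, 0.70, 0.10])
p_roberta = np.array([0.80, 0.30, 0.60, 0.20])
p_ernie   = np.array([0.85, 0.25, 0.65, 0.15])
p_electra = np.array([0.90, 0.10, 0.70, 0.30])
y_train   = np.array([1, 0, 1, 0])

# Secondary training set: primary predictions stacked column-wise + labels.
X_meta = np.column_stack([p_bert, p_roberta, p_ernie, p_electra])

meta_learner = MLPClassifier(hidden_layer_sizes=(8, 8), max_iter=500)
meta_learner.fit(X_meta, y_train)          # learns a weighting over the learners

# Test stage: primary learners score the test set first, then the MLP outputs
# the final probability that each user identifier is a specific user identifier.
X_meta_test = np.column_stack([[0.75], [0.70], [0.80], [0.60]])
final_prob = meta_learner.predict_proba(X_meta_test)[:, 1]
```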
In some embodiments, the chat sample data is further processed and input into the training model according to the following: judging whether the chat message sample data corresponding to the user identifier in certain chat sample data is smaller than the preset data amount, if so, acquiring other chat message sample data corresponding to the current user identifier, and splicing the other chat message sample data with the chat message sample data in the current chat sample data until the spliced chat message sample data amount reaches the preset data amount.
Therefore, when the chat sample data input to the model is small, the chat sample data of all utterances of the same user can be spliced together by text concatenation, which expands the amount of information and further improves the accuracy of model training.
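A minimal sketch of this splicing rule, assuming a preset data amount of 128 characters and simple string concatenation (both assumptions, since the patent does not fix these details):

```python
# Minimal sketch of the splicing rule: when one chat sample's text is shorter
# than the preset data amount, other messages from the same user identifier are
# appended until the preset length is reached. The 128-character limit and the
# truncation at the end are assumptions for illustration.
PRESET_LEN = 128

def splice_user_messages(current_msg, other_msgs):
    spliced = current_msg
    for msg in other_msgs:
        if len(spliced) >= PRESET_LEN:
            break
        spliced += msg                      # concatenate another utterance
    return spliced[:PRESET_LEN]
```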
As shown in fig. 4, a flowchart of a method for screening a specific user according to an exemplary embodiment of the present invention includes the following steps:
firstly, entering step S401 to acquire chat log information of a user to be tested, and inputting the chat log information into a training model after training, wherein the training model is trained according to the model training method according to the first aspect of the application;
Then, step S402 is entered to output the probability that the current user is a specific user.
Preferably, the chat log information includes a plurality of chat logs, as shown in fig. 5, and the method includes:
firstly, step S501 is entered, the training model after training is completed calculates a score for each chat record, and the score corresponds to the probability that the current user is a specific user;
step S502 is then performed to calculate the probability that the current user is a specific user based on the scores and weights of all the chat records, wherein the weights of the chat records are determined according to the distance between the speaking time of the chat record and the current timestamp.
The method shown in fig. 4 and 5 is further described below, taking a discovery system for specific users (e.g., users whose speech is aggressive or who utter many sensitive words) as an example. The suspicion of a user is calculated from the sentences the user utters at given moments. Because a user's utterances are accompanied by a time stream, the system generates a sequence of scores for each user based on this speaking time stream; for a target user, a reasonable suspicion should be a score based on the user's speaking pattern, and ideally a single scalar is obtained to indicate the suspicion corresponding to the user.
From the system's perspective, the model scores each utterance record of the user in the time stream separately, producing a score sequence. From the perspective of the system's user, what is expected is a scalar score computed from the speaking behavior exhibited by all or part of a user's speech over a period of time. To resolve this mismatch between the two data types, a dynamic suspicion calculation method is designed, which solves the problem that the system's output does not match the user's expectation.
For a specific-user discovery system, the most essential goal is to calculate a user's suspicion from the historical speaking records of all or a period of the user's activity, and thereby discover specific users. The most natural way for an ideal system to determine whether a user is a specific user is: obtain the user's speaking pattern from the user's speaking records, and finally judge whether the user is a specific user based on that speaking pattern.
Since the detection of specific-user behavior is often real-time, both the user's utterance score sequence and the times at which the user's specific utterances occur are critical. A target specific user often has the following characteristics:
(1) There has been a recent speech activity associated with a particular user.
(2) In the speech record, the proportion of specific-user-related speech varies over a wide range: the specific-user-related speech may be mixed in among many sentences of normal speech, or the record may consist essentially of specific-user-related speech.
The characteristic of non-target users, by contrast, is very consistent: there is never any speaking behavior associated with a specific user.
For this purpose, the application designs a dynamic suspicion calculation formula based on speaking time and score sequence, which mainly comprises any one or more of the following three methods:
(1) Weight decay based on relative talk time
Because of the real-time nature of the detection task, a suspected user's utterances carry different levels of importance for suspect recognition depending on their relative time. In general, more recent speech is more relevant to whether the user is a suspect, while relatively old speech records contribute only weakly; this importance tends to decay exponentially with relative speaking time.
(2) Weight decay based on absolute talk time
Since the user's suspicion is expected to be a scalar between 0 and 1, the algorithm computes it as a weighted average of the score sequence, using the weight decay module based on relative speaking time. However, weighted averaging essentially decays each weight according to the relative relation between the time of each utterance and the time of the last utterance. This alone does not match characteristic (1) of target specific users, so a weight decay module based on absolute speaking time is introduced into the system: the computed score is further decayed based on how long ago the last utterance occurred, so that the algorithm matches the target user's speaking behavior.
(3) Floor score sequence threshold sampling
The algorithm rests on a strong assumption: the ensemble model scores a user's individual utterance records reliably; it may give ambiguous scores to some utterances unrelated to specific users, but those scores will not be very high.
Therefore, combining this strong assumption with the speech behavior characteristics of target and non-target users, the algorithm performs threshold sampling on the user's score sequence. Specifically: a score sampling threshold is set (usually between 0.3 and 0.5), only scores higher than the threshold are kept in the score sequence, and the dynamic suspicion calculation is then carried out on the score sequence obtained after threshold sampling.
The use of this method is based on two considerations:
(1) According to characteristic (2) of target users, if a target user's suspicious speech accounts for only a small proportion of the total utterance record, the algorithm would otherwise not score that user well. If the speech records with low suspicion scores (ordinary speech of non-specific-user behavior, which may be produced by both target and non-target users) are screened out, the score calculated by the algorithm remains accurate.
(2) For target users, the fairness of the algorithm would be affected if no threshold sampling were performed. For non-target users, who do not produce specific-user speech, the suspicion of any single utterance cannot reach a very high value under the algorithm's strong assumption, so performing threshold sampling hardly affects their score in theory.
The aim of the algorithm is to use the dynamically calculated score to find target users who have recent suspicious speech records, rather than to compute the proportion of a user's suspicious speech to their total speech. The score-sequence threshold sampling therefore makes the algorithm match the speech behavior of target users more closely while excluding non-target users more effectively.
Based on the principle of demand driving, the method designs the three modes to determine the weight of the chat record, so that the algorithm well matches the speaking behavior of the target user and well eliminates non-target users. By executing this algorithm, a scalar (range 0 to 1) is obtained that has a positive correlation in value with whether the user is a suspect. The problem that the type of the output data of the system is not matched with the type of the data expected by the user is solved.
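The patent describes the three weighting ideas qualitatively without giving an explicit formula; the sketch below combines them in one plausible form, where the decay constants, the sampling threshold and the exponential shape are all assumptions:

```python
# Plausible sketch of the dynamic suspicion calculation: threshold sampling,
# exponential decay over relative speaking time, and decay based on how long
# ago the last utterance occurred. All constants and the exact functional form
# are assumptions; the patent only describes the three ideas qualitatively.
import math

def dynamic_suspicion(scores, times, now,
                      sample_threshold=0.4,    # floor for threshold sampling
                      rel_decay=0.1,           # relative-time decay rate
                      abs_decay=0.05):         # absolute-time decay rate
    # (3) keep only scores above the sampling threshold
    kept = [(s, t) for s, t in zip(scores, times) if s > sample_threshold]
    if not kept:
        return 0.0
    last_time = max(t for _, t in kept)

    # (1) weight each record by exponential decay of its distance to the
    #     most recent utterance (relative speaking time)
    weights = [math.exp(-rel_decay * (last_time - t)) for _, t in kept]
    weighted_avg = sum(w * s for w, (s, _) in zip(weights, kept)) / sum(weights)

    # (2) decay the whole score by how long ago the user last spoke
    return weighted_avg * math.exp(-abs_decay * (now - last_time))

# Example: per-record scores with speaking times in days, evaluated "today".
print(dynamic_suspicion([0.9, 0.2, 0.7], times=[1.0, 3.0, 9.0], now=10.0))
```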
The soft voting mechanism used in the method comprehensively considers the result of each base model (i.e., each primary learner), combines the output probabilities of the base models, and finally converts the combined probability into a category label. Regarding the number of base models, the structures do not differ greatly, since the current base models ROBERTA, ERNIE, ELECTRA and BERT are all Transformer-based. Combining multiple base models brings benefits in several respects. First, statistically, because the hypothesis space of the learning task is often large, multiple hypotheses may achieve the same performance on the training set; if too few base models are used, a wrong choice may lead to poor generalization, and this risk can be reduced by combining a sufficient number of base models. Second, computationally, learning algorithms tend to fall into local minima, some of which correspond to poor generalization; combining the results of multiple runs reduces the risk of ending up at a bad local minimum. Third, from the perspective of representation, the true hypothesis of some learning tasks may not lie in the hypothesis space considered by the current learning algorithm; in that case a single learner is certainly ineffective, whereas combining multiple learners expands the corresponding hypothesis space and may yield a better approximation.
Based on the above considerations, the number of base learners is set to 3 or 4. Comparing the test accuracy of the four models shows that their accuracies are similar, and one additional base learner can better supplement the text features learned by the other three, so the number of base models in this project is set to 4, namely the ROBERTA model, ERNIE model, ELECTRA model and BERT model.
In a third aspect, the present embodiment also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of the first aspect.
Referring to fig. 11, in a fourth aspect, the present embodiment further provides an electronic device 1, including a memory 11 and a processor 12, where the memory 11 is configured to store one or more computer program instructions, and the one or more computer program instructions are executed by the processor 12 to implement the method described in the first aspect.
The electronic device 1 may be a tablet, a mobile phone, a notebook, a desktop computer, etc. The storage medium/memory 11 includes but is not limited to: RAM, ROM, magnetic disk, magnetic tape, optical disk, flash memory, USB disk, removable hard disk, memory card, memory stick, web server storage, web cloud storage, etc. The processor 12 includes, but is not limited to, a CPU (central processing unit), a GPU (graphics processing unit), an MCU (microcontroller unit), and the like.
Finally, it should be noted that, although the embodiments have been described in the text and the drawings, the scope of the invention is not limited thereby. The technical scheme generated by replacing or modifying the equivalent structure or equivalent flow by utilizing the content recorded in the text and the drawings of the specification based on the essential idea of the invention, and the technical scheme of the embodiment directly or indirectly implemented in other related technical fields are included in the patent protection scope of the invention.

Claims (10)

1. A model training method suitable for screening a specific user, said method comprising the steps of:
the method comprises the steps of obtaining chat sample data, respectively inputting the chat sample data into a training model, wherein the chat sample data comprises chat message sample data and user identifications for transmitting chat information, the training model comprises a primary learner, and the primary learner comprises at least two of a BERT model, a ROBERTA model, an ERNIE model, an ELECTRA model and an ALBERT model;
training the training model by adopting the chat sample data, judging the probability that the user identification of the chat message sample data is a specific user identification, and outputting the probability;
and calculating the probabilities output by all the models by adopting a preset strategy to obtain a final calculation result, wherein the final calculation result comprises the final probability that each user identifier is a specific user identifier.
2. The model training method for screening specific users according to claim 1, wherein said calculating probabilities of all model outputs using a predetermined strategy, and outputting final calculation results comprises:
acquiring weight influence factors of each training model;
And carrying out weighted operation on the probability output by all the models based on the weight influence factors of the training models to obtain a final calculation result.
3. The model training method for screening specific users according to claim 1, wherein the training model further comprises a secondary learner, the chat sample data being divided into positive sample data and negative sample data according to user identification type;
the training of the training model using the chat sample data further comprises:
acquiring a part of the positive sample data and a part of the negative sample data to obtain a first training data set, and inputting the first training data set to each primary learner for training to obtain a plurality of predicted positive sample data;
the calculating of the probability of all the model outputs by adopting the preset strategy, and the obtaining of the final calculation result comprises the following steps:
combining the predicted positive sample data with the first training data set to obtain a second training data set, and inputting the second training data set into the secondary learner for training until the proportion of negative sample data that the secondary learner can recognize exceeds a preset proportion, so as to obtain a prediction result;
and calculating the probabilities output by all the models based on the model weights corresponding to the secondary learner after the prediction result is obtained, so as to obtain a final calculation result.
4. The model training method for screening specific users according to claim 3, wherein the primary learners are a BERT model, a ROBERTA model, an ERNIE model and an ELECTRA model, and the secondary learner is an MLP model.
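
One way to read claims 3 and 4 is as a standard stacking arrangement: the primary learners' outputs are appended to the first training data set to form a second training data set for an MLP secondary learner, which is accepted only once it recognises a sufficient proportion of the negative samples. The sketch below follows that reading and assumes numpy and scikit-learn are available; the synthetic features, the stand-in primary-learner probabilities and the 0.9 preset proportion are all assumptions made for illustration, not values taken from the patent.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# First training data set: part of the positive and part of the negative samples.
# Each row stands in for an encoded chat message; label 1 marks a specific user.
X_first = rng.normal(size=(200, 4))
y_first = (X_first[:, 0] + X_first[:, 1] > 0).astype(int)

# Stand-ins for four primary learners' predicted probabilities on the first set.
primary_preds = np.column_stack([
    1.0 / (1.0 + np.exp(-(X_first[:, 0] + X_first[:, 1] + noise)))
    for noise in rng.normal(scale=0.2, size=(4, 200))
])

# Second training data set = first training data set plus the primary predictions.
X_second = np.hstack([X_first, primary_preds])

secondary = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
secondary.fit(X_second, y_first)

# Accept the secondary learner only if it recognises enough of the negative samples.
PRESET_PROPORTION = 0.9  # assumed threshold; the claim leaves the value open
neg = y_first == 0
neg_recall = (secondary.predict(X_second[neg]) == 0).mean()
print(f"negative samples recognised: {neg_recall:.2%}",
      "(accepted)" if neg_recall > PRESET_PROPORTION else "(keep training)")
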
5. The model training method for screening specific users according to claim 1, wherein combining the probabilities output by all the models according to the preset strategy to obtain the final calculation result comprises:
judging in turn whether the probability output by each model exceeds a preset probability threshold; if the number of models whose output probability exceeds the preset probability threshold is greater than or equal to the number of models whose output probability does not exceed it, setting the final calculation result to 1, and otherwise setting the final calculation result to 0.
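
A compact sketch of the voting strategy in claim 5; the 0.5 default threshold is an assumption, as the claim only requires comparing each model's output probability against a preset probability threshold and counting the models on each side.

from typing import List

def majority_vote(model_probs: List[float], threshold: float = 0.5) -> int:
    """Return 1 when at least as many models exceed the threshold as do not."""
    above = sum(1 for p in model_probs if p > threshold)
    below = len(model_probs) - above
    return 1 if above >= below else 0

print(majority_vote([0.81, 0.64, 0.32, 0.58]))  # 3 of 4 above 0.5 -> 1
print(majority_vote([0.41, 0.64, 0.32, 0.18]))  # 1 of 4 above 0.5 -> 0
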
6. The model training method for screening specific users according to claim 1, wherein the chat sample data is further processed in the following manner before being input into the training models:
judging whether the amount of chat message sample data corresponding to the user identifier in a given chat sample is less than a preset data amount, and if so, acquiring other chat message sample data corresponding to the current user identifier and splicing it onto the chat message sample data of the current chat sample until the spliced chat message sample data reaches the preset data amount.
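
A small sketch of the splicing step in claim 6; the 64-character preset data amount, the space separator and the history lookup are assumptions, since the claim only requires topping up a too-short sample with the same user's other chat messages until the preset amount is reached.

from typing import Dict, List

PRESET_AMOUNT = 64  # assumed minimum number of characters per training sample

def pad_sample(user_id: str, text: str, history: Dict[str, List[str]]) -> str:
    """Splice the user's other chat messages onto a sample that is too short."""
    spliced = text
    for other in history.get(user_id, []):
        if len(spliced) >= PRESET_AMOUNT:
            break
        if other != text:
            spliced = spliced + " " + other
    return spliced

history = {"user_a": ["ok", "send me the contract please", "the rate we discussed is fine"]}
print(pad_sample("user_a", "ok", history))
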
7. A method for screening a specific user, the method comprising the steps of:
obtaining chat record information of a user to be tested, and inputting the chat record information into a trained training model, wherein the training model is trained according to the model training method of any one of claims 1 to 6;
and outputting the probability that the current user is a specific user.
8. The method for screening a specific user according to claim 7, wherein the chat record information comprises a plurality of chat records, and the method comprises:
calculating, by the trained training model, a score for each chat record, the score corresponding to the probability that the current user is a specific user;
and calculating the probability that the current user is a specific user based on the scores and weights of all the chat records, wherein the weight of each chat record is determined by the interval between the chat record's sending time and the current timestamp.
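
An illustrative sketch of the recency weighting in claim 8; the exponential decay with a one-day half-life is an assumption, since the claim only requires that a chat record's weight depend on the gap between its sending time and the current timestamp.

import time
from typing import List, Tuple

HALF_LIFE_SECONDS = 24 * 3600  # assumed: a record's weight halves for each day of age

def user_probability(records: List[Tuple[float, float]], now: float) -> float:
    """records: (timestamp, score) pairs; returns the recency-weighted probability."""
    weights = [0.5 ** ((now - ts) / HALF_LIFE_SECONDS) for ts, _ in records]
    total = sum(weights)
    return sum(w * s for w, (_, s) in zip(weights, records)) / total

now = time.time()
records = [(now - 3600, 0.9), (now - 3 * 86400, 0.4)]  # a recent high score and an old low score
print(round(user_probability(records, now), 3))        # dominated by the recent record
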
9. A computer readable storage medium, on which computer program instructions are stored, which computer program instructions, when executed by a processor, implement the method of any of claims 1-8.
10. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method of any of claims 1-8.
CN202310618428.1A 2023-05-30 2023-05-30 Method, model training method, medium and device suitable for screening specific users Active CN116384512B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310618428.1A CN116384512B (en) 2023-05-30 2023-05-30 Method, model training method, medium and device suitable for screening specific users

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310618428.1A CN116384512B (en) 2023-05-30 2023-05-30 Method, model training method, medium and device suitable for screening specific users

Publications (2)

Publication Number Publication Date
CN116384512A 2023-07-04
CN116384512B (en) 2023-09-12

Family

ID=86969730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310618428.1A Active CN116384512B (en) 2023-05-30 2023-05-30 Method, model training method, medium and device suitable for screening specific users

Country Status (1)

Country Link
CN (1) CN116384512B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647239A (en) * 2018-04-04 2018-10-12 顺丰科技有限公司 Talk with intension recognizing method and device, equipment and storage medium
CN108986869A (en) * 2018-07-26 2018-12-11 南京群顶科技有限公司 A kind of disk failure detection method predicted using multi-model
CN110401545A (en) * 2019-06-18 2019-11-01 平安科技(深圳)有限公司 Chat group creation method, device, computer equipment and storage medium
CN112036955A (en) * 2020-09-07 2020-12-04 贝壳技术有限公司 User identification method and device, computer readable storage medium and electronic equipment
CN112418653A (en) * 2020-11-19 2021-02-26 重庆邮电大学 Number portability and network diver identification system and method based on machine learning algorithm
CN113869334A (en) * 2020-06-12 2021-12-31 中国电信股份有限公司 Communication disturbance user identification method, medium and device based on big data mining
CN114463643A (en) * 2021-12-23 2022-05-10 中国科学院空天信息创新研究院 Multi-model decision-level fusion landslide identification method and device


Also Published As

Publication number Publication date
CN116384512B (en) 2023-09-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant