CN113360617B - Abnormality recognition method, apparatus, device, and storage medium - Google Patents


Info

Publication number
CN113360617B
CN113360617B (application CN202110633642.5A)
Authority
CN
China
Prior art keywords
data
target
answer data
behavior
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110633642.5A
Other languages
Chinese (zh)
Other versions
CN113360617A (en)
Inventor
庞海龙
岳江浩
张玉东
张文君
张铮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110633642.5A priority Critical patent/CN113360617B/en
Publication of CN113360617A publication Critical patent/CN113360617A/en
Application granted granted Critical
Publication of CN113360617B publication Critical patent/CN113360617B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/383 Retrieval characterised by using metadata automatically derived from the content
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides an anomaly recognition method, apparatus, device, and storage medium, and relates to the field of artificial intelligence, in particular to intelligent search, machine learning, and deep learning technologies. The specific implementation scheme is as follows: extracting text feature data from target question-answer data of a target object; determining behavior feature data according to behavior state data collected when the target object generates answer data of the target question-answer data; determining a target score according to the text feature data and the behavior feature data; and performing anomaly recognition on the target object according to the target score. The disclosed technology improves the accuracy of the anomaly recognition result for the target object.

Description

Abnormality recognition method, apparatus, device, and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly to intelligent search, machine learning, and deep learning techniques.
Background
The knowledge question-answer community is an interactive, open community that connects the public's demand for knowledge with its supply. Content in the community takes the form of questions and answers exchanged among users, so as to realize knowledge sharing.
However, due to the openness of such communities, some users exploit community resources for promotion and traffic diversion and post cheating content, which seriously affects the sustainable development of the community.
Disclosure of Invention
The present disclosure provides an anomaly identification method, apparatus, device, and storage medium.
According to an aspect of the present disclosure, there is provided an anomaly identification method including:
extracting text feature data in target question-answer data of a target object;
determining behavior characteristic data according to behavior state data when the target object generates answer data of the target question-answer data;
determining a target score according to the text feature data and the behavior feature data;
and carrying out abnormal recognition on the target object according to the target score.
According to another aspect of the present disclosure, there is also provided an abnormality recognition apparatus including:
the text feature data extraction module is used for extracting text feature data in the target question-answer data of the target object;
the behavior characteristic data determining module is used for determining behavior characteristic data according to behavior state data when the target object generates answer data of the target question-answer data;
The target score determining module is used for determining a target score according to the text characteristic data and the behavior characteristic data;
and the abnormality identification module is used for carrying out abnormality identification on the target object according to the target score.
According to another aspect of the present disclosure, there is also provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the anomaly identification methods provided by the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute any one of the anomaly identification methods provided by the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements any one of the anomaly identification methods provided by the embodiments of the present disclosure.
According to the technology disclosed by the invention, the accuracy of the target object abnormal recognition result is improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of an anomaly identification method provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart of another anomaly identification method provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of another anomaly identification method provided by an embodiment of the present disclosure;
FIG. 4 is a flow chart of another anomaly identification method provided by an embodiment of the present disclosure;
FIG. 5 is a block diagram of an anomaly identification method provided by an embodiment of the present disclosure;
fig. 6 is a block diagram of an abnormality recognition apparatus provided in an embodiment of the present disclosure;
fig. 7 is a block diagram of an electronic device for implementing the anomaly identification method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings; various details of the embodiments are included to facilitate understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
The anomaly identification method and apparatus provided by the present disclosure are suitable for scenarios in which the question-answer activity of a target object in a knowledge question-answer community is checked for anomalies. The anomaly identification method in the present disclosure may be performed by an anomaly identification apparatus, which may be implemented in software and/or hardware and is specifically configured in an electronic device. The electronic device may be a terminal device or a server.
For ease of understanding, the anomaly identification methods provided by the present disclosure are first described in detail.
Referring to fig. 1, an anomaly identification method includes:
s101, extracting text feature data in target question-answer data of a target object.
Wherein the target object can be understood as a user account identification uniquely characterizing the user. The question and answer data may include question data and/or answer data in a knowledge question and answer community. The target question-answer data includes answer data generated by the target object and/or question data corresponding to the answer data. It should be noted that, the content in the target question-answer data may exist in the form of, but not limited to, pictures and texts.
The text feature data is useful information extracted from the image and text information of the target question-answer data, and serves, from the content dimension, as a reference basis for anomaly recognition of the target object at the time the target question-answer data is generated.
S102, determining behavior characteristic data according to behavior state data when the target object generates answer data of the target question-answer data.
The behavior state data is used for representing the behavior attribute and the state attribute of the target object when the answer data of the target question-answer data is generated. The behavior attribute corresponds to the user behavior when the answer data is generated and serves, from the behavior dimension, as a reference basis for anomaly recognition of the target object at the time the target question-answer data is generated. The state attribute corresponds to the generation environment when the answer data is generated and serves, from the environment dimension, as a reference basis for anomaly recognition of the target object at the time the target question-answer data is generated.
S103, determining target scores according to the text feature data and the behavior feature data.
Optionally, the text feature data and the behavior feature data can be spliced and fused to obtain fusion data, and the target score is determined according to the fusion data. Alternatively, the text feature data and the behavior feature data may be input in parallel into an evaluation model to determine the target score. Or, a text score can be determined according to the text feature data and a behavior score according to the behavior feature data, and the text score and the behavior score are weighted and averaged to obtain the target score. The weights of the weighted average can be set by the technician according to needs or empirical values, or determined through a large number of experiments.
In a specific implementation manner, the text feature data and the behavior feature data can be spliced and fused to obtain a fusion result; and inputting the fusion result into a first fusion evaluation model trained in advance, and outputting a target score. The first fusion evaluation model can be obtained by training in the following way: and training a pre-constructed first machine learning model by taking a spliced result of sample text feature data and sample behavior feature data extracted from sample question-answer data of a sample object as a training sample and taking abnormal labeling data of the sample object when the sample object generates the sample question-answer data as a label.
In another specific implementation manner, the text feature data and the behavior feature data can be input in parallel into a pre-trained second fusion evaluation model, and the target score is output. The second fusion evaluation model can be obtained by training in the following way: a pre-constructed second machine learning model is trained by taking the sample text feature data and sample behavior feature data extracted from sample question-answer data of a sample object as a training sample pair, and taking the abnormality annotation data of the sample object at the time the sample question-answer data was generated as a label.
In yet another specific implementation, the text feature data may be input into a pre-trained text evaluation model to obtain a text score; inputting behavior characteristic data into a pre-trained behavior evaluation model to obtain a behavior score; and carrying out weighted average on the text score and the behavior score to obtain a target score. The text evaluation model and the behavior evaluation model can be obtained through independent training respectively: taking sample text feature data extracted from sample question-answer data of a sample object as a training sample, taking abnormal labeling data of the sample object when the sample object generates the sample question-answer data as a label, and training a third machine learning model constructed in advance to obtain a text evaluation model; and training a fourth machine learning model which is built in advance by taking sample behavior data of the sample object when the sample object generates sample question-answer data as a training sample and taking abnormal labeling data of the sample object when the sample object generates the sample question-answer data as a label to obtain a behavior evaluation model. Of course, the text evaluation model and the behavior evaluation model can be obtained by training at the same time: respectively taking sample text characteristic data and sample behavior characteristic data extracted from sample question-answer data of a sample object as input data of a first sub-network and a second sub-network in a third machine learning model, taking abnormal labeling data of the sample object when the sample object generates the sample question-answer data as a label, and training a fusion training model; and taking the trained first sub-network as a text evaluation model, and taking the trained second sub-network as a behavior evaluation model.
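For illustration only, the following Python sketch outlines two of the alternative target-score computations described above; the model callables, weights, and function names are hypothetical and not part of the original disclosure.

```python
# Illustrative sketch only: two of the target-score options described above,
# assuming the evaluation models are callables that map features to a score.
import numpy as np

def target_score_concat(text_feat, behav_feat, fusion_model):
    # Splice/fuse the text and behavior feature vectors, then feed the fused
    # result into a pre-trained fusion evaluation model (option one above).
    fused = np.concatenate([text_feat, behav_feat])
    return fusion_model(fused)

def target_score_weighted(text_feat, behav_feat, text_model, behav_model,
                          w_text=0.5, w_behav=0.5):
    # Score each feature set separately, then take a weighted average
    # (option three above); the 0.5/0.5 weights are hypothetical.
    text_score = text_model(text_feat)
    behav_score = behav_model(behav_feat)
    return (w_text * text_score + w_behav * behav_score) / (w_text + w_behav)
```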
It should be noted that the specific structures of the above machine learning models are not limited; they may be implemented based on at least one model in the prior art, as long as the trained models have the corresponding functions.
S104, carrying out anomaly identification on the target object according to the target score.
Illustratively, determining whether the target score satisfies an abnormal condition; if yes, determining that the target object is abnormal when answer data of the target question-answer data are generated; otherwise, determining that the target object is normal when generating answer data of the target question-answer data.
In a specific implementation manner, if the target score is smaller than the abnormal score threshold, determining that the target object is abnormal when answer data of the target question-answer data is generated; if the target score is not smaller than the abnormal score threshold, determining that the target object is normal when generating answer data of the target question-answer data. Wherein the abnormality score threshold may be set by a technician as desired or as an empirical value, or may be determined repeatedly through a number of trials.
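A minimal sketch of the threshold comparison in S104 follows; the threshold value 0.5 is a hypothetical placeholder, since the disclosure leaves the actual value to the technician.

```python
# Sketch only: below-threshold target scores are treated as abnormal.
ANOMALY_SCORE_THRESHOLD = 0.5  # hypothetical value; set by the technician

def is_abnormal(target_score: float) -> bool:
    # True means the target object is abnormal when generating the answer data.
    return target_score < ANOMALY_SCORE_THRESHOLD
```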
It should be noted that, in the technical solution of the present disclosure, the target object, the target question-answer data, the behavior state data, and the like all comply with the relevant laws and regulations and do not violate public order and good customs.
When the abnormality detection is carried out on the target object, the text feature data and the behavior feature data of the target question-answer data are introduced to carry out target score determination, so that the information of different dimensions can be comprehensively considered in the target score determination process, the accuracy of a target score determination result is improved, and the accuracy of an abnormality recognition result is improved when the abnormality recognition of the target object is carried out according to the target score.
On the basis of the technical schemes, the present disclosure also provides another alternative embodiment. In this embodiment, the extraction operation of text feature data is optimized and improved.
Referring to fig. 2, an anomaly identification method includes:
s201, carrying out abnormal data identification on the target question-answer data under a preset dimension to obtain data abnormal probability.
Wherein, the abnormal data may include, but is not limited to, contraband graphics and text, spam graphics and text, politics-related graphics and text, abusive graphics and text, advertisement graphics and text, and the like. The contraband graphics and text may include content carrying pornographic or violent information, the spam graphics and text may include content carrying fraud-related information, and the politics-related graphics and text may include content carrying politically sensitive or reactionary information, and the like.
The preset dimension may include at least one of the contraband, spam, politics, abuse, and advertisement dimensions, and the like.
In an alternative embodiment, the data anomaly recognition model may be trained in advance, and the target question-answer data may be input into the trained data anomaly recognition model to obtain the data anomaly probability. The data anomaly recognition model can be obtained by training a pre-constructed machine learning model, taking sample question-answer data of a sample object as a training sample and taking the data anomaly annotation data of the sample question-answer data under the preset dimension as the label value. The specific structure of the machine learning model is not limited, as long as the trained model can realize the abnormal data identification function.
It should be noted that a single data anomaly recognition model may be used; after the target question-answer data is input into this model, the data anomaly probabilities under all of the preset dimensions are output.
However, when the data anomaly probabilities for the different preset dimensions are all determined by one anomaly recognition model, the accuracy of the recognition result is poor, which in turn affects the accuracy of the text feature data. In order to improve the accuracy of the text feature data and lay a foundation for improving the accuracy of anomaly identification, in an alternative embodiment, a corresponding anomaly recognition model may be trained for each preset dimension, and the target question-answer data is input into the anomaly recognition model corresponding to each preset dimension to obtain the data anomaly probability of that dimension.
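As a sketch of the per-dimension arrangement just described, the snippet below assumes each preset dimension has its own trained recognizer exposed as a callable; the dimension names and the dictionary layout are assumptions made for illustration.

```python
# Sketch only: one anomaly recognition model per preset dimension, each assumed
# to be a callable returning the data anomaly probability for the given text.
DIMENSIONS = ["contraband", "spam", "politics", "abuse", "advertisement"]

def data_anomaly_probabilities(qa_text: str, models: dict) -> dict:
    # models maps each preset dimension to its independently trained recognizer.
    return {dim: models[dim](qa_text) for dim in DIMENSIONS if dim in models}
```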
S202, carrying out character statistics on the target question-answer data to obtain character statistics data.
The character statistics are used for representing the content richness of the target question-answer data. By way of example, the character statistics may include text statistics such as text length, word count, and word ratio; the character statistics may also include punctuation statistics such as punctuation category, punctuation count, and punctuation ratio.
Optionally, the text may include question text; accordingly, the text statistics may include at least one of the question text length, the number of question words, and the length ratio or count ratio of the question words within the answer words.
Optionally, the text may include answer text; accordingly, the text statistics may include at least one of the answer text length, the number of answer words, and the length ratio or count ratio of the question text within the answer text.
Optionally, if the text includes both question text and answer text, the text statistics may further include at least one of: the length ratio of the question text within the answer text, the number of characters shared by the question text and the answer text, the count ratio of the question punctuation within the answer punctuation, and the like.
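The character statistics above could be computed along the following lines; the tokenization, the punctuation set, and the exact set of statistics are assumptions made for this sketch.

```python
# Sketch only: character statistics for a question/answer pair.
import re

PUNCTUATION = set("，。！？；：、,.!?;:")  # assumed punctuation inventory

def character_statistics(question: str, answer: str) -> dict:
    q_words = re.findall(r"\w+", question)
    a_words = re.findall(r"\w+", answer)
    q_punct = [c for c in question if c in PUNCTUATION]
    a_punct = [c for c in answer if c in PUNCTUATION]
    return {
        "question_length": len(question),
        "answer_length": len(answer),
        "question_word_count": len(q_words),
        "answer_word_count": len(a_words),
        # Ratios of the question text within the answer text (guarded against
        # empty answers).
        "length_ratio": len(question) / max(len(answer), 1),
        "word_count_ratio": len(q_words) / max(len(a_words), 1),
        "punct_count_ratio": len(q_punct) / max(len(a_punct), 1),
        # Number of distinct characters shared by question and answer.
        "shared_char_count": len(set(question) & set(answer)),
    }
```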
S203, generating text characteristic data according to at least one of the data anomaly probability and the character statistical data.
For example, at least one of the data anomaly probabilities corresponding to the preset dimensions and the character statistics may be selected, spliced, and fused to obtain the text feature data.
Which data anomaly probabilities and character statistics are selected may be set by the technician according to needs or empirical values, or may be determined through repeated experiments.
In order to avoid the difference between the character statistics and the data anomaly probabilities due to the dimension effect, the character statistics may also be normalized prior to generating the text feature data. Correspondingly, text feature data is generated according to at least one of the data anomaly probability and the normalized character statistics.
In order to improve the richness of the content carried by the text feature data and further improve the comprehensiveness of the text feature data, in an alternative embodiment, the data anomaly probabilities and the character statistics data corresponding to the preset dimensions can be spliced and fused according to a set sequence to obtain the text feature data. The setting sequence may be set uniformly in advance by a technician.
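A sketch of the splice-and-fuse step might look as follows; the fixed ordering (sorted keys) and the min-max normalization bounds are assumptions, since the disclosure only requires a set sequence agreed in advance.

```python
# Sketch only: assemble the text feature vector from per-dimension anomaly
# probabilities and normalized character statistics, in a fixed order.
import numpy as np

def build_text_features(anomaly_probs: dict, char_stats: dict,
                        stat_bounds: dict) -> np.ndarray:
    probs = [anomaly_probs[d] for d in sorted(anomaly_probs)]
    normalized = []
    for name in sorted(char_stats):
        lo, hi = stat_bounds.get(name, (0.0, 1.0))  # assumed per-statistic bounds
        value = (char_stats[name] - lo) / (hi - lo) if hi > lo else 0.0
        normalized.append(min(max(value, 0.0), 1.0))
    return np.asarray(probs + normalized, dtype=np.float32)
```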
S204, determining behavior characteristic data according to behavior state data when the target object generates answer data of the target question-answer data.
S205, determining the target score according to the text characteristic data and the behavior characteristic data.
S206, carrying out anomaly identification on the target object according to the target score.
According to the embodiment of the disclosure, the generation process of the text characteristic data is thinned to be abnormal data identification on the target question-answer data under the preset dimension, so that the data abnormal probability is obtained; performing character statistics on the target question-answer data to obtain character statistics data; and generating text characteristic data according to at least one of the data anomaly probability and the character statistical data, so that the generation mode of the text characteristic data is perfected. Meanwhile, the text characteristic data is generated through the abnormal data probability and the character statistical data of different preset dimensions, so that the diversity and the comprehensiveness of the text characteristic data are enriched, the guarantee is provided for the improvement of the accuracy of the target score determination result, and the foundation is laid for the improvement of the accuracy of the abnormal recognition result of the target object.
Based on the technical schemes, the present disclosure also provides an alternative embodiment. In this embodiment, the determination of the behavior feature data is optimized.
Referring to fig. 3, an anomaly identification method includes:
s301, extracting text feature data in target question-answer data of a target object.
S302, determining abnormal behavior probability according to the interactive behavior data when the target object generates answer data of the target question-answer data.
Wherein the interaction behavior data may include at least one of the following types of data: whether the answer data in the target question-answer data was completed in a single answering session, whether the target question-answer data generation page was entered from the set entry page, the input speed of the answer data of the target question-answer data, and the like. The set entry page is the page where the aggregated entry of the target question-answer data generation page is located.
For example, probability values corresponding to different data values of each type of interaction behavior data may be preset, and a statistic of the probability values of the various types of interaction behavior data may be taken as the behavior anomaly probability. The probability values corresponding to the different data values can be determined by the technician according to needs or empirical values, or determined through a large number of experiments.
For example, if the answer data of the target question-answer data was completed in a single answering session, the corresponding probability falls in the abnormal range, indicating a high likelihood that the answer data was obtained by the target object through copy and paste; if the target question-answer data was answered over at least two sessions, the corresponding probability falls in the normal range, and the more answering sessions there are, the lower the probability value, indicating a low likelihood that the answer data was copied and pasted by the target object. For another example, if the target question-answer data generation page was entered from the set entry page, the corresponding probability falls in the normal range, indicating a high likelihood that the target object entered the generation page normally; if the generation page was not entered from the set entry page, the corresponding probability falls in the abnormal range, indicating a high likelihood that the target object entered the generation page abnormally. For another example, if the input speed of the target question-answer data is greater than a set speed threshold, the corresponding probability falls in the abnormal range, indicating a high likelihood that the answer data was obtained by the target object through copy and paste; if the input speed is not greater than the set speed threshold, the corresponding probability falls in the normal range, indicating a high likelihood that the target object input the answer data normally.
In a specific implementation manner, the corresponding probability is set to 0.5 when the target question-answer data is answered in a single session, to 0.2 when it is answered over two sessions, and to 0 when it is answered over three sessions. The corresponding probability may be set to 0.05 when the target question-answer data generation page is entered from the set entry page, and to 0.8 otherwise. When the input speed of the target question-answer data is greater than 180 words/min, the corresponding probability is set to 0.8; when it is greater than 150 and not greater than 180 words/min, to 0.6; when it is greater than 120 and not greater than 150 words/min, to 0.5; and when it is not greater than 120 words/min, to 0.05.
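The probability table above translates directly into code; the sketch below uses the values given in this specific implementation, while the choice of a simple mean as the combining statistic is an assumption.

```python
# Sketch only: map the interaction behavior data to preset probability values
# and combine them into one behavior anomaly probability (mean is assumed).
def behavior_anomaly_probability(answer_sessions: int,
                                 entered_from_set_entry_page: bool,
                                 input_speed_wpm: float) -> float:
    if answer_sessions <= 1:
        p_sessions = 0.5
    elif answer_sessions == 2:
        p_sessions = 0.2
    else:
        p_sessions = 0.0

    p_entry = 0.05 if entered_from_set_entry_page else 0.8

    if input_speed_wpm > 180:
        p_speed = 0.8
    elif input_speed_wpm > 150:
        p_speed = 0.6
    elif input_speed_wpm > 120:
        p_speed = 0.5
    else:
        p_speed = 0.05

    return (p_sessions + p_entry + p_speed) / 3.0
```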
In another optional embodiment, the pre-built behavior model can be trained by taking sample interaction behavior data of the sample object when the sample question-answer data is generated as a training sample and taking abnormal labeling data of the sample object when the sample object generates the sample question-answer data as a label, so that a trained behavior model is obtained. The behavior model can be implemented based on an existing machine learning model, and the model structure of the behavior model is not limited in the present disclosure.
Correspondingly, the interactive behavior data when the target object generates the answer data of the target question-answer data is input into the trained behavior probability model, and the behavior anomaly probability is obtained.
S303, determining the environment abnormality probability according to the interaction environment information when answer data of the target question-answer data are generated.
The interaction environment is used for representing surrounding environments with different dimensions when the target question-answer data is generated, and can comprise at least one of a device environment, a network environment and the like. The interactive environment information is used to characterize attribute values of the surrounding environment.
In an alternative embodiment, if the interaction environment comprises a device environment, the environment anomaly probability comprises a device anomaly probability; accordingly, the device abnormality probability can be determined from the input device information when answer data of the target question-answer data is generated.
For example, the input device information may include at least one data of usage system information when question-answer data is generated, a device on-off state, whether it belongs to a simulator, a device usage gesture, a device moving speed, and the like.
In a specific implementation manner, device information of a sample object when sample question-answer data are generated can be used as training data in advance, abnormal labeling data of the device information are used as labels, and a pre-built device safety judging model is trained. Correspondingly, equipment information when the target object generates answer data of the target question-answer data is input into the trained equipment safety discrimination model, and equipment abnormality probability is obtained. The equipment safety judging model can be realized based on the existing machine learning model, and the model structure of the equipment safety judging model is not limited in any way.
In another alternative embodiment, if the interaction environment comprises a network environment, the environment anomaly probability comprises a network anomaly probability; accordingly, the network anomaly probability can be determined according to the network environment information when the answer data of the target question-answer data is generated.
The network information may include at least one of network category, network risk level, number of unsupervised network clusters, frequency of occurrence of network in a set period of time, co-occurrence number of network when generating question data and network when generating answer data, and the like.
In a specific implementation manner, network information of a sample object when generating sample question-answer data can be used as training data in advance, abnormal labeling data of the network information is used as a label, and a pre-built network risk judging model is trained. Correspondingly, inputting the network information when the target object generates the answer data of the target question-answer data into the trained network risk judgment model to obtain the network anomaly probability. The network risk judging model can be implemented based on an existing machine learning model, and the model structure of the network risk judging model is not limited in any way.
It can be appreciated that the above technical scheme refines the environment anomaly probability into the device anomaly probability and/or the network anomaly probability, which enriches the diversity of the environment anomaly probability, perfects the determination mechanisms of the different environment anomaly probabilities, lays a foundation for the diversity and comprehensiveness of the behavior feature data, and further provides data support for improving the accuracy of the target score determination result.
Of course, according to actual requirements, the embodiments of the present disclosure may further add other dimensions of interaction environment information, where the foregoing merely illustrates optional dimensions of the interaction environment information, and should not be construed as limiting the interaction environment.
S304, determining the interaction activity according to the historical interaction behavior data in the associated historical period of the target question-answer data.
The association history period may be understood as a history period before the target question-answer data generation time, and may be, for example, a neighboring history period. The time length of the association history period is not limited, and may be set by a skilled person according to a need or an empirical value, or may be repeatedly determined through a large number of experiments.
The historical interaction behavior data are used for representing the activity of the target object during the associated historical period. The historical interaction behavior data may include at least one of the number of visits by the target object to the knowledge question-answer community homepage during the historical period, the visit duration, the duration of visits to question data related to the target question-answer data, the duration of visits to the answer data corresponding to the related question data, the accumulated number of questions asked by the target object, the dwell duration on the answer control, and the like. The answer control can be understood as the control that must be triggered to enter the answer editing page or to submit the edited answer data when the target object generates the answer data of the target question-answer data.
In a specific implementation manner, historical behavior data of an association historical period of sample question-answer data generated by a sample object can be used as training data in advance, and a pre-constructed liveness prediction model can be trained. Correspondingly, historical interaction behavior data of the target question-answer data in the associated historical period are input into a trained liveness prediction model, and interaction liveness is obtained. The liveness prediction model may be implemented based on an existing machine learning model, and the model structure of the liveness prediction model is not limited in any way.
S305, generating behavior characteristic data according to at least one of the behavior anomaly probability, the environment anomaly probability, the interaction liveness and the basic attribute data of the target object.
Wherein the basic attribute data of the target object may include account attributes of the target object. The account attributes may include at least one of an imported account, an active account, a protected account, a historically blocked account, and the like.
In an alternative embodiment, the base attribute data may be encoded to convert it to data within the [0,1] interval.
For example, at least one of the behavior anomaly probability, the environment anomaly probability, the interaction activity, and the basic attribute data may be spliced and fused according to a set sequence to obtain the behavior feature data. The set sequence may be agreed in advance by the technician.
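As an illustration of S305, the sketch below splices the probabilities, the interaction activity, and an assumed multi-hot encoding of the account attributes into one behavior feature vector; the attribute names and the ordering are hypothetical.

```python
# Sketch only: assemble the behavior feature vector in a fixed order.
import numpy as np

ACCOUNT_ATTRIBUTES = ["imported", "active", "protected", "history_blocked"]

def build_behavior_features(behavior_prob: float, device_prob: float,
                            network_prob: float, activity: float,
                            account_attrs: set) -> np.ndarray:
    # Encode each account attribute into [0, 1] (multi-hot encoding assumed).
    attr_codes = [1.0 if a in account_attrs else 0.0 for a in ACCOUNT_ATTRIBUTES]
    return np.asarray([behavior_prob, device_prob, network_prob, activity]
                      + attr_codes, dtype=np.float32)
```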
S306, determining the target score according to the text feature data and the behavior feature data.
S307, carrying out anomaly identification on the target object according to the target score.
It should be noted that, in the technical solution of the present disclosure, the interaction behavior data, the interaction environment information, the historical interaction behavior data, the basic attribute data of the target object, and the like all comply with the relevant laws and regulations and do not violate public order and good customs.
According to the embodiment of the disclosure, the generation process of the behavior feature data is refined into: determining the behavior anomaly probability according to the interaction behavior data collected when the target object generates the answer data of the target question-answer data; determining the environment anomaly probability according to the interaction environment information when the answer data of the target question-answer data is generated; determining the interaction activity according to the historical interaction behavior data in the associated historical period of the target question-answer data; and generating the behavior feature data according to at least one of the behavior anomaly probability, the environment anomaly probability, the interaction activity, and the basic attribute data of the target object, so that the generation manner of the behavior feature data is perfected. Meanwhile, since the behavior feature data is generated from different types of data, the diversity and comprehensiveness of the behavior feature data are enriched, which guarantees the improvement of the accuracy of the target score determination result and lays a foundation for improving the accuracy of the anomaly recognition result for the target object.
Based on the technical schemes, the present disclosure also provides an alternative embodiment. In this embodiment, the anomaly identification process of the target object will be optimized.
Referring to fig. 4, an anomaly identification method includes:
s401, extracting text feature data in target question-answer data of a target object.
S402, determining behavior characteristic data according to behavior state data when the target object generates answer data of the target question-answer data.
S403, determining the target score according to the text feature data and the behavior feature data.
S404, updating the target score according to the historical target score of the historical question-answer data of the target object.
S405, carrying out anomaly identification on the target object according to the updated target score.
It should be noted that, since whether the target object is abnormal when generating answer data of the target question-answer data is related to an abnormal situation in which the target object generates the history question-answer data, that is, if the target object is abnormal when generating the history question-answer data, the possibility of abnormality when generating answer data of the target question-answer data is greater. In order to improve the accuracy of the abnormal detection result when the target object generates the answer data of the target question-answer data, in this embodiment, the historical target score of the historical question-answer data of the target object may be introduced, and the target score when the answer data of the target question-answer data is generated may be updated in an optimized manner.
Wherein, the historical question-answer data may be at least one question-answer data generated before the answer data of the target question-answer data is generated. The present disclosure does not set any limit to the number of historical question-answer data and the specific generation timing.
In an alternative embodiment, updating the target score according to the historical target score of the historical question-answer data of the target object may be: determining a weighted average of historical target scores of the historical question-answer data of the target object and target scores of the target question-answer data; and taking the weighted average result as the updated target score. Wherein each historical target score and the weight value of the target score may be determined by a technician as needed or as an empirical value, or repeatedly by a number of experiments.
In a specific implementation manner, historical target scores of a set number of historical question-answer data adjacent to the target question-answer data generation time can be selected; determining the weight of a historical target score of each historical question-answer data according to the generation time interval of each historical question-answer data and the target question-answer data; determining the weight of the target score according to the weight of each historical target score; and according to each weight value, carrying out weighted summation on the historical target score and the target score to obtain the updated target score. Wherein, the smaller the generation time interval with the target question-answer data, the larger the weight. The set number can be set by a technician according to needs or experience values, or can be repeatedly determined through a large number of experiments. Wherein the sum of the weights of each historical target score and the target score is 1.
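One way to realize the weighted update just described is sketched below; the inverse-time-gap weighting and the half-and-half split between history and the current score are assumptions, since the disclosure only requires that smaller time gaps receive larger weights and that all weights sum to 1.

```python
# Sketch only: update the target score with a recency-weighted average of the
# historical target scores; all weights (history plus current score) sum to 1.
def update_target_score(target_score: float, now: float, history: list,
                        history_weight: float = 0.5) -> float:
    # history: list of (generation_time, historical_target_score) pairs;
    # generation times share a unit with `now`.
    if not history:
        return target_score
    raw = [1.0 / max(now - t, 1e-6) for t, _ in history]  # smaller gap, larger weight
    total = sum(raw)
    weighted_history = sum((r / total) * s for r, (_, s) in zip(raw, history))
    return history_weight * weighted_history + (1.0 - history_weight) * target_score
```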
Since the question-answer data generating behavior of the target object in the knowledge question-answer community is not constant, it follows a certain attenuation characteristic; that is, the frequency with which the target object generates question-answer data in the knowledge question-answer community gradually decreases over time until it stabilizes. In view of this, in another alternative embodiment of the present disclosure, the target score may also be updated as follows: determining a behavior decay factor of the target object according to the historical target scores of the historical question-answer data of the target object; and updating the target score according to the behavior decay factor.
It can be understood that the target score is updated by introducing the behavior attenuation factor, so that the updated target score better accords with the behavior activity rule of the target object, the accuracy of the finally determined target score is improved, and a foundation is laid for improving the accuracy of the abnormal recognition result.
For example, the historical target scores of the historical question-answer data at different moments can be subjected to curve fitting, and each coefficient in the curve fitting result is used as a behavior attenuation factor; correspondingly, determining a reference score of the generation moment of the target question-answer data according to the fitted curve; and taking the weighted average of the reference score and the target score as the updated target score. The weights of the reference score and the target score can be determined by a technician according to needs or experience values or repeatedly determined through a large number of experiments.
For example, the following formula may be used to determine the behavior decay factor of the target object:
score_n = score_{n-1} * exp(-a * (t_n - t_{n-1}))
wherein a is the behavior decay factor, and score_n and score_{n-1} are the historical target scores corresponding to the historical question-answer data at the historical moments t_n and t_{n-1} that are sequentially adjacent to the generation moment of the answer data of the target question-answer data;
correspondingly, the target score is updated using the following formula:
last_uscore = uscore * exp(-a * (t - t_n))
wherein uscore is the target score, last_uscore is the updated target score, and t is the generation moment of the answer data of the target question-answer data.
It should be noted that, when the target object has not generated historical question-answer data before generating the answer data of the target question-answer data, or the generated historical question-answer data is insufficient, the behavior decay factor may not be determinable; in this case, the target score need not be updated, and the target score determined from the text feature data and the behavior feature data may be used directly to perform anomaly recognition on the target object.
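Putting the two formulas together, a sketch of the decay-factor update, including the fallback when the history is insufficient, could read as follows; the guard against degenerate inputs is an added safeguard, not part of the disclosure.

```python
# Sketch only: estimate the behavior decay factor a from the two most recent
# historical target scores, then decay the current target score accordingly.
import math

def decayed_target_score(uscore: float, t: float, history: list) -> float:
    # history: list of (time, historical_target_score) pairs sorted by time.
    if len(history) < 2:
        return uscore  # insufficient history: use the target score directly
    (t_prev, s_prev), (t_n, s_n) = history[-2], history[-1]
    if t_n <= t_prev or s_prev <= 0 or s_n <= 0:
        return uscore  # added safeguard against degenerate inputs
    # score_n = score_{n-1} * exp(-a * (t_n - t_{n-1}))  =>  solve for a.
    a = -math.log(s_n / s_prev) / (t_n - t_prev)
    # last_uscore = uscore * exp(-a * (t - t_n))
    return uscore * math.exp(-a * (t - t_n))
```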
It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, and application of the historical target scores of the historical question-answer data of the target object all comply with the relevant laws and regulations and do not violate public order and good customs.
According to the method and the device for determining the target score, the historical target score of the historical question-answer data of the target object is introduced in the process of carrying out anomaly identification on the target object, and the target score is updated, so that the historical behavior condition of the target object can be referred to in the process of determining the target score instead of only according to the single generation moment of the target question-answer data, the richness and the comprehensiveness of the reference data in the process of determining the target score are improved, the accuracy of the target score determination result is improved, and a foundation is laid for improving the accuracy of the anomaly identification result of the target object at the generation moment of the target question-answer data.
On the basis of the technical schemes, the disclosure also provides a preferred embodiment of an abnormality identification method.
Referring to the flow chart of an anomaly identification method shown in FIG. 5, when a target account generates answer data of target question-answer data, abnormal account identification is performed on the target account. The abnormal account identification process is implemented based on a text evaluation model, a behavior evaluation model, a fusion evaluation model, a target score updating module, and an anomaly identification module.
1) Text evaluation model
And inputting the target question-answer data of the target account into a text evaluation model to obtain text characteristic data.
Different data anomaly recognition sub-models are provided in the text evaluation model for the different preset dimensions and are used to determine the data anomaly probabilities of the corresponding dimensions. The preset dimensions include at least one of contraband pictures, contraband text, spam pictures, spam text, politics-related pictures, politics-related text, abusive pictures, abusive text, advertisement pictures, advertisement text, and the like. Each preset dimension corresponds to an independently trained sub-model; each sub-model is implemented based on an existing machine learning model, and the present disclosure does not limit the model structure of each machine learning model.
The text evaluation model further includes a statistical strategy module, which counts the length ratio, word count ratio, and punctuation count ratio of the question text within the answer text in the target question-answer data, and normalizes each ratio.
The text evaluation model also comprises a fusion module which is used for splicing and fusing the abnormal probability of the data output by each sub-model and the normalized duty ratio data output by the statistic strategy module according to a set sequence to obtain text characteristic data.
The target question-answer data of the target account refers to question-answer data corresponding to answer data generated by the target account.
2) Behavior evaluation model
And inputting behavior state data of the target account when answer data are generated into a behavior evaluation model to obtain behavior characteristic data.
Different functional sub-models are arranged in the behavior evaluation model and are used for generating characteristic data of different dimensions in the behavior characteristic data.
For example, a behavior model may be included in the behavior evaluation model to determine a behavior anomaly probability of the target account. Correspondingly, the interactive behavior data in the behavior state data are input into the trained behavior model, and the behavior anomaly probability is obtained. The behavior model can be implemented based on the existing machine learning model, and the model structure of each machine learning model is not limited in the present disclosure. The interactive behavior data may include whether answer data in the target question-answer data is answered once, whether the target question-answer data generation page is entered by a setting entry page, and an input speed of answer data of the target question-answer data, etc.
For example, the behavior evaluation model may include a device security discrimination model for determining a device anomaly probability when the target account generates answer data. Correspondingly, input equipment information when the target account generates answer data is input into a trained equipment safety discrimination model, and equipment anomaly probability is obtained. The device safety judging model can be realized based on the existing machine learning model, and the model structure of each machine learning model is not limited in the present disclosure. The input device information may include usage system information when question-answer data is generated, a device on-off state, whether it belongs to a simulator, a device usage posture, a device movement speed, and the like.
For example, a network risk determination model may be included in the behavioral assessment model to determine the probability of network anomalies when the target account generates answer data. Correspondingly, inputting the network information when the target account generates the answer data into the trained network risk judgment model to obtain the network anomaly probability. The network risk judging model can be implemented based on the existing machine learning model, and the model structure of each machine learning model is not limited in the present disclosure. The network information may include a network category, a network risk level, an unsupervised network cluster number, a frequency of occurrence of a network in a set period of time, a co-occurrence number of a network when generating question data and a network when generating answer data, and the like.
For example, an activity prediction model may be included in the behavior evaluation model to predict the activity of the target account during the adjacent historical period before the answer data is generated. Correspondingly, the historical interaction behavior data of the target account in the adjacent historical period before the answer data is generated is input into the trained activity prediction model to obtain the interaction activity. The activity prediction model may be implemented based on an existing machine learning model, and the present disclosure does not limit its model structure. The historical interaction behavior data may include at least one of the number of visits by the target account to the knowledge question-answer community homepage during the adjacent historical period, the visit duration, the duration of visits to question data related to the target question-answer data, the duration of visits to the answer data corresponding to the related question data, the accumulated number of questions asked by the target account, the dwell duration on the answer control, and the like. The answer control can be understood as the control that must be triggered to enter the answer editing page or to submit the edited answer data when the target account generates the answer data of the target question-answer data.
For example, a post-policy module may be included in the behavioral assessment model for encoding account attributes of the target account. The account attributes comprise an imported account, an active account, a protection account, a history blocking account and the like.
The behavior evaluation model comprises a fusion module, wherein the fusion module is used for splicing and fusing at least one of behavior anomaly probability, equipment anomaly probability, network anomaly probability, interaction liveness and account attribute coding data according to a set sequence to obtain behavior characteristic data.
3) Fusion evaluation model
The text feature data and the behavior feature data are input into the fusion evaluation model to obtain the target score at the time the target account generates the answer data. The fusion evaluation model is implemented based on an existing machine learning model, and the present disclosure does not limit its model structure.
4) Target score updating module
And the target score updating module is used for updating the target score generated by the preamble.
Illustratively, it is identified whether the score list of the target account contains historical target scores of at least two pieces of historical question-answer data preceding the target question-answer data; if so, a behavior decay factor is determined from the two historical target scores whose generation moments are adjacent, and the target score is updated according to the behavior decay factor; otherwise, the target score is left unchanged.
For example, the following formula may be used to determine the behavior decay factor of the target object:
score n =score n-1 *exp(-a*(t n -t n-1 ));
wherein a is a behavior decay factor, score n And score n-1 Historical time t sequentially adjacent to generation time of answer data of target question-answer data n And t n-1 Historical target scores corresponding to the historical question-answer data;
correspondingly, the target score is updated by adopting the following formula:
last_uscore = uscore * exp(-a * (t - t_n));
wherein uscore is the target score, last_uscore is the updated target score, and t is the generation time of the answer data of the target question-answer data.
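The update logic of this module can be summarized in the following sketch; it assumes the score list stores (generation time, historical target score) pairs in chronological order and that the times are numeric timestamps, neither of which is mandated by the disclosure.

```python
# Illustrative sketch only: updating the target score with a behavior decay factor.
import math

def update_target_score(uscore: float, t: float, score_list: list) -> float:
    """score_list holds (timestamp, historical target score) pairs in chronological order.
    Returns uscore unchanged when fewer than two historical target scores exist."""
    if len(score_list) < 2:
        return uscore
    (t_prev, score_prev), (t_n, score_n) = score_list[-2], score_list[-1]
    if t_n == t_prev or score_prev <= 0 or score_n <= 0:
        return uscore                      # degenerate history: leave the score untouched
    # From score_n = score_{n-1} * exp(-a * (t_n - t_{n-1})), solve for the decay factor a
    a = math.log(score_prev / score_n) / (t_n - t_prev)
    # last_uscore = uscore * exp(-a * (t - t_n))
    return uscore * math.exp(-a * (t - t_n))
```

When the historical scores decrease over time, a is positive and the factor exp(-a * (t - t_n)) lowers the target score further the longer the interval since the most recent historical record, which matches the decay behavior described above.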
For example, the target score may also be added to a score list of the target account for subsequent use.
5) Abnormality recognition module
The anomaly identification module is used to perform account anomaly identification on the target account at the moment the answer data of the target question-answer data is generated.
Illustratively, the target score output by the target score updating module is compared with a preset score threshold; if the output target score is smaller than the preset score threshold, it is determined that the target account is abnormal at the moment the answer data of the target question-answer data is generated; otherwise, it is determined that the target account is normal at that moment.
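A one-line sketch of this comparison is given below; the threshold value used is an assumed placeholder, since the disclosure only requires that some preset threshold exist.

```python
# Illustrative sketch only: thresholding the updated target score.
SCORE_THRESHOLD = 0.5   # assumed placeholder; the actual preset threshold is not fixed here

def is_account_anomalous(last_uscore: float, threshold: float = SCORE_THRESHOLD) -> bool:
    """True means the target account is judged abnormal at the answer generation moment."""
    return last_uscore < threshold
```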
As an implementation of the methods described above, the present disclosure also provides an optional embodiment of a virtual apparatus implementing the anomaly identification method.
Referring to the abnormality recognition apparatus 600 shown in Fig. 6, the apparatus comprises: a text feature data extraction module 601, a behavior feature data determination module 602, a target score determination module 603, and an anomaly identification module 604. Wherein:
a text feature data extraction module 601, configured to extract text feature data in target question-answer data of a target object;
a behavior feature data determining module 602, configured to determine behavior feature data according to behavior state data when the target object generates answer data of the target question-answer data;
a target score determining module 603, configured to determine a target score according to the text feature data and the behavior feature data;
and the anomaly identification module 604 is configured to identify anomalies of the target object according to the target score.
When anomaly detection is performed on the target object, the text feature data and the behavior feature data of the target question-answer data are both introduced into the determination of the target score, so that information of different dimensions is comprehensively considered in the target score determination process. This improves the accuracy of the target score determination result and, in turn, the accuracy of the anomaly recognition result when the target object is identified as anomalous or not according to the target score.
In an alternative embodiment, the text feature data extraction module 601 includes:
the data anomaly probability obtaining unit is used for carrying out anomaly data identification on the target question-answer data under a preset dimension to obtain data anomaly probability;
the character statistics data obtaining unit is used for carrying out character statistics on the target question-answer data to obtain character statistics data;
and the text characteristic data generating unit is used for generating the text characteristic data according to at least one of the data abnormality probability and the character statistical data.
In an alternative embodiment, the behavioral characteristic data determination module 602 includes:
the behavior abnormality probability determining unit is used for determining the behavior abnormality probability according to the interactive behavior data when the target object generates the answer data of the target question-answer data;
the environment anomaly probability determining unit is used for determining environment anomaly probability according to interaction environment information when answer data of the target question-answer data are generated;
the interaction activity determining unit is used for determining the interaction activity according to the historical interaction behavior data of the historical period associated with the target question-answer data;
and the behavior characteristic data generation unit is used for generating the behavior characteristic data according to at least one of the behavior abnormality probability, the environment abnormality probability, the interaction activity and the basic attribute data of the target object.
In an alternative embodiment, the environmental anomaly probability comprises a device anomaly probability; the environment anomaly probability determination unit includes:
a device abnormality probability determination subunit, configured to determine the device abnormality probability according to input device information when answer data of the target question-answer data is generated; and/or,
the environment anomaly probability comprises a network anomaly probability; the environment anomaly probability determination unit includes:
and the network anomaly probability determining subunit is used for determining the network anomaly probability according to the network environment information when the answer data of the target question-answer data is generated.
In an alternative embodiment, the anomaly identification module 604 includes:
a target score updating unit, configured to update a target score according to a historical target score of historical question-answer data of the target object;
and the abnormality identification unit is used for carrying out abnormality identification on the target object according to the updated target score.
In an alternative embodiment, the target score updating unit includes:
a behavior attenuation factor determining subunit, configured to determine a behavior attenuation factor of the target object according to a historical target score of the historical question-answer data of the target object;
And the target score updating subunit is used for updating the target score according to the behavior attenuation factor.
In an alternative embodiment, the behavior decay factor determination subunit comprises:
a behavior attenuation factor determining slave unit, configured to determine a behavior attenuation factor of the target object using the following formula:
score_n = score_{n-1} * exp(-a * (t_n - t_{n-1}));
wherein a is the behavior attenuation factor, and score_n and score_{n-1} are respectively the historical target scores corresponding to the historical question-answer data at the historical times t_n and t_{n-1}, which are sequentially adjacent to the generation time of the answer data of the target question-answer data;
the target score updating subunit includes:
the target score updating slave unit is used for updating the target score by adopting the following formula:
last_uscore = uscore * exp(-a * (t - t_n));
wherein uscore is the target score, last_uscore is the updated target score, and t is the generation time of the answer data of the target question-answer data.
The abnormality recognition device can execute the abnormality recognition method provided by any embodiment of the disclosure, and has the corresponding functional module and beneficial effects of executing the abnormality recognition method.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The calculation unit 701 performs the respective methods and processes described above, for example, an abnormality recognition method. For example, in some embodiments, the anomaly identification method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the abnormality recognition method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the anomaly identification method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computer system may include a client and a server. The client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system and overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system or a server combined with a blockchain.
Artificial intelligence is the discipline that studies how to make a computer imitate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it covers both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
Cloud computing refers to a technical system in which an elastically scalable pool of shared physical or virtual resources is accessed over a network, where the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and can be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data-processing capability for technical applications such as artificial intelligence and blockchain, as well as for model training.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions provided by the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (12)

1. An anomaly identification method comprising:
extracting text feature data in target question-answer data of a target object;
determining behavior characteristic data according to behavior state data when the target object generates answer data of the target question-answer data;
determining a target score according to the text feature data and the behavior feature data;
determining a behavior decay factor of the target object using the formula:
score_n = score_{n-1} * exp(-a * (t_n - t_{n-1}));
wherein a is the behavior decay factor, and score_n and score_{n-1} are respectively the historical target scores corresponding to the historical question-answer data at the historical times t_n and t_{n-1}, which are sequentially adjacent to the generation time of the answer data of the target question-answer data;
updating the target score according to the behavior attenuation factor;
and carrying out abnormal recognition on the target object according to the updated target score.
2. The method of claim 1, wherein the extracting text feature data in the target question-answer data of the target object comprises:
under a preset dimension, carrying out abnormal data identification on the target question-answer data to obtain data abnormal probability;
performing character statistics on the target question-answer data to obtain character statistics data;
and generating the text characteristic data according to at least one of the data anomaly probability and the character statistics.
3. The method of claim 1, wherein the determining behavior feature data from behavior state data when the target object generates answer data of the target question-answer data includes:
according to the interactive behavior data when the target object generates answer data of the target question-answer data, determining abnormal behavior probability;
determining the environment abnormality probability according to the interactive environment information when the answer data of the target question-answer data is generated;
determining the interaction activity according to the historical interaction behavior data of the historical period associated with the target question-answer data;
and generating the behavior characteristic data according to at least one of the behavior abnormality probability, the environment abnormality probability, the interaction activity and the basic attribute data of the target object.
4. A method according to claim 3, wherein the environmental anomaly probability comprises a device anomaly probability; the determining the environment abnormality probability according to the interaction environment information when generating the answer data of the target question-answer data comprises the following steps:
determining the equipment abnormality probability according to the input equipment information when answer data of the target question-answer data is generated; and/or,
the environment anomaly probability comprises a network anomaly probability; the determining the environment abnormality probability according to the interaction environment information when generating the answer data of the target question-answer data comprises the following steps:
and determining the network anomaly probability according to the network environment information when the answer data of the target question-answer data is generated.
5. The method of any of claims 1-4, wherein the updating the target score according to the behavior attenuation factor comprises:
updating the target score using the following formula:
last_uscore = uscore * exp(-a * (t - t_n));
wherein uscore is the target score, last_uscore is the updated target score, and t is the generation time of the answer data of the target question-answer data.
6. An abnormality recognition device comprising:
the text feature data extraction module is used for extracting text feature data in the target question-answer data of the target object;
The behavior characteristic data determining module is used for determining behavior characteristic data according to behavior state data when the target object generates answer data of the target question-answer data;
the target score determining module is used for determining a target score according to the text characteristic data and the behavior characteristic data;
the anomaly identification module is used for carrying out anomaly identification on the target object according to the target score;
wherein, the abnormality recognition module includes:
a target score updating unit, configured to update a target score according to a historical target score of historical question-answer data of the target object;
the abnormality identification unit is used for carrying out abnormality identification on the target object according to the updated target score;
wherein the target score updating unit includes:
a behavior attenuation factor determining subunit, configured to determine a behavior attenuation factor of the target object according to a historical target score of the historical question-answer data of the target object;
a target score updating subunit, configured to update the target score according to the behavior attenuation factor;
wherein the behavior decay factor determination subunit comprises:
A behavior attenuation factor determining slave unit, configured to determine a behavior attenuation factor of the target object using the following formula:
score_n = score_{n-1} * exp(-a * (t_n - t_{n-1}));
wherein a is the behavior attenuation factor, and score_n and score_{n-1} are respectively the historical target scores corresponding to the historical question-answer data at the historical times t_n and t_{n-1}, which are sequentially adjacent to the generation time of the answer data of the target question-answer data.
7. The apparatus of claim 6, wherein the text feature data extraction module comprises:
the data anomaly probability obtaining unit is used for carrying out anomaly data identification on the target question-answer data under a preset dimension to obtain data anomaly probability;
the character statistics data obtaining unit is used for carrying out character statistics on the target question-answer data to obtain character statistics data;
and the text characteristic data generating unit is used for generating the text characteristic data according to at least one of the data abnormality probability and the character statistical data.
8. The apparatus of claim 6, wherein the behavioral characteristic data determination module comprises:
the behavior abnormality probability determining unit is used for determining the behavior abnormality probability according to the interactive behavior data when the target object generates the answer data of the target question-answer data;
The environment anomaly probability determining unit is used for determining environment anomaly probability according to interaction environment information when answer data of the target question-answer data are generated;
the interaction activity determining unit is used for determining the interaction activity according to the historical interaction behavior data of the historical period associated with the target question-answer data;
and the behavior characteristic data generation unit is used for generating the behavior characteristic data according to at least one of the behavior abnormality probability, the environment abnormality probability, the interaction activity and the basic attribute data of the target object.
9. The apparatus of claim 8, wherein the environmental anomaly probability comprises a device anomaly probability; the environment anomaly probability determination unit includes:
a device abnormality probability determination subunit, configured to determine the device abnormality probability according to input device information when answer data of the target question-answer data is generated; and/or,
the environment anomaly probability comprises a network anomaly probability; the environment anomaly probability determination unit includes:
and the network anomaly probability determining subunit is used for determining the network anomaly probability according to the network environment information when the answer data of the target question-answer data is generated.
10. The apparatus of any of claims 6-9, wherein the target score updating subunit comprises:
the target score updating slave unit is used for updating the target score by adopting the following formula:
last_uscore = uscore * exp(-a * (t - t_n));
wherein uscore is the target score, last_uscore is the updated target score, and t is the generation time of the answer data of the target question-answer data.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform an anomaly identification method as claimed in any one of claims 1 to 5.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform an anomaly identification method according to any one of claims 1-5.
CN202110633642.5A 2021-06-07 2021-06-07 Abnormality recognition method, apparatus, device, and storage medium Active CN113360617B (en)



GR01 Patent grant