CN112559724A - Method and system for preventing malicious search chat robot vulnerability - Google Patents

Method and system for preventing malicious search chat robot vulnerability Download PDF

Info

Publication number
CN112559724A
Authority
CN
China
Prior art keywords
class
user
features
conversation
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110000300.XA
Other languages
Chinese (zh)
Other versions
CN112559724B (en)
Inventor
路林林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Suoxinda Data Technology Co ltd
Original Assignee
Shenzhen Suoxinda Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Suoxinda Data Technology Co ltd filed Critical Shenzhen Suoxinda Data Technology Co ltd
Priority to CN202110000300.XA priority Critical patent/CN112559724B/en
Publication of CN112559724A publication Critical patent/CN112559724A/en
Application granted granted Critical
Publication of CN112559724B publication Critical patent/CN112559724B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06F 40/35 Discourse or dialogue representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method and a system for preventing malicious searching for chat robot vulnerabilities, wherein the method comprises the following steps: receiving a dialogue request of a user; extracting personal information of the user from the dialogue request, and storing the personal information into a user identity database; monitoring the chatting process in real time; monitoring whether a specific event occurs within a certain number of conversation rounds; and adopting a corresponding conversation strategy for the user based on the monitoring result. The invention can identify clients who maliciously collect robot vulnerabilities and adopt a corresponding conversation strategy in time.

Description

Method and system for preventing malicious search chat robot vulnerability
Technical Field
The invention belongs to the field of computers, and particularly relates to a method and a system for preventing malicious searching for chat robot vulnerabilities.
Background
Chat robots (chatbots) are widely used natural language processing products in the business world. Many companies use chat robots to partially or even completely replace or assist human customer service. In the design of a chat robot, certain words are placed on a sensitive-word list; once a sensitive word appears in a conversation with a client, the chat robot generally deflects the client's question rather than answering it directly, replying for example "I cannot answer that" or "please switch to a human agent". These sensitive words are incorporated into the conversation design manually; they are numerous, difficult to cover completely, and easy to miss.
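For illustration only, a minimal sketch of such a sensitive-word gate is given below; the word list, the canned reply, and the function names are invented placeholders rather than any particular product's design.

```python
# Minimal sketch of a sensitive-word gate as described above.
# The word list and canned replies are illustrative placeholders.
SENSITIVE_WORDS = {"example_sensitive_word", "example_celebrity_name"}

def guarded_reply(user_utterance: str, generate_reply) -> str:
    """Deflect the question if it hits the sensitive list; otherwise fall back
    to the normal reply generator passed in by the caller."""
    if any(word in user_utterance.lower() for word in SENSITIVE_WORDS):
        return "Sorry, I cannot answer that. Please switch to a human agent."
    return generate_reply(user_utterance)
```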
However, through long, multi-round conversations with the robot, a conversant can elicit many obvious errors from the chat robot; the robot does not detect that the conversant is maliciously searching for its vulnerabilities, does not suspend the conversation, and is kept chatting by the conversant.
Disclosure of Invention
Aiming at the defects in the prior art, the method and system for preventing malicious searching for chat robot vulnerabilities provided by the invention can identify clients who maliciously search for robot vulnerabilities and adopt corresponding conversation strategies in time.
To this end, in a first aspect, the present invention provides a method for preventing a vulnerability of a malicious search chat robot, comprising the following steps:
receiving a dialogue request of a user;
extracting personal information of the user from the conversation request, and storing the personal information into a user identity database;
monitoring the chatting process in real time;
monitoring whether a specific event occurs within a certain number of conversation rounds;
and adopting a corresponding conversation strategy for the user based on the monitoring result.
Wherein the monitoring whether a specific event occurs within a certain number of conversation rounds comprises:
counting the number of occurrences of specific features within a certain number of conversation rounds.
Wherein the specific features comprise seven classes of features.
Wherein the specific features specifically include:
a first class of features, comprising: the number of occurrences of positive expressions querying opinions or preferences, such as "you like", "you love", and "you think";
a second class of features, comprising: the number of occurrences of negative expressions querying opinions or preferences, such as "you hate", "you dislike", and "hate";
a third class of features, comprising: the number of occurrences of sensitive words, star names, and celebrity names in the conversation;
a fourth class of features, comprising: the maximum number of consecutive occurrences of sensitive words, star names, and celebrity names in the conversation;
a fifth class of features, comprising: the cumulative number of times the chat robot fails to find an appropriate reply (e.g., the chat robot asks the client to consult a human agent or directly indicates that it does not understand);
a sixth class of features, comprising: the cumulative number of times the chat robot fails to find an appropriate reply divided by the duration of the conversant's single continuous conversation;
a seventh class of features, comprising: the percentage of the conversant's business-inquiry sentences among all chat sentences.
Further, the monitoring of whether a specific event occurs within a certain number of conversation rounds identifies the normal class and the malicious class by detecting abnormal samples.
Further, wherein the identifying normal and malicious classes by detecting anomalous samples comprises:
assuming $D$ represents the number of classes of the specific features, the conversation records of $n$ clients form $n$ sample data $X = \{x_1, x_2, \ldots, x_n\}$; the feature dimension of each sample is $D$, and the $i$-th sample is recorded as $x_i$, $i = 1, 2, \ldots, n$. Each discrete feature of each sample is dithered (a small random number is added to it), and the dithered data set is recorded as $X'$.
Step 1, $d$ of the $D$ features are selected, giving $m = \binom{D}{d}$ combinations in total; accordingly, the original data set is divided into $m$ subsets, denoted as $X'_j$, wherein $j = 1, 2, \ldots, m$.
Step 2, in each subset $X'_j$, the anomaly score $s_{ij}$ of each sample $x_i$ is calculated.
Step 3, the total anomaly score of each sample is calculated as
$$\mathrm{score}(x_i) = \frac{1}{m} \sum_{j=1}^{m} s_{ij},$$
wherein $m = \binom{D}{d}$, $i = 1, 2, \ldots, n$, and $s_{ij} \in [0, 1]$; as a result, $\mathrm{score}(x_i) \in [0, 1]$.
Step 4, a threshold is set as $\theta$; the set of outliers is then
$$A = \{\, x_i \mid \mathrm{score}(x_i) > Q_\theta \,\},$$
wherein $i = 1, 2, \ldots, n$, and $Q_\theta$ represents the $\theta$ quantile of the set $\{\mathrm{score}(x_1), \ldots, \mathrm{score}(x_n)\}$.
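As a worked instance of step 1 and step 3 above (the numbers are chosen purely for illustration): with $D = 7$ feature classes and $d = 2$ features selected per subset,
$$m = \binom{7}{2} = 21, \qquad \mathrm{score}(x_i) = \frac{1}{21} \sum_{j=1}^{21} s_{ij} \in [0, 1].$$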
Further, wherein step 2 comprises:
(1) one of the $d$ features of $X'_j$ is randomly selected, and a value of that feature is randomly selected as a boundary for segmentation, dividing the data set $X'_j$ into 2 classes; the number of data points in each class is recorded and stored in a list $c_1$. Then a feature is randomly selected again, a value of that feature is randomly selected as a boundary for segmentation, and $X'_j$ is divided into 4 classes; the number of data points in each class is recorded as $c_2$. This is iterated until every data point is classified into its own class, wherein $c_k$ denotes the set of class sizes after the $k$-th segmentation;
(2) starting from the first segmentation, if in $c_k$ the class occupied by $x_i$ has size 1, i.e. $x_i$ forms a class by itself, then the total number of segmentations of $x_i$ in $X'_j$ is recorded as $p_i = k$; the segmentation counts of all samples are then recorded as $p = (p_1, p_2, \ldots, p_n)$, wherein $p_{\min} = \min_i p_i$ and $p_{\max} = \max_i p_i$; the anomaly score of $x_i$ is then
$$s_{ij} = 1 - \frac{p_i - p_{\min}}{p_{\max} - p_{\min}}.$$
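The following Python sketch illustrates the random-partition scoring of step 2 described above; it returns the per-subset scores corresponding to $s_{ij}$. All function and variable names are ours, and details the text leaves open (for example, drawing the boundary uniformly between the feature's observed minimum and maximum, and capping the number of splits) are assumptions rather than the patent's prescription.

```python
import numpy as np
from collections import Counter

def subset_anomaly_scores(S: np.ndarray, rng=None, max_splits: int = 10_000) -> np.ndarray:
    """Random-partition anomaly scores for one feature subset S (n samples x d features).

    Each split picks a random feature and a random boundary value; a sample's class
    label is the tuple of its side-of-boundary decisions so far.  p[i] is the split
    count after which sample i is alone in its class, and the score is
    s_i = 1 - (p_i - p_min) / (p_max - p_min), so fewer splits -> higher score.
    """
    rng = rng or np.random.default_rng()
    n, d = S.shape
    labels = [() for _ in range(n)]   # class label = history of split decisions
    p = [None] * n                    # split count at which each sample is isolated
    k = 0
    while any(v is None for v in p) and k < max_splits:
        k += 1
        f = int(rng.integers(d))                              # random feature
        boundary = rng.uniform(S[:, f].min(), S[:, f].max())  # random split value
        labels = [lab + (bool(S[i, f] > boundary),) for i, lab in enumerate(labels)]
        sizes = Counter(labels)
        for i, lab in enumerate(labels):
            if p[i] is None and sizes[lab] == 1:              # sample i is now isolated
                p[i] = k
    p = np.array([v if v is not None else max_splits for v in p], dtype=float)
    if p.max() == p.min():            # degenerate case: everyone isolated at the same step
        return np.zeros(n)
    return 1.0 - (p - p.min()) / (p.max() - p.min())
```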
Further, the adopting, based on the monitoring result, a corresponding dialogue strategy for the user includes:
if the behavior is determined to be the behavior of searching for chat robot vulnerabilities, the user information corresponding to the samples in the outlier set $A$ is submitted, and the conversation is transferred to manual processing.
Further, the personal information includes one or more of the user's IP address, device code, WeChat ID, or QQ number.
In a second aspect, the present invention further provides a system for preventing a vulnerability of a malicious search chat robot, which implements the method described above, and includes:
a request receiving module, for receiving a dialogue request of a user;
an information extraction module, for extracting personal information of the user from the dialogue request and storing the personal information into a user identity database;
a real-time monitoring module, for monitoring the chatting process in real time;
an event judging module, for monitoring whether a specific event occurs within a certain number of conversation rounds; and
a strategy selection module, for adopting a corresponding conversation strategy for the user based on the monitoring result.
Compared with the prior art, the chat robot of the invention is more intelligent: it can identify clients who maliciously collect chat robot vulnerabilities and stop the chat robot's replies in time, thereby avoiding the public-opinion harm caused by repeated wrong answers and protecting the company or platform that uses the chat robot.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:
FIG. 1 is a flow chart illustrating a method for preventing vulnerability of a malicious search chat robot according to an embodiment of the present invention;
FIG. 2 is a flow diagram illustrating analysis of a data set according to an embodiment of the invention; and
fig. 3 is a block diagram illustrating a system for preventing a vulnerability of a malicious search chat robot according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two.
It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present invention to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could also be referred to as a second element, and similarly, a second element could also be referred to as a first element, without departing from the scope of embodiments of the present invention.
Alternative embodiments of the present invention are described in detail below with reference to the accompanying drawings.
In the conversation design of the chat robot, a function of identifying conversants who maliciously collect the robot's errors is added. This functional module extracts features from the conversation between the conversant and the robot and marks malicious conversants using an unsupervised machine learning method. It then returns the conversant's information (IP address, device code, etc.) to the module that generates content replies. Once a malicious conversant is identified, the chat robot's subsequent replies to that conversant are set to "contact a human agent" or other replies that divert the conversant's attention, instead of being generated in the usual chat-content manner.
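Purely as an illustration of the reply switch described above, the sketch below shows one way the identification result could gate the reply generator; the class name, method names, and reply wording are assumptions, not the patent's exact design.

```python
# Illustrative reply-policy switch: once a conversant is flagged as malicious,
# subsequent replies divert attention instead of generating chat content as usual.
class ReplyPolicy:
    def __init__(self, generate_reply):
        self.generate_reply = generate_reply   # normal chat-content generator
        self.flagged = set()                   # identifiers of flagged malicious conversants

    def flag_malicious(self, user_id: str, info: dict) -> None:
        """Record a conversant marked by the anomaly detector, together with the
        information (IP address, device code, ...) returned to the reply module."""
        self.flagged.add(user_id)
        # in a full system `info` would also be submitted for manual handling

    def reply(self, user_id: str, utterance: str) -> str:
        if user_id in self.flagged:
            return "Please contact a human agent for further assistance."
        return self.generate_reply(utterance)
```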
Embodiment One,
Referring to fig. 1, the invention discloses a method for preventing a vulnerability of a malicious search chat robot, comprising the following steps:
receiving a dialogue request of a user;
extracting personal information of the user from the conversation request, and storing the personal information into a user identity database; the personal information includes one or more of the user's IP address, device code, WeChat ID, or QQ number (a sketch of this step follows the list);
monitoring the chatting process in real time;
monitoring whether a specific event occurs within a certain number of conversation rounds;
and adopting a corresponding conversation strategy for the user based on the monitoring result.
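A hypothetical sketch of the information-extraction and storage step above; the request field names and the in-memory stand-in for the user identity database are illustrative assumptions.

```python
# Illustrative extraction of personal information from a dialogue request and
# storage into a (here, in-memory) user identity database.
from dataclasses import dataclass, asdict

@dataclass
class UserIdentity:
    ip_address: str | None = None
    device_code: str | None = None
    wechat_id: str | None = None
    qq_number: str | None = None

def extract_and_store(request: dict, identity_db: dict) -> UserIdentity:
    """Pull the personal fields out of a dialogue request and key them by session id."""
    identity = UserIdentity(
        ip_address=request.get("ip"),
        device_code=request.get("device_code"),
        wechat_id=request.get("wechat_id"),
        qq_number=request.get("qq"),
    )
    identity_db[request.get("session_id", "unknown")] = asdict(identity)
    return identity
```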
Embodiment Two,
On the basis of the above embodiments, the embodiments of the present invention may include the following:
after the chatting process is monitored in real time, the monitoring of whether a specific event occurs within a certain number of conversation rounds comprises:
counting the number of occurrences of specific features within a certain number of conversation rounds.
In one application scenario, the specific features in the embodiment of the present invention include seven classes of features. Further, the specific features include the following (a counting sketch is given after the list):
the first class of features includes: the number of occurrences of positive expressions querying opinions or preferences, such as "you like", "you love", and "you think";
the second class of features includes: the number of occurrences of negative expressions querying opinions or preferences, such as "you hate", "you dislike", and "hate";
the third class of features includes: the number of occurrences of sensitive words, star names, and celebrity names in the conversation;
the fourth class of features includes: the maximum number of consecutive occurrences of sensitive words, star names, and celebrity names in the conversation;
the fifth class of features includes: the cumulative number of times the chat robot fails to find an appropriate reply (e.g., the chat robot asks the client to consult a human agent or directly indicates that it does not understand);
the sixth class of features includes: the frequency with which the chat robot fails to find an appropriate reply, i.e. the cumulative number of such failures divided by the duration of the conversant's continuous conversation;
the seventh class of features includes: the percentage of the conversant's business-inquiry content among all chat content.
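One possible way to count these seven feature classes from a single client's chat log is sketched below; the keyword lists, the business-inquiry terms, and the log format are illustrative assumptions, not part of the patent.

```python
# Hypothetical counting of the seven feature classes from one client's chat log.
# Keyword lists and the business-inquiry terms would be tuned to the deployment.
POSITIVE_PROBES = ("you like", "you love", "you think")
NEGATIVE_PROBES = ("you hate", "you dislike", "hate")
SENSITIVE_TERMS = ("example_sensitive_word", "example_star_name")
FALLBACK_REPLIES = ("please consult a human agent", "i do not understand")
BUSINESS_TERMS = ("account", "order", "refund")

def _hit(text, terms):
    return any(term in text.lower() for term in terms)

def _max_consecutive(flags):
    """Longest run of consecutive True values."""
    best = run = 0
    for f in flags:
        run = run + 1 if f else 0
        best = max(best, run)
    return best

def extract_features(turns, duration_seconds):
    """turns: list of (speaker, text) pairs, speaker in {'user', 'bot'}."""
    user_texts = [t for s, t in turns if s == "user"]
    bot_texts = [t for s, t in turns if s == "bot"]
    sensitive_flags = [_hit(t, SENSITIVE_TERMS) for t in user_texts]
    f1 = sum(_hit(t, POSITIVE_PROBES) for t in user_texts)   # class 1
    f2 = sum(_hit(t, NEGATIVE_PROBES) for t in user_texts)   # class 2
    f3 = sum(sensitive_flags)                                # class 3
    f4 = _max_consecutive(sensitive_flags)                   # class 4
    f5 = sum(_hit(t, FALLBACK_REPLIES) for t in bot_texts)   # class 5
    f6 = f5 / duration_seconds if duration_seconds else 0.0  # class 6
    f7 = (sum(_hit(t, BUSINESS_TERMS) for t in user_texts)
          / len(user_texts)) if user_texts else 0.0          # class 7
    return [f1, f2, f3, f4, f5, f6, f7]
```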
Further, the monitoring of whether a specific event occurs within a certain number of conversation rounds identifies the normal class and the malicious class by detecting abnormal samples.
Further, wherein the identifying normal and malicious classes by detecting anomalous samples comprises:
assuming $D$ represents the number of classes of the specific features, the conversation records of $n$ clients form $n$ sample data $X = \{x_1, x_2, \ldots, x_n\}$; the feature dimension of each sample is $D$, and the $i$-th sample is recorded as $x_i$, $i = 1, 2, \ldots, n$. Each discrete feature of each sample is dithered (a small random number is added to it), and the dithered data set is recorded as $X'$.
Step 1, $d$ of the $D$ features are selected, giving $m = \binom{D}{d}$ combinations in total; accordingly, the original data set is divided into $m$ subsets, denoted as $X'_j$, wherein $j = 1, 2, \ldots, m$.
Step 2, in each subset $X'_j$, the anomaly score $s_{ij}$ of each sample $x_i$ is calculated.
Step 3, the total anomaly score of each sample is calculated as
$$\mathrm{score}(x_i) = \frac{1}{m} \sum_{j=1}^{m} s_{ij},$$
wherein $m = \binom{D}{d}$, $i = 1, 2, \ldots, n$, and $s_{ij} \in [0, 1]$; as a result, $\mathrm{score}(x_i) \in [0, 1]$.
Step 4, a threshold is set as $\theta$; the set of outliers is then
$$A = \{\, x_i \mid \mathrm{score}(x_i) > Q_\theta \,\},$$
wherein $i = 1, 2, \ldots, n$, and $Q_\theta$ represents the $\theta$ quantile of the set $\{\mathrm{score}(x_1), \ldots, \mathrm{score}(x_n)\}$.
Further, wherein step 2 comprises:
(1) one of the $d$ features of $X'_j$ is randomly selected, and a value of that feature is randomly selected as a boundary for segmentation, dividing the data set $X'_j$ into 2 classes; the number of data points in each class is recorded and stored in a list $c_1$. Then a feature is randomly selected again, a value of that feature is randomly selected as a boundary for segmentation, and $X'_j$ is divided into 4 classes; the number of data points in each class is recorded as $c_2$. This is iterated until every data point is classified into its own class, wherein $c_k$ denotes the set of class sizes after the $k$-th segmentation;
(2) starting from the first segmentation, if in $c_k$ the class occupied by $x_i$ has size 1, i.e. $x_i$ forms a class by itself, then the total number of segmentations of $x_i$ in $X'_j$ is recorded as $p_i = k$; the segmentation counts of all samples are then recorded as $p = (p_1, p_2, \ldots, p_n)$, wherein $p_{\min} = \min_i p_i$ and $p_{\max} = \max_i p_i$; the anomaly score of $x_i$ is then
$$s_{ij} = 1 - \frac{p_i - p_{\min}}{p_{\max} - p_{\min}}.$$
Further, the adopting, based on the monitoring result, a corresponding dialogue strategy for the user includes:
if the behavior is determined to be the behavior of searching for chat robot vulnerabilities, the user information corresponding to the samples in the outlier set $A$ is submitted, and the conversation is transferred to manual processing.
Further, the personal information includes one or more of the user's IP address, device code, WeChat ID, or QQ number.
After monitoring whether a specific event occurs within a certain number of conversation rounds, the embodiment of the invention adopts a corresponding conversation strategy for the user based on the monitoring result, comprising the following:
if the behavior is determined to be the behavior of searching for chat robot vulnerabilities, the user information is submitted and the conversation is transferred to manual processing.
Embodiment Three,
On the basis of the above embodiments, the embodiments of the present invention may further include the following:
in an application scenario, after the data set of 7 features is acquired, the next step is to find the outliers, that is, the malicious clients; in other words, the data set formed by the 7 features within a certain number of conversation rounds over a certain period is analyzed, as shown in fig. 2, which may specifically include:
1. Given $n$ sample data $X = \{x_1, x_2, \ldots, x_n\}$ with feature dimension 7, the $i$-th sample is recorded as $x_i$, $i = 1, 2, \ldots, n$. Each discrete feature of each sample is dithered, i.e. a small random number within a given interval is added to it; the dithered data set is recorded as $X'$. The purpose of dithering is to prevent data points from overlapping. Here, $D = 7$.
2. In $X'$, $d$ of the features are selected, giving $m = \binom{D}{d}$ combinations in total; accordingly, the original data set is divided into $m$ subsets, denoted as $X'_j$, wherein $j = 1, 2, \ldots, m$.
3. In each subset $X'_j$, the anomaly score $s_{ij}$ of each sample $x_i$ is calculated:
A. One of the $d$ features of $X'_j$ is randomly selected, and a boundary is randomly selected for segmentation, dividing the data set $X'_j$ into 2 classes; the number of data points in each class is recorded as $c_1$. Then a feature is randomly extracted again, a boundary is randomly selected for segmentation, and $X'_j$ is divided into 4 classes; the number of data points in each class is recorded as $c_2$. This is iterated until every data point is classified into its own class, wherein $c_k$ denotes the set of class sizes after the $k$-th segmentation.
B. The anomaly score $s_{ij}$ of sample $x_i$ is calculated: starting from the first segmentation, if in $c_k$ the class occupied by $x_i$ has size 1, i.e. $x_i$ forms a class by itself, then the total number of segmentations of $x_i$ in $X'_j$ is recorded as $p_i = k$; the segmentation counts of all samples are recorded as $p = (p_1, p_2, \ldots, p_n)$, wherein $p_{\min} = \min_i p_i$ and $p_{\max} = \max_i p_i$; the anomaly score of $x_i$ is then
$$s_{ij} = 1 - \frac{p_i - p_{\min}}{p_{\max} - p_{\min}}.$$
4. The total anomaly score of each sample is calculated as
$$\mathrm{score}(x_i) = \frac{1}{m} \sum_{j=1}^{m} s_{ij},$$
wherein $m = \binom{D}{d}$, $i = 1, 2, \ldots, n$, and $s_{ij} \in [0, 1]$; as a result, $\mathrm{score}(x_i) \in [0, 1]$.
5. A threshold is set as $\theta$; generally 95% may be selected. The set of outliers is then
$$A = \{\, x_i \mid \mathrm{score}(x_i) > Q_\theta \,\},$$
wherein $i = 1, 2, \ldots, n$, and $Q_\theta$ represents the $\theta$ quantile of the set $\{\mathrm{score}(x_1), \ldots, \mathrm{score}(x_n)\}$.
Specifically, the data set according to the embodiment of the present invention originally has D (here 7) features, and d features are selected at a time (d may be chosen as 2, 3, 4, etc.); each selection forms a subset of the original data set in which the number of samples is unchanged but there are fewer features.
In each subset, one feature is selected at a time and a split point for that feature is selected, for segmentation. For example, the feature "age" is selected and 45 is taken as the split point: samples with age < 45 are classified into one class, and samples with age ≥ 45 into another.
After many segmentations, each point ends up in a class of its own. The logic here is: normal points are densely distributed, while outliers lie far from the normal points. An outlier can therefore be isolated into its own class after only a few segmentations, whereas a normal point, surrounded by many other points, requires many segmentations. The fewer segmentations required, the greater the probability that the sample is anomalous.
The number of segmentations $p_i$ is normalized as $(p_i - p_{\min})/(p_{\max} - p_{\min})$, so that the normalized values all lie between 0 and 1: a value of 0 corresponds to the sample requiring the minimum number of segmentations, and a value of 1 to the maximum.
Further, the anomaly score = 1 − the normalized segmentation count. A larger anomaly score should indicate a more anomalous sample, but after normalization a smaller value indicates a more anomalous sample, since fewer segmentations are required; the embodiment of the present invention therefore uses 1 minus the normalized segmentation count as the anomaly score.
Through the above operation, the abnormality score is calculated for each point, and the value range of the abnormality score is 0 to 1. The closer it is to 0, the more normal it is. The closer to 1, the more abnormal.
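For a small worked instance of this normalization (the numbers are invented for illustration): suppose three samples in one subset are isolated after $p = (3, 10, 12)$ segmentations, so $p_{\min} = 3$ and $p_{\max} = 12$. The anomaly scores are then $s_1 = 1 - \frac{3-3}{12-3} = 1$, $s_2 = 1 - \frac{10-3}{12-3} \approx 0.22$, and $s_3 = 1 - \frac{12-3}{12-3} = 0$; the sample isolated after the fewest segmentations receives the highest (most anomalous) score.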
The embodiment of the invention further aggregates each sample's anomaly scores over the m subsets. Specifically, the anomaly scores are summed and divided by m; since the sum lies within [0, m], dividing by m normalizes it.
The total abnormal score of each sample is between 0 and 1 through the normalization process. And the closer to 1, the more abnormal.
A threshold is set, and the quantile of the anomaly scores corresponding to that threshold is found. For example, if the threshold is set to 95%, all samples whose anomaly scores fall within the lowest 95% are treated as normal. Assuming there are 100 points and the anomaly scores of 95 of them are all less than a certain value, then that value is the 95% quantile, and the remaining points whose anomaly scores are greater than this value are the outliers.
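Putting the pieces together, the sketch below forms the feature subsets, averages the per-subset scores, and applies the quantile threshold. The per-subset scorer is passed in as a parameter (for example, the random-partition scorer sketched earlier for step 2); the stand-in scorer in the demo and all names are ours.

```python
import numpy as np
from itertools import combinations

def detect_outliers(X: np.ndarray, d: int, subset_scorer, threshold: float = 0.95):
    """X: n x D feature matrix (after dithering); d: features per subset;
    subset_scorer: callable mapping an (n x d) array to n per-subset scores in [0, 1].
    Returns indices of samples whose averaged score exceeds the `threshold` quantile."""
    n, D = X.shape
    feature_sets = list(combinations(range(D), d))      # m = C(D, d) subsets
    total = np.zeros(n)
    for fs in feature_sets:
        total += subset_scorer(X[:, list(fs)])
    score = total / len(feature_sets)                   # average over the m subsets, in [0, 1]
    cutoff = np.quantile(score, threshold)              # e.g. the 95% quantile
    return np.where(score > cutoff)[0], score

# Demo: n clients, D = 7 features, subsets of d = 2 features.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 7))
    X[:3] += 6                                          # three artificially abnormal clients

    def dummy_scorer(S):
        # stand-in for the random-partition scorer: distance from the subset mean,
        # rescaled to [0, 1] so that far-away (abnormal) points score near 1
        dist = np.linalg.norm(S - S.mean(axis=0), axis=1)
        return (dist - dist.min()) / (dist.max() - dist.min() + 1e-12)

    outliers, _ = detect_outliers(X, d=2, subset_scorer=dummy_scorer)
    print(outliers)
```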
Embodiment Four,
As shown in fig. 3, the present invention also provides a system for preventing a vulnerability of a malicious search chat robot, which includes:
a request receiving module, for receiving a dialogue request of a user;
an information extraction module, for extracting personal information of the user from the dialogue request and storing the personal information into a user identity database;
a real-time monitoring module, for monitoring the chatting process in real time;
an event judging module, for monitoring whether a specific event occurs within a certain number of conversation rounds; and
a strategy selection module, for adopting a corresponding conversation strategy for the user based on the monitoring result.
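Purely as an illustration of how these five modules might be wired together (class and method names are assumptions, and the module implementations are injected by the caller):

```python
# Illustrative wiring of the five modules named above into one request pipeline.
class AntiProbeChatSystem:
    def __init__(self, extractor, monitor, judge, policy):
        self.extractor = extractor      # information extraction module
        self.monitor = monitor          # real-time monitoring module
        self.judge = judge              # event judging module
        self.policy = policy            # strategy selection module

    def handle_request(self, request: dict) -> str:
        """Request receiving module: drives one dialogue turn through the pipeline."""
        identity = self.extractor.extract_and_store(request)
        self.monitor.record(identity, request["utterance"])
        event_detected = self.judge.check(identity)   # specific event in recent rounds?
        return self.policy.respond(identity, request["utterance"], event_detected)
```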
Embodiment Five,
The disclosed embodiments provide a non-volatile computer storage medium having stored thereon computer-executable instructions that may perform the method steps as described in the embodiments above.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of an element does not in some cases constitute a limitation on the element itself.
The foregoing describes preferred embodiments of the present invention; it is intended to illustrate, not to limit, the spirit and scope of the invention, which includes all modifications, substitutions, and alterations falling within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for preventing malicious searching for chat robot vulnerabilities, characterized by comprising the following steps:
receiving a dialogue request of a user;
extracting personal information of a user from the dialogue request, and storing the personal information into a user identity database, wherein the personal information comprises one or more of the user's IP address, device code, WeChat ID, or QQ number;
monitoring the chatting process in real time;
monitoring whether a specific event occurs within a certain number of conversation rounds;
and adopting a corresponding conversation strategy for the user based on the monitoring result.
2. The method of claim 1, wherein said monitoring whether a specific event occurs within a certain number of conversation rounds comprises:
counting the number of occurrences, or the frequency, of specific features within a certain number of conversation rounds.
3. The method of claim 2, wherein the specific features comprise seven classes of features.
4. The method according to claim 3, wherein the specific feature specifically comprises:
a first class of features, comprising: the number of occurrences of positive expressions querying opinions or preferences;
a second class of features, comprising: the number of occurrences of negative expressions querying opinions or preferences;
a third class of features, comprising: the number of occurrences of sensitive words, star names, and celebrity names in the conversation;
a fourth class of features, comprising: the maximum number of consecutive occurrences of sensitive words, star names, and celebrity names in the conversation;
a fifth class of features, comprising: the cumulative number of times the chat robot fails to find an appropriate reply;
a sixth class of features, comprising: the frequency with which the chat robot fails to find an appropriate reply;
a seventh class of features, comprising: the percentage of the conversant's business-inquiry sentences among all chat sentences.
5. The method of claim 1, wherein the monitoring of whether a specific event occurs within a certain number of conversation rounds identifies the normal class and the malicious class by detecting abnormal samples.
6. The method of claim 5, wherein said identifying a normal class and a malicious class by detecting anomalous samples comprises:
assuming $D$ represents the number of classes of the specific features, the conversation records of $n$ clients form $n$ sample data $X = \{x_1, x_2, \ldots, x_n\}$; the feature dimension of each sample is $D$, and the $i$-th sample is recorded as $x_i$, $i = 1, 2, \ldots, n$; each discrete feature of each sample is dithered (a small random number is added to it), and the dithered data set is recorded as $X'$;
step 1, $d$ of the $D$ features are randomly selected, giving $m = \binom{D}{d}$ combinations in total; accordingly, the data set is divided into $m$ subsets, denoted as $X'_j$, wherein $j = 1, 2, \ldots, m$;
step 2, in each subset $X'_j$, the anomaly score $s_{ij}$ of each sample $x_i$ is calculated;
step 3, the total anomaly score of each sample is calculated as
$$\mathrm{score}(x_i) = \frac{1}{m} \sum_{j=1}^{m} s_{ij},$$
wherein $m = \binom{D}{d}$, $i = 1, 2, \ldots, n$, and $s_{ij} \in [0, 1]$; as a result, $\mathrm{score}(x_i) \in [0, 1]$;
step 4, a threshold is set as $\theta$; the set of outliers is then
$$A = \{\, x_i \mid \mathrm{score}(x_i) > Q_\theta \,\},$$
wherein $i = 1, 2, \ldots, n$, and $Q_\theta$ represents the $\theta$ quantile of the set $\{\mathrm{score}(x_1), \ldots, \mathrm{score}(x_n)\}$.
7. The method of claim 6, wherein step 2 comprises:
one of the $d$ features of $X'_j$ is randomly selected, and a value of that feature is randomly selected as a boundary for segmentation, dividing the data set $X'_j$ into 2 classes; the number of data points in each class is recorded and stored in a list $c_1$; then a feature is randomly selected again, a value of that feature is randomly selected as a boundary for segmentation, and $X'_j$ is divided into 4 classes; the number of data points in each class is recorded as $c_2$; this is iterated until every data point is classified into its own class, wherein $c_k$ denotes the set of class sizes after the $k$-th segmentation.
8. The method of claim 7, wherein said step 2 further comprises:
starting from the first segmentation, if in $c_k$ the class occupied by $x_i$ has size 1, i.e. $x_i$ forms a class by itself, then the total number of segmentations of $x_i$ in $X'_j$ is recorded as $p_i = k$; the segmentation counts of all samples are then recorded as $p = (p_1, p_2, \ldots, p_n)$, wherein $p_{\min} = \min_i p_i$ and $p_{\max} = \max_i p_i$; the anomaly score of $x_i$ is then
$$s_{ij} = 1 - \frac{p_i - p_{\min}}{p_{\max} - p_{\min}}.$$
9. The method of claim 1, wherein said employing a corresponding dialog strategy to the user based on the monitoring results comprises:
if the behavior is determined to be the behavior of searching for chat robot vulnerabilities, the user information corresponding to the samples in the outlier set $A$ is submitted, and the conversation is transferred to manual processing.
10. A system for preventing malicious searching for chat robot vulnerabilities, which implements the method according to any one of claims 1 to 9, and comprises:
a request receiving module, for receiving a dialogue request of a user;
an information extraction module, for extracting personal information of the user from the dialogue request and storing the personal information into a user identity database;
a real-time monitoring module, for monitoring the chatting process in real time;
an event judging module, for monitoring whether a specific event occurs within a certain number of conversation rounds; and
a strategy selection module, for adopting a corresponding conversation strategy for the user based on the monitoring result.
CN202110000300.XA 2021-01-02 2021-01-02 Method and system for preventing malicious search chat robot vulnerability Active CN112559724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110000300.XA CN112559724B (en) 2021-01-02 2021-01-02 Method and system for preventing malicious search chat robot vulnerability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110000300.XA CN112559724B (en) 2021-01-02 2021-01-02 Method and system for preventing malicious search chat robot vulnerability

Publications (2)

Publication Number Publication Date
CN112559724A true CN112559724A (en) 2021-03-26
CN112559724B CN112559724B (en) 2021-06-22

Family

ID=75035125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110000300.XA Active CN112559724B (en) 2021-01-02 2021-01-02 Method and system for preventing malicious search chat robot vulnerability

Country Status (1)

Country Link
CN (1) CN112559724B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106559321A (en) * 2016-12-01 2017-04-05 竹间智能科技(上海)有限公司 The method and system of dynamic adjustment dialog strategy
CN110096578A (en) * 2019-04-08 2019-08-06 厦门快商通信息咨询有限公司 A kind of brush amount user identification method, device and the server of intelligent customer service
CN111488577A (en) * 2019-01-29 2020-08-04 北京金睛云华科技有限公司 Vulnerability exploiting method and device based on artificial intelligence
CN111653262A (en) * 2020-08-06 2020-09-11 上海荣数信息技术有限公司 Intelligent voice interaction system and method
CN111966799A (en) * 2020-07-27 2020-11-20 厦门快商通科技股份有限公司 Intelligent customer service method, customer service robot, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106559321A (en) * 2016-12-01 2017-04-05 竹间智能科技(上海)有限公司 The method and system of dynamic adjustment dialog strategy
CN111488577A (en) * 2019-01-29 2020-08-04 北京金睛云华科技有限公司 Vulnerability exploiting method and device based on artificial intelligence
CN110096578A (en) * 2019-04-08 2019-08-06 厦门快商通信息咨询有限公司 A kind of brush amount user identification method, device and the server of intelligent customer service
CN111966799A (en) * 2020-07-27 2020-11-20 厦门快商通科技股份有限公司 Intelligent customer service method, customer service robot, computer equipment and storage medium
CN111653262A (en) * 2020-08-06 2020-09-11 上海荣数信息技术有限公司 Intelligent voice interaction system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LING Jie et al.: "A Survey of Edge Computing Security Technologies", Big Data (《大数据》) *

Also Published As

Publication number Publication date
CN112559724B (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN110209790B (en) Question-answer matching method and device
CN109660533B (en) Method and device for identifying abnormal flow in real time, computer equipment and storage medium
US10635521B2 (en) Conversational problem determination based on bipartite graph
US20200125896A1 (en) Malicious software recognition apparatus and method
WO2022048170A1 (en) Method and apparatus for conducting human-machine conversation, computer device, and storage medium
US20190295098A1 (en) Performing Real-Time Analytics for Customer Care Interactions
CN113379301A (en) Method, device and equipment for classifying users through decision tree model
CN112667792B (en) Man-machine dialogue data processing method and device, computer equipment and storage medium
CN114860742A (en) Artificial intelligence-based AI customer service interaction method, device, equipment and medium
CN113850077A (en) Topic identification method, device, server and medium based on artificial intelligence
CN111510566B (en) Method and device for determining call label, computer equipment and storage medium
CN112559724B (en) Method and system for preventing malicious search chat robot vulnerability
CN113657773A (en) Method and device for testing speech technology, electronic equipment and storage medium
CN115374793B (en) Voice data processing method based on service scene recognition and related device
CN110674839B (en) Abnormal user identification method and device, storage medium and electronic equipment
US20220246153A1 (en) System and method for detecting fraudsters
CN110990554B (en) Content processing method, device, electronic equipment and medium
US11064072B1 (en) Caller state determination
CN110909538A (en) Question and answer content identification method and device, terminal equipment and medium
CN112148864B (en) Voice interaction method and device, computer equipment and storage medium
CN114884740B (en) AI-based intrusion protection response data processing method and server
CN114756401B (en) Abnormal node detection method, device, equipment and medium based on log
CN112800321B (en) Ambiguous post identification method based on keyword retrieval and computer equipment
CN110826339B (en) Behavior recognition method, behavior recognition device, electronic equipment and medium
WO2024027127A1 (en) Fault detection method and apparatus, and electronic device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant