CN117093667A

CN117093667A - Abnormality detection method and related equipment

Info

Publication number: CN117093667A
Application number: CN202210507874.0A
Authority: CN
Inventors: 费志辉; 万明阳; 薛驰; 马国俊
Original assignee: Beijing Zitiao Network Technology Co Ltd
Current assignee: Beijing Zitiao Network Technology Co Ltd
Priority date: 2022-05-10
Filing date: 2022-05-10
Publication date: 2023-11-21

Abstract

The disclosure provides an anomaly detection method and related equipment. The method comprises the following steps: receiving target feedback information; extracting a first vectorized representation of the target feedback information; determining at least one second vectorized representation based on the first vectorized representation and historical feedback information, the second vectorized representation having a similarity to the first vectorized representation greater than a preset similarity threshold; determining timing information of the target feedback information based on the target feedback information and feedback information corresponding to the at least one second vectorized representation; extracting at least one keyword of the target feedback information; and determining an abnormality detection result based on the at least one keyword of the target feedback information and the timing information.

Description

Anomaly detection method and related equipment

Technical field

The present disclosure relates to the field of computer technology, and in particular to an anomaly detection method and related equipment.

Background art

An anomaly discovery algorithm is a computer technique for identifying trending abnormal topics in a stream of customer complaint data. It uses natural language processing methods to group feedback texts on the same topic together and then determines whether the topic is an abnormal issue that requires attention. However, the inventors of the present disclosure found that existing clustering methods struggle to achieve real-time, accurate anomaly detection.

Summary of the invention

The present disclosure proposes an anomaly detection method and related equipment to solve, or partially solve, the above problem.

A first aspect of the present disclosure provides an anomaly detection method, including:

receiving target feedback information;

extracting a first vectorized representation of the target feedback information;

determining at least one second vectorized representation based on the first vectorized representation and historical feedback information, where the similarity between the second vectorized representation and the first vectorized representation is greater than a preset similarity threshold;

obtaining timing information of the target feedback information based on the target feedback information and the feedback information corresponding to the at least one second vectorized representation;

extracting at least one keyword of the target feedback information; and

determining an anomaly detection result based on the at least one keyword of the target feedback information and the timing information.

A second aspect of the present disclosure provides an anomaly detection device, including:

a receiving module configured to receive target feedback information;

a vector extraction module configured to extract a first vectorized representation of the target feedback information;

an information extraction module configured to: determine at least one second vectorized representation based on the first vectorized representation and historical feedback information, where the similarity between the second vectorized representation and the first vectorized representation is greater than a preset similarity threshold; obtain timing information of the target feedback information based on the target feedback information and the feedback information corresponding to the at least one second vectorized representation; and extract at least one keyword of the target feedback information; and

a detection module configured to determine an anomaly detection result based on the at least one keyword of the target feedback information and the timing information.

A third aspect of the present disclosure provides a computer device including one or more processors, a memory, and one or more programs, where the one or more programs are stored in the memory and executed by the one or more processors, and the programs include instructions for performing the method according to the first aspect.

A fourth aspect of the present disclosure provides a non-volatile computer-readable storage medium containing a computer program which, when executed by one or more processors, causes the processors to perform the method according to the first aspect.

A fifth aspect of the present disclosure provides a computer program product including computer program instructions which, when run on a computer, cause the computer to perform the method according to the first aspect.

The anomaly detection method and related equipment provided by the present disclosure generate timing information from vectorized representations that are similar to the vectorized representation of the target feedback information, and then combine the keywords of the target feedback information with that timing information to perform anomaly detection. On the one hand, this enables real-time anomaly detection; on the other hand, it uses keyword features and timing features together, which yields a more accurate anomaly detection result.

Description of the drawings

To illustrate the technical solutions of the present disclosure or the related art more clearly, the drawings used in describing the embodiments or the related art are briefly introduced below. The drawings described below are merely embodiments of the present disclosure; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 shows a schematic diagram of an exemplary system provided by an embodiment of the present disclosure.

FIG. 2A shows a schematic diagram of an exemplary text representation model according to an embodiment of the present disclosure.

FIG. 2B shows a schematic diagram of an exemplary timing histogram according to an embodiment of the present disclosure.

FIG. 2C shows a schematic diagram of another exemplary timing histogram according to an embodiment of the present disclosure.

FIG. 2D shows a schematic diagram of an exemplary anomaly detection model according to an embodiment of the present disclosure.

FIG. 3 shows a schematic flowchart of an exemplary method provided by an embodiment of the present disclosure.

FIG. 4 shows a schematic diagram of the hardware structure of an exemplary computer device provided by an embodiment of the present disclosure.

FIG. 5 shows a schematic diagram of an exemplary device provided by an embodiment of the present disclosure.

Detailed description

To make the purpose, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.

It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of the present disclosure have the ordinary meaning understood by a person of ordinary skill in the field to which the present disclosure belongs. The words "first", "second", and the like used in the embodiments of the present disclosure do not indicate any order, quantity, or importance; they only distinguish different components. Words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connect" or "connected" are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like only express relative positions; when the absolute position of the described object changes, the relative position may change accordingly.

In the customer service field, an anomaly discovery algorithm is a computer technique for identifying trending abnormal topics in a stream of customer complaint data. It uses natural language processing methods to group feedback texts on the same topic together and then determines whether the topic is an abnormal issue that requires attention.

For example, in the user feedback scenario of a platform, promptly discovering trending abnormal issues in the feedback data is extremely important for operating and maintaining a video feed product.

One possible hot-topic (anomaly) discovery algorithm relies mainly on text clustering. Because text clustering is unsupervised, its effectiveness is usually quite limited and the granularity of the categories is hard to control (for example, because different issues behave differently in different time periods, the threshold used during clustering to decide that an issue is abnormal is hard to fix). Mining hot issues through clustering is therefore not very accurate, and the real-time performance of clustering algorithms is hard to guarantee, which makes it difficult to capture abnormal online issues in time.

For example, in the user feedback scenario of a platform, feedback from different users about the same issue may be worded very differently while carrying roughly the same semantic information. Unsupervised vectorized representation methods have difficulty learning whether differently worded texts report the same issue, so they also struggle to meet the needs of hot-topic discovery in user feedback scenarios.

In view of this, embodiments of the present disclosure provide an anomaly detection method and related equipment that generate timing information from vectorized representations similar to the vectorized representation of the target feedback information, and then combine the keywords of the target feedback information with that timing information to perform anomaly detection. On the one hand, this enables real-time anomaly detection; on the other hand, it uses keyword features and timing features together, which yields a more accurate anomaly detection result.

FIG. 1 shows a schematic diagram of an exemplary system 100 provided by an embodiment of the present disclosure.

The system 100 can be used to process target feedback information (for example, opinions that users submit through the platform's customer service entry) and to perform anomaly detection based on it. As shown in FIG. 1, the system 100 may include a server 200 and a terminal device 300.

The server 200 may be a server deployed inside the enterprise or a commercial server that the enterprise purchases or rents. There may be one or more servers 200; when there are several, a distributed architecture may be used to form a server cluster. In some embodiments, as shown in FIG. 1, the system 100 may further include a database server 202 for storing data, and the server 200 may retrieve the corresponding data from the database server 202 as needed. In some embodiments, the database server 202 may be an ElasticSearch cluster.

The terminal device 300 may be any of various types of fixed or mobile terminal, for example a mobile phone, a tablet computer, a personal computer, or a notebook computer. The terminal device 300 and the server 200 can communicate over a wired or wireless network to exchange data.

In some embodiments, a user can use the terminal device 300 to send feedback information to the server 200. For example, the user can enter feedback information through the user feedback entry (for example, the feedback screen) of an application (APP) installed on the terminal device 300, and the terminal device 300 sends that feedback information to the server 200.

After receiving the feedback information, the server 200 may first preprocess it for use in anomaly detection.

In some embodiments, the server 200 may process every piece of feedback information it receives, for example by producing a vectorized representation of the feedback text.

There are many ways to produce a vectorized representation of feedback text. In some embodiments, a text representation model may be built in the server 200 for this purpose. The text representation model may use any of various algorithm models for text processing. In some embodiments, a BERT model may be used.

FIG. 2A shows a schematic diagram of an exemplary text representation model 204 according to an embodiment of the present disclosure.

As an optional embodiment, the text representation model may use a pre-trained model 2042 as its basis. The pre-trained model 2042 is then trained further on historical feedback information from the target platform (the platform on which anomaly detection is to be performed), for example user feedback that the target platform collected over a historical period (within 1 year, half a year, three months, and so on), to obtain the text representation model 2044. The pre-trained model 2042 may be an already pre-trained BERT model provided by an open-source platform; alternatively, an initialized BERT model may be built and then pre-trained on an open-source corpus to obtain a pre-trained BERT model.

The continued training of the pre-trained model 2042 can be implemented with the masked language model (MLM) method.

In some embodiments, multiple pieces of historical feedback information are used to form the training data set, and each piece can be processed into a training sample. Because the masked language model mechanism requires masking the text with a mask identifier, each piece of historical feedback information can be masked to obtain historical feedback information carrying mask identifiers.

Further, to help the model learn the semantic information that "feedback with the same label is close together in the vector space, and feedback with different labels is far apart", the historical feedback information carrying mask identifiers can be tagged with the corresponding classification label.

When the wording of feedback from different users varies widely, traditional unsupervised text representation methods have difficulty capturing the deep semantic information of the feedback text accurately. Table 1 gives specific examples.

Table 1

User feedback text | Label
I've only been scrolling for a while, and my phone is getting very hot. | Functional failure - phone overheating
I just upgraded and my phone got hot. What's going on? | Functional failure - phone overheating
What's wrong with your product? My phone gets scalding hot after a short while. | Functional failure - phone overheating
Why do you keep recommending videos I don't like? | Content ecosystem - inaccurate recommendations

This shows that using the label information carried by the feedback information helps the text representation model learn sentence semantics better. Therefore, in a preferred embodiment, the classification label may be a label associated with the historical feedback information, for example the label carried by the feedback information when the user submitted it. The label may be one the user selected, one assigned by an online natural language processing (NLP) classification model, one produced by manual correction, and so on.

Generally, the MLM mechanism randomly selects a certain proportion (for example, 15%) of the words (or characters) in a text for masking (replacing the original word with the [Mask] symbol), uses the BERT model to extract information from the feedback text carrying the mask identifiers, and then predicts the masked words. To strengthen how the text representation model 2044 represents the key information of feedback text, in some embodiments the masking probability of keywords in the text can be increased. As an optional embodiment, a keyword extraction algorithm (TF-IDF) can be used to set the masking probability. For example, a TF-IDF score is computed for each word in a piece of historical feedback information, the scores of all words in that feedback are normalized to give each word a weight, and that weight is used as the probability of selecting the word for masking. Because the TF-IDF score reflects how important a word is to a given text within a text collection, this increases the probability that keywords are masked and thereby strengthens the model's representation of the key information in feedback text. It can be understood that using the set of historical feedback information of the target platform (the platform on which anomaly detection is to be performed) as the text collection for computing TF-IDF scores can further improve the model's representation of the key information in feedback.
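The TF-IDF-weighted masking described above can be illustrated with a minimal sketch. It rests on several assumptions that the disclosure does not specify: texts are already tokenized into word lists, a smoothed IDF variant is used, and positions are sampled without replacement in proportion to each word's normalized weight. The names `tfidf_scores`, `mask_probabilities`, and `mask_tokens` are hypothetical.

```python
import math
import random
from collections import Counter

MASK = "[MASK]"  # BERT's mask token; the source text writes it as [Mask]


def tfidf_scores(tokens, corpus):
    """TF-IDF score of every distinct token in one tokenized text."""
    n_docs = len(corpus)
    scores = {}
    for word, count in Counter(tokens).items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF: an assumed variant, the source does not specify one.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[word] = (count / len(tokens)) * idf
    return scores


def mask_probabilities(tokens, corpus):
    """Normalize the scores within one text so that they sum to 1."""
    scores = tfidf_scores(tokens, corpus)
    total = sum(scores.values())
    return {word: score / total for word, score in scores.items()}


def mask_tokens(tokens, corpus, ratio=0.15, seed=0):
    """Mask about `ratio` of the positions, favoring high-weight words."""
    rng = random.Random(seed)
    probs = mask_probabilities(tokens, corpus)
    n_mask = max(1, round(len(tokens) * ratio))
    candidates = list(range(len(tokens)))
    chosen = set()
    while candidates and len(chosen) < n_mask:
        # Weighted draw without replacement over the remaining positions.
        weights = [probs[tokens[i]] for i in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        candidates.remove(pick)
        chosen.add(pick)
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]
```

In this sketch a word that is rare across the historical feedback collection receives a larger weight than a word that appears in most texts, so it is more likely to be replaced by the mask symbol.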

After the training data set is obtained, it can be used to continue training the pre-trained model 2042 under the MLM mechanism to obtain the text representation model 2044, which can subsequently be used to produce vectorized representations of target feedback information received in real time.

In some embodiments, the server 200 may use the text representation model 2044 to turn each piece of received feedback information into a vectorized representation and store it in the database server 202 (for example, an ElasticSearch cluster) so that the timing features corresponding to feedback information can be extracted later. Correspondingly, the users' feedback information itself may also be stored in the database server 202.

Returning to FIG. 1, for target feedback information 302 received in real time, the server 200 may first use the text representation model 2044 to extract a first vectorized representation of the target feedback information 302.

Then, based on the first vectorized representation, the server 200 may obtain multiple second vectorized representations similar to it. For example, the first vectorized representation is submitted to a vector retrieval engine (for example, ByteES or Faiss) of the database server 202 (for example, an ElasticSearch cluster), which returns multiple second vectorized representations whose similarity to the first vectorized representation is greater than a preset similarity threshold (for example, a cosine similarity greater than 0.92).
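The similarity filtering in this step can be illustrated with a brute-force sketch. It is a stand-in for a real vector retrieval engine, not the engine's API: the stored vectors are assumed to be held in an in-memory dictionary keyed by a hypothetical feedback identifier, and `find_similar` is a hypothetical helper name. The default threshold of 0.92 is the example value given above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(query, stored, threshold=0.92):
    """Return (feedback_id, similarity) pairs above the threshold, best first."""
    hits = []
    for feedback_id, vector in stored.items():
        similarity = cosine_similarity(query, vector)
        if similarity > threshold:
            hits.append((feedback_id, similarity))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

A linear scan like this costs time proportional to the number of stored vectors, which is why a production system would more plausibly delegate the search to an approximate nearest-neighbor index.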

Next, based on these second vectorized representations, the server 200 may obtain from the database server 202 the feedback information corresponding to each second vectorized representation.

For example, suppose a user submits a common feedback question: "Why does it stutter?". After the text representation model 2044 vectorizes this question, the vector retrieval engine can retrieve similar vectorized representations, and based on them the server 200 can obtain feedback information similar to "Why does it stutter?".

These similar pieces of feedback can be regarded as describing the same issue. Then, based on the time at which each piece of feedback was submitted, a timing histogram 206a of the feedback is drawn using a preset time interval (for example, 10 minutes, half an hour, or 1 hour) as the unit, and this histogram serves as the timing information of the feedback information, as shown for example in FIG. 2B.
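Building the timing histogram amounts to counting submissions per fixed-width time bucket. The following is a minimal sketch under assumptions not stated in the disclosure: submission times are `datetime` objects, the observation window is the half-open range from `start` to `end`, and `timing_histogram` is a hypothetical function name.

```python
import math
from datetime import datetime, timedelta


def timing_histogram(submit_times, start, end, interval=timedelta(minutes=10)):
    """Count feedback submissions per fixed-width bucket in [start, end)."""
    n_buckets = math.ceil((end - start) / interval)
    counts = [0] * n_buckets
    for submitted_at in submit_times:
        if start <= submitted_at < end:
            # Integer division of two timedeltas gives the bucket index.
            counts[(submitted_at - start) // interval] += 1
    return counts


# Five submissions between 17:30 and 17:33 on 2021-12-24.
times = [
    datetime(2021, 12, 24, 17, 30, 41),
    datetime(2021, 12, 24, 17, 30, 56),
    datetime(2021, 12, 24, 17, 31, 11),
    datetime(2021, 12, 24, 17, 31, 38),
    datetime(2021, 12, 24, 17, 32, 16),
]
histogram = timing_histogram(
    times,
    start=datetime(2021, 12, 24, 17, 0, 0),
    end=datetime(2021, 12, 24, 18, 0, 0),
)
```

The resulting list of counts is one possible encoding of the timing information; the bucket width corresponds to the preset time interval.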

As another example, suppose the user reports an abnormal issue: "my phone heats up after scrolling for a while". The server 200 can process this feedback in the same way: vectorize the text, store it in the vector retrieval library, run a vector search, and so on. This yields the similar feedback shown in Table 2, and the timing histogram 206b may look like FIG. 2C.

Table 2

User submission time | User feedback
2021 12-24 17:30:41 | I've only been scrolling for a while, and my phone is getting very hot.
2021 12-24 17:30:56 | I just upgraded and my phone got hot. What's going on?
2021 12-24 17:31:11 | What's wrong with your product? My phone gets scalding hot after a short while.
2021 12-24 17:31:38 | The phone is too hot.
2021 12-24 17:32:16 | The phone is getting very hot.

As FIG. 2C shows, an abnormal issue typically fluctuates periodically in the earlier period but has recently reached an abnormal volume. According to the timing histogram 206b, the issue "my phone heats up after scrolling for a while" normally has a volume below 5 per 10 minutes, but recently the volume has reached 25 per 10 minutes, which indicates that a "phone overheating" anomaly may be occurring.

Thus, a timing histogram captures the timing features, over a period of time, of the issue that a piece of feedback information corresponds to.

Next, the server 200 may extract the keywords of the target feedback information 302 and then obtain an anomaly detection result based on those keywords and the timing information.

Keywords can be extracted in many ways, and common keyword extraction methods can all be applied to the target feedback information 302. For example, algorithms such as TF-IDF, TextRank, or LDA can be used. As an optional embodiment, the TF-IDF algorithm computes a TF-IDF score for each word (or character), and the higher-scoring words are taken as keywords. For example, the highest- and second-highest-scoring words can be selected, or the words whose score exceeds a score threshold can be selected. To ensure that keywords can always be extracted, the score threshold should be set so that, as far as possible, a keyword can be extracted from every piece of feedback information; alternatively, when no word meets the score threshold, the highest-scoring word can be selected as the keyword.
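The two selection rules above (top-scoring words, or words above a score threshold with a fallback to the single best word) can be sketched as follows. This is an illustration under assumptions: texts are pre-tokenized, a smoothed IDF variant is used, ties are broken alphabetically, and the function names `tfidf_scores` and `extract_keywords` are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(tokens, corpus):
    """TF-IDF score of every distinct token in one tokenized text."""
    n_docs = len(corpus)
    scores = {}
    for word, count in Counter(tokens).items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF: an assumed variant, the source does not specify one.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[word] = (count / len(tokens)) * idf
    return scores


def extract_keywords(tokens, corpus, score_threshold=None, top_k=2):
    """Pick keywords by rank, or by threshold with a best-word fallback."""
    scores = tfidf_scores(tokens, corpus)
    # Highest score first; alphabetical order breaks ties deterministically.
    ranked = sorted(scores, key=lambda word: (-scores[word], word))
    if score_threshold is None:
        return ranked[:top_k]
    selected = [word for word in ranked if scores[word] > score_threshold]
    # Fallback: if nothing clears the threshold, keep the single best word.
    return selected or ranked[:1]
```

The fallback branch guarantees that every non-empty piece of feedback yields at least one keyword, even when the threshold is set too high for a particular text.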

在一些实施例中，服务器200可以将所述关键词和所述时序信息输入异常检测模型，进而由异常检测模型输出所述异常检测结果。In some embodiments, the server 200 can input the keywords and the timing information into an anomaly detection model, and then the anomaly detection model outputs the anomaly detection results.

在实际异常问题检测中，不同的问题，在不同的时间，需要有不同的阈值(反应该问题为异常问题的阈值)。因此，不能采用一个固定阈值，而且阈值需要结合具体问题的内容(关键词)、时间特征(直方图趋势)才能决定。所以，本实施例采用了一个基于关键词和时序信息的问题分类模型作为异常检测模型，从而直接预测候选问题(反馈信息302)是否是一个异常问题。In actual abnormal problem detection, different problems require different thresholds at different times (thresholds that reflect the problem as an abnormal problem). Therefore, a fixed threshold cannot be used, and the threshold needs to be determined based on the content (keywords) and time characteristics (histogram trend) of the specific question. Therefore, this embodiment uses a question classification model based on keywords and time series information as an anomaly detection model to directly predict whether the candidate question (feedback information 302) is an anomaly question.

图2D示出了根据本公开实施例的示例性异常检测模型208的示意图。Figure 2D shows a schematic diagram of an exemplary anomaly detection model 208 in accordance with an embodiment of the present disclosure.

如图2D所示，异常检测模型208可以包括关键词特征提取层2082、时序特征提取层2084和分类层2086。As shown in FIG. 2D , the anomaly detection model 208 may include a keyword feature extraction layer 2082 , a temporal feature extraction layer 2084 and a classification layer 2086 .

关键词特征提取层2082可以从输入到异常检测模型208的关键词中提取相应的关键词特征。在一些实施例中，该关键词特征提取层2082可以是词嵌入层(word embedding)。The keyword feature extraction layer 2082 may extract corresponding keyword features from the keywords input to the anomaly detection model 208 . In some embodiments, the keyword feature extraction layer 2082 may be a word embedding layer.

时序特征提取层2084可以从输入到异常检测模型208的时序信息(例如，时序直方图)中提取相应的时序特征。在一些实施例中，该时序特征提取层2084可以是由长短期记忆网络(LSTM)构成的。The temporal feature extraction layer 2084 may extract corresponding temporal features from the temporal information (eg, temporal histogram) input to the anomaly detection model 208. In some embodiments, the temporal feature extraction layer 2084 may be composed of a long short-term memory network (LSTM).

The classification layer 2086 may concatenate the keyword features and temporal features input to it to obtain the overall features of the candidate anomaly, and then predict the category of the target feedback information 302, which gives the anomaly detection result. For example, Yes indicates that the problem expressed by the target feedback information 302 is an anomaly, and No indicates that it is not. In some embodiments, the classification layer 2086 may be a fully connected neural network whose number of hidden layers and number of neurons per layer can be set as needed.

As can be seen, the anomaly detection model 208 mainly takes two kinds of features as input: a) keyword features, which describe the content of the candidate problem; and b) temporal features, which describe the timing of the candidate problem's occurrence. The anomaly detection model 208 concatenates these two kinds of features, fuses them in the fully connected neural network layers, and finally predicts whether the candidate problem is an anomaly.
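By way of a non-limiting illustration, the three-layer structure described above can be sketched in plain Python as follows. The sketch is an assumption-laden toy, not the implementation of the disclosure: the class name `TinyAnomalyModel`, the dimensions, the random untrained weights, the averaging of keyword embeddings, the omission of bias terms, and the single fully connected layer are all hypothetical choices made only to show how keyword features and temporal features are concatenated before classification.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4    # size of each keyword embedding (assumed)
HIDDEN_DIM = 3   # size of the LSTM hidden state (assumed)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyAnomalyModel:
    """Toy stand-in for model 208 with random, untrained weights."""

    def __init__(self, vocab):
        # Layer 2082: one embedding vector per known keyword.
        self.embed = {w: [random.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)] for w in vocab}
        # Layer 2084: one weight matrix per LSTM gate, applied to [x_t] + h_prev.
        self.gates = {g: rand_matrix(HIDDEN_DIM, 1 + HIDDEN_DIM) for g in "ifog"}
        # Layer 2086: a single fully connected layer with two outputs (No, Yes).
        self.fc = rand_matrix(2, EMBED_DIM + HIDDEN_DIM)

    def keyword_features(self, keywords):
        # Average the embeddings of the known keywords (pooling choice is assumed).
        vecs = [self.embed[w] for w in keywords if w in self.embed]
        if not vecs:
            return [0.0] * EMBED_DIM
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def temporal_features(self, histogram):
        # Run one LSTM cell over the histogram bins; the last hidden state is the feature.
        h = [0.0] * HIDDEN_DIM
        c = [0.0] * HIDDEN_DIM
        for count in histogram:
            z = [float(count)] + h
            i = [sigmoid(v) for v in matvec(self.gates["i"], z)]
            f = [sigmoid(v) for v in matvec(self.gates["f"], z)]
            o = [sigmoid(v) for v in matvec(self.gates["o"], z)]
            g = [math.tanh(v) for v in matvec(self.gates["g"], z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h

    def predict(self, keywords, histogram):
        # Concatenate both feature vectors, then classify with a softmax.
        features = self.keyword_features(keywords) + self.temporal_features(histogram)
        logits = matvec(self.fc, features)
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]  # [P(No), P(Yes)]


model = TinyAnomalyModel(vocab=["phone", "hot", "crash"])
probs = model.predict(["phone", "hot"], [0, 1, 1, 3, 8])
print(probs)
```

Because the weights are random, the printed probabilities carry no meaning; a practical system would more likely build the same structure in a deep learning framework and train it as described below.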

In some embodiments, the anomaly detection model 208 may also be trained on historical feedback information. In an optional embodiment, the training sample set of the anomaly detection model 208 may include multiple training samples, which may in turn include positive samples (belonging to historical incidents) and negative samples (not belonging to historical incidents). The ratio of positive samples to negative samples may be, for example, 1:10.

The positive samples are anomalous feedback information in the historical feedback information (for example, "Why is my phone overheating so badly?"). In most cases, historical incidents or anomalies are archived, with records such as incident labels, incident descriptions, incident-related feedback, and the time and trend of the incident. From these archived data, the incident-related keywords and the timing histogram features of an incident can be constructed manually; that is, historical incident data over a period of time are collected and given positive labels to serve as positive samples for model training.

The negative samples are feedback information randomly selected from the historical feedback information and given negative labels. Because randomly drawn feedback information is very unlikely to be incident or anomaly feedback, it can serve as negative samples for model training; this also keeps sample construction simple and efficient.
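The sample construction described above can be illustrated with the following minimal Python sketch. The function name `build_training_set`, its parameters, and the label encoding (1 for positive, 0 for negative) are hypothetical; only the random selection of negatives and the example 1:10 ratio come from the description.

```python
import random


def build_training_set(incident_feedback, all_feedback, negatives_per_positive=10, seed=0):
    """Label archived incident feedback as positive and random feedback as negative."""
    rng = random.Random(seed)
    positives = [(text, 1) for text in incident_feedback]
    # Draw negatives at random; per the description, a random item is rarely an incident.
    wanted = min(len(all_feedback), negatives_per_positive * len(positives))
    negatives = [(text, 0) for text in rng.sample(all_feedback, wanted)]
    samples = positives + negatives
    rng.shuffle(samples)
    return samples


incidents = ["why is my phone overheating", "app crashes on login"]
history = [f"ordinary feedback {i}" for i in range(100)]
training_set = build_training_set(incidents, history)
```

With two archived incidents and the default ratio, the resulting set holds 2 positive and 20 negative samples.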

Then, for a sampled feedback item, the server 200 may use the vector retrieval engine to find the feedback information similar to it, and then compile the keywords and the timing histogram of that similar feedback information.

Next, these keywords and timing histograms can be used to train the anomaly detection model. A cross-entropy loss is constructed from the predicted category and the true category of the training data, and the model parameters are optimized by gradient descent until convergence, which completes training and yields the final anomaly detection model 208. In this way, inputting the keywords and timing information (for example, the timing histogram) of the target feedback information 302 into the anomaly detection model 208 yields a detection result indicating whether an anomaly exists.
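The training procedure above (cross-entropy loss minimized by gradient descent until convergence) can be illustrated with a deliberately simplified sketch. It swaps the full model for a single logistic unit over already-extracted feature vectors, which is an assumption made for brevity; the function names, learning rate, stopping tolerance, and toy data are likewise hypothetical.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy between predicted probability p and true label y."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def predict(weights, bias, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, lr=0.5, max_epochs=500, tol=1e-6):
    """Full-batch gradient descent; stops when the mean loss changes by less than tol."""
    dim = len(samples[0][0])
    weights, bias = [0.0] * dim, 0.0
    history = []
    n = len(samples)
    for _ in range(max_epochs):
        grad_w, grad_b, loss = [0.0] * dim, 0.0, 0.0
        for x, y in samples:
            p = predict(weights, bias, x)
            loss += cross_entropy(p, y)
            # For a sigmoid output with cross-entropy, d(loss)/dz = p - y.
            for j, xj in enumerate(x):
                grad_w[j] += (p - y) * xj
            grad_b += p - y
        history.append(loss / n)
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return weights, bias, history


# Toy concatenated features: label 1 = anomaly, label 0 = not an anomaly.
data = [([0.1, 0.0], 0), ([0.2, 0.1], 0), ([0.9, 1.0], 1), ([0.8, 0.9], 1)]
weights, bias, history = train(data)
```

In a full implementation, the gradients would also flow back through the embedding and LSTM layers, which a deep learning framework handles automatically.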

In some embodiments, when the server 200 determines that the anomaly detection result is Yes, it may also send an alert to designated personnel (for example, a risk alert to the devices of back-office monitoring staff), so that the relevant personnel learn of the anomaly promptly and can respond in time.

As the above embodiments show, the anomaly detection system 100 proposed in the embodiments of the present disclosure can mine, in real time, anomalies that may be occurring from user feedback data; the algorithm is highly fault-tolerant and operates in real time.

The embodiments of the present disclosure also provide an anomaly detection method. Figure 3 shows a schematic flowchart of an exemplary method 400 provided by an embodiment of the present disclosure. The method 400 may be implemented by the server 200 of Figure 1 and may include the following steps, as shown in Figure 3.

In step 402, the server 200 may receive target feedback information 302.

In step 404, the server 200 may extract a first vectorized representation of the target feedback information.

In some embodiments, extracting the first vectorized representation of the target feedback information includes: inputting the target feedback information into a text representation model (for example, model 2044 of Figure 2A) and outputting the first vectorized representation, where the text representation model is trained on historical feedback information.

In some embodiments, the training sample set of the text representation model includes multiple training samples. Each training sample is historical feedback information with mask identifiers and has a classification label, the classification label being a label associated with that historical feedback information. This lets the model learn the semantic information that "feedback information with the same label is close together in the vector space, and feedback information with different labels is far apart".

In some embodiments, the historical feedback information includes multiple words, and the masking probability of the mask identifier corresponding to each word is determined from that word's keyword extraction algorithm score, which strengthens the model's representation of the key information in the feedback text.
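One possible way to derive masking probabilities from keyword scores is sketched below. The mapping is an assumption: this passage states only that the probability is determined from the keyword extraction algorithm score, so the linear scaling around a base rate, the clamping bounds, the `[MASK]` token, and the function names are all hypothetical.

```python
import random


def mask_probabilities(scores, base=0.15, floor=0.05, ceiling=0.5):
    """Scale each word's masking probability with its keyword score (assumed mapping)."""
    mean = sum(scores.values()) / len(scores)
    if mean == 0:
        return {word: base for word in scores}
    # Words scoring above the mean are masked more often than the base rate.
    return {w: max(floor, min(ceiling, base * s / mean)) for w, s in scores.items()}


def apply_mask(words, probs, rng, mask_token="[MASK]"):
    """Replace each word by the mask token with that word's own probability."""
    return [mask_token if rng.random() < probs.get(w, 0.0) else w for w in words]


scores = {"phone": 0.2, "heats": 0.9, "the": 0.1}
probs = mask_probabilities(scores)
masked = apply_mask(["the", "phone", "heats"], probs, random.Random(0))
```

Under this assumed mapping, high-scoring words are masked more often, so the model is pushed to reconstruct the key information from context.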

In step 406, the server 200 may determine at least one second vectorized representation based on the first vectorized representation and historical feedback information (for example, feedback information stored in the database server 202 of Figure 1), where the similarity between the second vectorized representation and the first vectorized representation is greater than a preset similarity threshold.

In some embodiments, determining at least one second vectorized representation based on the first vectorized representation and historical feedback information includes: inputting the first vectorized representation into a vector retrieval engine and outputting the at least one second vectorized representation.
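The retrieval in step 406 can be illustrated with a brute-force sketch. A production vector retrieval engine would typically use an approximate nearest-neighbor index instead; the use of cosine similarity, the threshold value 0.8, and the names `cosine` and `retrieve_similar` are assumptions made for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_similar(query, index, threshold=0.8):
    """Return (feedback_id, similarity) pairs above the threshold, most similar first.

    index is a list of (feedback_id, vector) pairs built from historical feedback.
    """
    hits = [(fid, cosine(query, vec)) for fid, vec in index]
    hits = [(fid, sim) for fid, sim in hits if sim > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


index = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [0.0, 1.0])]
similar = retrieve_similar([1.0, 0.0], index)
```

Here items "a" and "b" exceed the threshold and item "c", which is orthogonal to the query, is filtered out.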

In step 408, the server 200 may obtain the timing information of the target feedback information based on the target feedback information and the feedback information corresponding to the at least one second vectorized representation.

In some embodiments, the timing information is a timing histogram generated based on the target feedback information and the feedback information corresponding to the at least one second vectorized representation.
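A timing histogram of this kind can be sketched as a count of feedback items per fixed-width time bin. The bin width, the number of bins, the use of numeric timestamps in seconds, and the function name `timing_histogram` are assumptions for illustration.

```python
def timing_histogram(timestamps, now, bin_seconds=3600, num_bins=24):
    """Count feedback per time bin over the window ending at `now`, oldest bin first."""
    counts = [0] * num_bins
    for ts in timestamps:
        age = now - ts
        # Keep only feedback inside the window; the newest items fall in the last bin.
        if 0 <= age < bin_seconds * num_bins:
            counts[num_bins - 1 - int(age // bin_seconds)] += 1
    return counts


# Timestamps of the target feedback (10000) and of the retrieved similar feedback.
stamps = [10000, 9950, 9890, 9700, 9000]
hist = timing_histogram(stamps, now=10000, bin_seconds=100, num_bins=5)
print(hist)  # [0, 1, 0, 1, 2]
```

A rising count toward the most recent bins is the kind of pattern the temporal feature extraction layer 2084 would be expected to pick up.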

In step 410, the server 200 may extract at least one keyword of the target feedback information.
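This passage does not name a specific keyword extraction algorithm. As one hedged illustration, the sketch below scores words with a simple TF-IDF computed against historical feedback; the choice of TF-IDF, the smoothing, the pre-tokenized input (Chinese text would first need a word segmenter), and the function name `extract_keywords` are all assumptions.

```python
import math
from collections import Counter


def extract_keywords(target_tokens, history_tokens, top_k=3):
    """Rank the words of the target feedback by a smoothed TF-IDF score."""
    n_docs = len(history_tokens)
    # Document frequency: in how many historical items each word appears.
    doc_freq = Counter(tok for doc in history_tokens for tok in set(doc))
    term_freq = Counter(target_tokens)
    scores = {
        tok: (count / len(target_tokens)) * math.log((1 + n_docs) / (1 + doc_freq[tok]))
        for tok, count in term_freq.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


history = [
    ["the", "app", "is", "slow"],
    ["the", "phone", "is", "fine"],
    ["the", "login", "page", "is", "broken"],
]
keywords = extract_keywords(["the", "phone", "is", "overheating"], history, top_k=2)
```

Words that are rare in the historical feedback, such as "overheating" here, rank above words that appear in every item.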

In step 412, the server 200 may determine an anomaly detection result based on the at least one keyword of the target feedback information and the timing information.

In some embodiments, determining an anomaly detection result based on the at least one keyword of the target feedback information and the timing information includes: inputting the at least one keyword and the timing information into an anomaly detection model (for example, model 208 of Figure 2D) and outputting the anomaly detection result, where the anomaly detection model is trained on historical feedback information.

In some embodiments, the training sample set of the anomaly detection model includes multiple training samples, the multiple training samples including positive samples and negative samples, where the positive samples are anomalous feedback information in the historical feedback information and the negative samples are feedback information randomly selected from the historical feedback information.

In some embodiments, determining an anomaly detection result based on the at least one keyword of the target feedback information and the timing information includes: extracting at least one keyword feature from the at least one keyword; extracting temporal features from the timing information; concatenating the at least one keyword feature and the temporal features into a target feature; and performing classification prediction based on the target feature and outputting the anomaly detection result.
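The order of steps 402 to 412 can be summarized by the following orchestration sketch, in which each stage is passed in as a callable. The function name `detect_anomaly`, the parameter names, and the stand-in lambdas are hypothetical; the sketch shows only the data flow between steps, not how any stage is implemented.

```python
from typing import Callable, Sequence


def detect_anomaly(
    feedback: str,
    embed: Callable[[str], Sequence[float]],
    retrieve: Callable[[Sequence[float]], list],
    build_histogram: Callable[[str, list], list],
    extract_keywords: Callable[[str], list],
    classify: Callable[[list, list], bool],
) -> bool:
    """Run the stages in the order of method 400 for one received feedback item."""
    vector = embed(feedback)                        # step 404: first vectorized representation
    similar = retrieve(vector)                      # step 406: similar historical feedback
    histogram = build_histogram(feedback, similar)  # step 408: timing information
    keywords = extract_keywords(feedback)           # step 410: keywords
    return classify(keywords, histogram)            # step 412: anomaly detection result


# Stand-in stages, for illustration only.
result = detect_anomaly(
    "phone overheats",
    embed=lambda text: [float(len(text))],
    retrieve=lambda vector: ["similar-1", "similar-2"],
    build_histogram=lambda text, similar: [0, 0, len(similar) + 1],
    extract_keywords=lambda text: text.split(),
    classify=lambda keywords, hist: hist[-1] >= 3 and "overheats" in keywords,
)
```

Passing the stages as parameters keeps the sketch independent of any particular text representation model, retrieval engine, or classifier.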

It should be noted that the method of the embodiments of the present disclosure may be performed by a single device, such as a computer or a server. The method of this embodiment may also be applied in a distributed scenario and completed by multiple devices cooperating with one another. In such a distributed scenario, one of the devices may perform only one or more steps of the method of the embodiments of the present disclosure, and the devices interact with one another to complete the method.

It should be noted that some embodiments of the present disclosure have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the above embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.

The embodiments of the present disclosure also provide a computer device for implementing the above anomaly detection method 400. Figure 4 shows a schematic diagram of the hardware structure of an exemplary computer device 500 provided by an embodiment of the present disclosure. The computer device 500 may be used to implement the server 200 or the terminal device 300. As shown in Figure 4, the computer device 500 may include a processor 502, a memory 504, a network interface 506, a peripheral interface 508, and a bus 510. The processor 502, the memory 504, the network interface 506, and the peripheral interface 508 are communicatively connected to one another within the computer device 500 via the bus 510.

The processor 502 may be a central processing unit (CPU), an image processor, a neural network processor (NPU), a microcontroller (MCU), a programmable logic device, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or one or more integrated circuits. The processor 502 may be used to perform functions related to the techniques described in the present disclosure. In some embodiments, the processor 502 may also include multiple processors integrated into a single logical component. For example, as shown in Figure 4, the processor 502 may include multiple processors 502a, 502b, and 502c.

The memory 504 may be configured to store data (for example, instructions and computer code). As shown in Figure 4, the data stored in the memory 504 may include program instructions (for example, program instructions for implementing the anomaly detection method of the embodiments of the present disclosure) and data to be processed (for example, the memory may store configuration files of other modules). The processor 502 may access the program instructions and data stored in the memory 504 and execute the program instructions to operate on the data to be processed. The memory 504 may include volatile or non-volatile storage. In some embodiments, the memory 504 may include random access memory (RAM), read-only memory (ROM), optical disks, magnetic disks, hard disks, solid-state drives (SSD), flash memory, memory sticks, and the like.

The network interface 506 may be configured to provide the computer device 500 with communication with other external devices via a network. The network may be any wired or wireless network capable of transmitting and receiving data. For example, the network may be a wired network, a local wireless network (for example, Bluetooth, WiFi, or near-field communication (NFC)), a cellular network, the Internet, or a combination of these. It is understood that the type of network is not limited to these specific examples.

The peripheral interface 508 may be configured to connect the computer device 500 to one or more peripheral devices to enable information input and output. For example, the peripheral devices may include input devices such as keyboards, mice, touchpads, touch screens, microphones, and various sensors, as well as output devices such as displays, speakers, vibrators, and indicator lights.

The bus 510 may be configured to transfer information between the components of the computer device 500 (for example, the processor 502, the memory 504, the network interface 506, and the peripheral interface 508), and may be, for example, an internal bus (for example, a processor-memory bus) or an external bus (for example, a USB port or a PCI-E bus).

It should be noted that although the above architecture of the computer device 500 shows only the processor 502, the memory 504, the network interface 506, the peripheral interface 508, and the bus 510, in a specific implementation the architecture of the computer device 500 may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the architecture of the computer device 500 may include only the components necessary to implement the embodiments of the present disclosure, and need not include all the components shown in the figure.

The embodiments of the present disclosure also provide an anomaly detection apparatus. Figure 5 shows a schematic diagram of an exemplary apparatus 600 provided by an embodiment of the present disclosure. The apparatus 600 may include the following structures.

A receiving module 602, configured to: receive target feedback information;

a vector extraction module 604, configured to: extract a first vectorized representation of the target feedback information;

an information extraction module 606, configured to: determine at least one second vectorized representation based on the first vectorized representation and historical feedback information, the similarity between the second vectorized representation and the first vectorized representation being greater than a preset similarity threshold; obtain timing information of the target feedback information based on the target feedback information and the feedback information corresponding to the at least one second vectorized representation; and extract at least one keyword of the target feedback information; and

a detection module 608, configured to: determine an anomaly detection result based on the at least one keyword of the target feedback information and the timing information.

In some embodiments, the vector extraction module 604 is configured to: input the target feedback information into a text representation model and output the first vectorized representation, where the text representation model is trained on historical feedback information.

In some embodiments, the training sample set of the text representation model includes multiple training samples, each training sample being historical feedback information with mask identifiers and having a classification label, the classification label being a label associated with that historical feedback information.

In some embodiments, the historical feedback information includes multiple words, and the masking probability of the mask identifier corresponding to each word is determined from that word's keyword extraction algorithm score.

In some embodiments, the information extraction module 606 is configured to: input the first vectorized representation into a vector retrieval engine and output the at least one second vectorized representation.

In some embodiments, the timing information is a timing histogram generated based on the target feedback information and the feedback information corresponding to the at least one second vectorized representation.

In some embodiments, the detection module 608 is configured to: input the at least one keyword and the timing information into an anomaly detection model and output the anomaly detection result, where the anomaly detection model is trained on historical feedback information.

In some embodiments, the detection module 608 is configured to: extract at least one keyword feature from the at least one keyword; extract temporal features from the timing information; concatenate the at least one keyword feature and the temporal features into a target feature; and perform classification prediction based on the target feature and output the anomaly detection result.

For convenience of description, the above apparatus is described in terms of separate modules divided by function. Of course, when the present disclosure is implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware.

The apparatus of the above embodiments is used to implement the corresponding method 400 of any of the foregoing embodiments and has the beneficial effects of the corresponding method embodiments, which are not repeated here.

Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method 400 of any of the above embodiments.

The computer-readable media of this embodiment include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

The computer instructions stored in the storage medium of the above embodiments are used to cause the computer to perform the method 400 of any of the above embodiments and have the beneficial effects of the corresponding method embodiments, which are not repeated here.

Based on the same inventive concept, and corresponding to the method 400 of any of the above embodiments, the present disclosure also provides a computer program product that includes a computer program. In some embodiments, the computer program is executable by one or more processors to cause the processors to perform the method 400. Corresponding to the executing entity of each step in the embodiments of the method 400, the processor that performs a given step may belong to the corresponding executing entity.

The computer program product of the above embodiments is used to cause a processor to perform the method 400 of any of the above embodiments and has the beneficial effects of the corresponding method embodiments, which are not repeated here.

Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is exemplary only and is not intended to imply that the scope of the present disclosure (including the claims) is limited to these examples. Within the spirit of the present disclosure, technical features of the above embodiments or of different embodiments may also be combined, steps may be performed in any order, and many other variations of the different aspects of the embodiments described above exist, which are not provided in detail for the sake of brevity.

In addition, to simplify the description and discussion, and so as not to obscure the embodiments of the present disclosure, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided drawings. Furthermore, apparatuses may be shown in block diagram form to avoid obscuring the embodiments of the present disclosure, which also reflects the fact that the details of implementing such block diagram apparatuses depend heavily on the platform on which the embodiments are to be implemented (that is, these details should be well within the understanding of those skilled in the art). Where specific details (for example, circuits) are set forth to describe exemplary embodiments of the present disclosure, it will be apparent to those skilled in the art that the embodiments can be practiced without these specific details or with variations of them. These descriptions should therefore be regarded as illustrative rather than restrictive.

Although the present disclosure has been described in conjunction with specific embodiments, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (for example, dynamic RAM (DRAM)) may use the discussed embodiments.

The embodiments of the present disclosure are intended to cover all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent substitution, improvement, or the like made within the spirit and principles of the embodiments of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims

1. An anomaly detection method, comprising:

receiving target feedback information;

extracting a first vectorized representation of the target feedback information;

determining at least one second vectorized representation based on the first vectorized representation and historical feedback information, the second vectorized representation having a similarity to the first vectorized representation greater than a preset similarity threshold;

determining timing information of the target feedback information based on the target feedback information and feedback information corresponding to the at least one second vectorized representation;

extracting at least one keyword of the target feedback information; and

determining an anomaly detection result based on the at least one keyword of the target feedback information and the timing information.

2. The method of claim 1, wherein extracting the first vectorized representation of the target feedback information comprises:

inputting the target feedback information into a text representation model, and outputting the first vectorized representation;

wherein the text representation model is trained based on historical feedback information.

3. The method of claim 2, wherein the training sample set of the text representation model includes a plurality of training samples, the training samples being historical feedback information with a mask identifier, and the training samples having classification tags, the classification tags being tags associated with the historical feedback information.

4. The method of claim 3, wherein the historical feedback information includes a plurality of words, the mask probabilities of the mask identifiers corresponding to the words being determined based on keyword extraction algorithm scores of the words.

5. The method of claim 1, wherein determining at least one second vectorized representation based on the first vectorized representation and historical feedback information comprises:

inputting the first vectorized representation into a vector retrieval engine, and outputting the at least one second vectorized representation.

6. The method of claim 1, wherein the timing information is a timing histogram generated based on the target feedback information and the feedback information corresponding to the at least one second vectorized representation.

7. The method of claim 1, wherein determining an anomaly detection result based on the at least one keyword of the target feedback information and the timing information comprises:

inputting the at least one keyword and the timing information into an anomaly detection model, and outputting the anomaly detection result;

wherein the anomaly detection model is trained based on historical feedback information.

8. The method of claim 7, wherein the training sample set of the anomaly detection model comprises a plurality of training samples including a positive example sample and a negative example sample, wherein the positive example sample is anomaly feedback information in the historical feedback information and the negative example sample is feedback information randomly selected from the historical feedback information.

9. The method of claim 1, 7 or 8, wherein determining an anomaly detection result based on the at least one keyword of the target feedback information and the timing information comprises:

extracting at least one keyword feature from the at least one keyword;

extracting temporal features from the timing information;

concatenating the at least one keyword feature and the temporal features into a target feature; and

performing classification prediction based on the target feature, and outputting the anomaly detection result.

10. An anomaly detection apparatus, comprising:

a receiving module configured to receive target feedback information;

a vector extraction module configured to extract a first vectorized representation of the target feedback information;

an information extraction module configured to: determine at least one second vectorized representation based on the first vectorized representation and historical feedback information, the second vectorized representation having a similarity to the first vectorized representation greater than a preset similarity threshold; obtain timing information of the target feedback information based on the target feedback information and the feedback information corresponding to the at least one second vectorized representation; and extract at least one keyword of the target feedback information; and

a detection module configured to determine an anomaly detection result based on the at least one keyword of the target feedback information and the timing information.

11. A computer device comprising one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and executed by the one or more processors, the programs comprising instructions for performing the method of any of claims 1-9.

12. A non-transitory computer readable storage medium containing a computer program which, when executed by one or more processors, causes the processors to perform the method of any of claims 1-9.

13. A computer program product comprising computer program instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-9.