WO2023206702A1 - Log processing method and apparatus, storage medium, and electronic device - Google Patents
Log processing method and apparatus, storage medium, and electronic device
- Publication number
- WO2023206702A1 (application PCT/CN2022/096434)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- log
- logs
- call
- cluster
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
- G06F11/3072—Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
- G06F11/3082—Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting the data filtering being achieved by aggregating or compressing the monitored data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present disclosure relates to the field of smart homes, and specifically, to a log processing method and device, a storage medium and an electronic device.
- the above-mentioned system can be an interactive system equipped with intelligent question answering (for example, an in-vehicle question and answer assistant, a smart speaker, or a mobile phone voice assistant), that is, an intelligent question and answer system.
- the voice chat data (i.e., the logs) can be screened.
- analysts can also perform reverse testing by observing control statements in the background voice chat data in order to find system errors.
- the above log processing method requires analysts to inspect the logs one by one to determine whether errors occurred and what caused them.
- the above method requires a lot of manpower and takes a long time. Log processing in the related art therefore suffers from low efficiency because the logs must be analyzed one by one.
- Embodiments of the present disclosure provide a log processing method and device, a storage medium, and an electronic device to at least solve the problem of low log processing efficiency due to the need to analyze logs one by one in log processing methods in related technologies.
- a log processing method including: obtaining multiple target logs to be processed, wherein each target log records a call to a function made through a control statement; obtaining the sentence vector of the control statement recorded in each target log, thereby obtaining the sentence vector corresponding to each target log; clustering the multiple target logs based on the sentence vector corresponding to each target log to obtain multiple target clusters, wherein each target cluster corresponds to one function; and performing analysis operations on the target logs in each target cluster to obtain the analysis result corresponding to each target cluster.
- a log processing device including: a first acquisition unit configured to acquire multiple target logs to be processed, wherein each target log records a call to a function made through a control statement; a second acquisition unit configured to obtain the sentence vector of the control statement recorded in each target log, thereby obtaining the sentence vector corresponding to each target log; a clustering unit configured to cluster the multiple target logs based on the sentence vector corresponding to each target log to obtain multiple target clusters, wherein each target cluster corresponds to one function; and an execution unit configured to perform analysis operations on the target logs in each target cluster to obtain the analysis result corresponding to each target cluster.
- a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to execute the above log processing method when run.
- an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above log processing method through the computer program.
- multiple target logs to be processed are obtained, where each target log records a call to a function made through a control statement; the sentence vector of the control statement recorded in each target log is obtained, giving the sentence vector corresponding to each target log; the multiple target logs are clustered based on these sentence vectors to obtain multiple target clusters, where each target cluster corresponds to one function; and analysis operations are performed on the target logs in each target cluster to obtain the analysis result corresponding to each target cluster.
- Because the target logs are clustered by sentence vector and the analysis operations are performed per target cluster, the amount of log processing is reduced and the efficiency of log processing is improved. This solves the problem of low log processing efficiency caused by the need to analyze logs one by one in the log processing methods of related technologies.
- Figure 1 is a schematic diagram of the hardware environment of an optional log processing method according to an embodiment of the present disclosure
- Figure 2 is a schematic flowchart of an optional log processing method according to an embodiment of the present disclosure
- FIG. 3 is a schematic flowchart of another optional log processing method according to an embodiment of the present disclosure.
- Figure 4 is a structural block diagram of an optional log processing device according to an embodiment of the present disclosure.
- FIG. 5 is a structural block diagram of an optional electronic device according to an embodiment of the present disclosure.
- a log processing method is provided.
- the log processing method is widely applicable to whole-house intelligent digital control scenarios such as smart home, smart home device ecosystems, and smart residence (Intelligence House) ecosystems.
- the above log processing method can be applied to the hardware environment composed of the terminal device 102 and the server 104 as shown in FIG. 1 .
- the server 104 is connected to the terminal device 102 through the network and can be used to provide services (such as application services, etc.) for the terminal or the client installed on the terminal.
- a database can be set up on the server or independently from the server.
- cloud computing and/or edge computing services can be configured on the server or independently of the server to provide data computing services for the server 104.
- the above-mentioned network may include but is not limited to at least one of the following: wired network, wireless network.
- the above-mentioned wired network may include but is not limited to at least one of the following: wide area network, metropolitan area network, and local area network.
- the above-mentioned wireless network may include at least one of the following: WIFI (Wireless Fidelity), Bluetooth.
- the terminal device 102 may be, but is not limited to, a PC, a mobile phone, a tablet, a smart air conditioner, a smart range hood, a smart refrigerator, a smart oven, a smart stove, a smart washing machine, a smart water heater, smart washing equipment, a smart dishwasher, a smart projection device, a smart TV, a smart clothes drying rack, smart curtains, smart audio and video equipment, a smart socket, smart audio equipment, a smart speaker, smart fresh air equipment, smart kitchen and bathroom equipment, smart bathroom equipment, a smart sweeping robot, a smart window cleaning robot, a smart mopping robot, smart air purification equipment, a smart steamer, a smart microwave oven, smart kitchen appliances, a smart purifier, a smart water dispenser, a smart door lock, etc.
- the log processing method in the embodiment of the present disclosure may be executed by the server 104, may be executed by the terminal 102, or may be executed jointly by the server 104 and the terminal 102.
- the terminal 102 may also perform the log processing method according to the embodiment of the present disclosure by a client installed thereon.
- Figure 2 is a schematic flowchart of an optional log processing method according to an embodiment of the present disclosure. As shown in Figure 2, the flow of the method can include the following steps:
- Step S202 Obtain multiple target logs to be processed, where each target log is used to record a call to a function through a control statement.
- the log processing method in this embodiment can be applied to the scenario of processing logs in terminal devices.
- the terminal devices may be devices carrying an intelligent question and answer system, or they may be intelligent hardware products containing other functional modules.
- the above-mentioned intelligent question and answer system can be a question and answer assistance system, a smart speaker, a mobile phone voice assistant, etc., and is not limited here.
- the above logs may be call logs related to voice interaction or call logs related to other function calls, which are not limited here.
- For example, when the terminal device is a device carrying an intelligent question and answer system, the call log is an interaction log related to voice interaction.
- the intelligent question and answer system generates tens of thousands of interaction logs every day. To improve the stability and interaction accuracy of the system in reverse, these background logs can be observed and analyzed, and the degree of generalization of the question and answer system can be gradually improved based on user interaction data. In related technologies, log analysts generally go through the logs one by one, grouped by intent domain, to check whether interaction errors occurred and to analyze their causes. Because of the large volume of interactions, this method is very labor-intensive.
- the target skill system can contain multiple control statements corresponding to the call log.
- Each target log can be used to record calls to a function via a control statement.
- the function called above can be a music on-demand skill, a home appliance control skill, or a skill that implements other functions.
- for music on-demand skills, the corresponding control statements can be "I want to listen to xx", "Play a song xx", etc.; for home appliance control skills, the corresponding control statements can be "Turn on the air conditioner" in its various phrasings, etc.
- multiple target logs to be processed can be obtained by reading the call logs recorded in the background log data set over a period of time, which yields a set of target logs.
- a set of target logs can include call logs that call the same function, or call logs that call different functions.
- the control statements used to call the same function can be the same or different.
- Step S204 Obtain the sentence vector of the control statement recorded in each target log, and obtain the sentence vector corresponding to each target log.
- the sentence vectors of the control statements recorded in each target log can be extracted, thereby obtaining the sentence vector corresponding to each target log.
- the sentence vector corresponding to each target log can be generated based on the words contained in the control statement, or the BERT model (Bidirectional Encoder Representations from Transformers, a pre-trained language representation model) can be used to generate the sentence vector of the control statement recorded in the target log; that is, the sentence vector of the text of the control statement is generated by BERT.
- the generated sentence vector can represent both the semantics of the control statement and the information it carries.
- the degree of similarity between different control statements can be represented by the distance between the corresponding sentence vectors.
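The idea that distance between sentence vectors reflects similarity between control statements can be illustrated with a small sketch. The source does not name a distance metric, so cosine distance is an assumption here, and the three-dimensional vectors are toy values chosen for illustration, not real BERT output.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical toy sentence vectors (real ones have hundreds of dimensions).
listen_to_xx = [0.9, 0.1, 0.0]   # "I want to listen to xx"
play_song_xx = [0.8, 0.2, 0.1]   # "Play a song xx"
turn_on_ac = [0.0, 0.2, 0.9]     # "Turn on the air conditioner"

same_function = cosine_distance(listen_to_xx, play_song_xx)
different_function = cosine_distance(listen_to_xx, turn_on_ac)
```

With these toy values, the two music statements end up much closer to each other than either is to the appliance statement, which is the property the clustering step relies on.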
- Step S206 Cluster multiple target logs based on sentence vectors corresponding to each target log to obtain multiple target clusters, where each target cluster corresponds to one function.
- multiple target logs can be clustered based on the sentence vector corresponding to each target log to obtain multiple target clusters, each of which corresponds to one function. That is, the target logs contained in each target cluster are call logs that call the same function, or most of them are.
- various clustering algorithms can be used to cluster the multiple target logs: either an algorithm that requires the number of clusters to be specified, or one that does not.
- for example, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) can be used.
- DBSCAN is a typical density-based clustering algorithm that works well for text clustering and can group similar texts into the same cluster. The algorithm divides regions of sufficient density into clusters, discovers clusters of arbitrary shape in a noisy spatial database, and defines a cluster as the maximal set of density-connected points.
- for the music on-demand function, chats such as "I want to listen to xx" and "Play a xx" will be clustered into one cluster; for the home appliance control function, high-frequency chats such as "Turn on the air conditioner", which likewise belong to the control category, will also be clustered into one cluster.
- the multiple clusters obtained by clustering multiple target logs can all be target clusters, or only some of them can be target clusters.
- the target clusters may be determined from the multiple clusters according to a received selection instruction or according to preset cluster filtering conditions, which is not limited in this embodiment.
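To make the density-clustering step concrete, the following is a minimal pure-Python sketch of DBSCAN. It is an illustration under stated assumptions, not the implementation used in the disclosure: the Euclidean metric, the parameter names `eps` and `min_pts`, and the two-dimensional toy vectors are all hypothetical choices.

```python
import math

NOISE = -1  # label for points that belong to no cluster

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster_id += 1                # points[i] is a core point: new cluster
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue                # already assigned
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels

# Hypothetical 2-D stand-ins for sentence vectors of deduplicated log texts.
vectors = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # e.g. music on-demand statements
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # e.g. home appliance control statements
    [10.0, 0.0],                          # an isolated statement
]
labels = dbscan(vectors, eps=0.5, min_pts=2)
```

Because the number of clusters is not passed in, this corresponds to the second kind of algorithm mentioned above. The isolated vector is labeled as noise rather than being forced into a cluster.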
- Step S208 Perform analysis operations on the target logs in each target cluster to obtain analysis results corresponding to each target cluster.
- the target logs in each target cluster can be processed and analyzed.
- log analysts can conduct comparative analysis based on the error logs in each cluster, screen the causes of errors, and provide support for the generalization work of the Q&A system from the perspective of actual log data.
- the analysis operation may be performed on the target logs in each target cluster obtained by clustering. Each target log may contain the control statement, a parsing result, and a parsing code. The parsing result can be represented by T or F, where T means the statement was parsed correctly and F means a parsing error. The parsing code can be used to indicate the cause of a control statement parsing error.
- For example, suppose the target clusters are music on demand and home appliance control. If a log in the music on-demand cluster contains the control statement "I want to listen to singer B's song 1", its parsing result is F, and its parsing code is 0001, then the log indicates a parsing exception whose cause is the lack of music resources for singer B.
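A target log with a control statement, a parsing result, and a parsing code could be modeled as below. This is only a sketch: the class name `TargetLog`, the field names, and the `describe` helper are hypothetical, and the code table reuses the three example codes that the description gives (0000, 0001, 0002).

```python
from dataclasses import dataclass

# Example codes from the description; the dictionary itself is a hypothetical helper.
PARSING_CODES = {
    "0000": "parsing correct",
    "0001": "music resource cannot be called",
    "0002": "device status abnormal",
}

@dataclass(frozen=True)
class TargetLog:
    control_statement: str
    parsing_result: str  # "T" (parsed correctly) or "F" (parsing error)
    parsing_code: str

def describe(log):
    """Summarize one log: either parsed correctly or an exception with its cause."""
    if log.parsing_result == "T":
        return "parsed correctly"
    cause = PARSING_CODES.get(log.parsing_code, "unknown cause")
    return f"parsing exception: {cause}"

example = TargetLog("I want to listen to singer B's song 1", "F", "0001")
summary = describe(example)
```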
- Through the above steps, multiple target logs to be processed are obtained, where each target log records a call to a function made through a control statement; the sentence vector of the control statement recorded in each target log is obtained, giving the sentence vector corresponding to each target log; the multiple target logs are clustered based on these sentence vectors to obtain multiple target clusters, where each target cluster corresponds to one function; and analysis operations are performed on the target logs in each target cluster to obtain the analysis result corresponding to each target cluster. This solves the problem of low log processing efficiency caused by the need to analyze logs one by one in related technologies, and improves the efficiency of log processing.
- the sentence vector of the control statement recorded in each target log is obtained, and the sentence vector corresponding to each target log is obtained, including:
- the control statement recorded in each target log can be input into the pre-training model, thereby obtaining the sentence vector corresponding to each target log.
- the pre-trained model here can be a BERT model or other pre-trained models.
- the control statements recorded in each target log can be input into the BERT model.
- based on an input control statement, the BERT model can determine the word vectors corresponding to that control statement. The average of all word vectors contained in the control statement is then calculated to determine the sentence vector corresponding to the control statement. Repeating this operation determines the sentence vector corresponding to each target log.
- the BERT model can accurately extract sentence vectors, which represent the meaning of text sentences at the semantic level.
- using a pre-trained model to extract the sentence vectors of the control statements can improve the accuracy of sentence vector generation.
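The averaging step described above (word vectors averaged into one sentence vector) can be sketched as follows. No BERT model is invoked here: the lookup table `toy_word_vectors` and its values are hypothetical stand-ins for the word vectors a pre-trained model would produce, and whitespace tokenization is a simplifying assumption.

```python
def mean_pool(word_vectors):
    """Average a non-empty list of equal-length word vectors element-wise."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    dim = len(word_vectors[0])
    if any(len(v) != dim for v in word_vectors):
        raise ValueError("all word vectors must have the same length")
    n = len(word_vectors)
    return [sum(v[k] for v in word_vectors) / n for k in range(dim)]

# Hypothetical 2-D word vectors; a real model would supply these.
toy_word_vectors = {
    "turn": [1.0, 0.0],
    "on": [0.0, 1.0],
    "air": [2.0, 2.0],
    "conditioner": [1.0, 3.0],
}

def sentence_vector(statement, table):
    """Sentence vector = mean of the vectors of the words found in the table."""
    words = statement.lower().split()
    return mean_pool([table[w] for w in words if w in table])

vector = sentence_vector("Turn on air conditioner", toy_word_vectors)
```

Applying `sentence_vector` to the control statement of every target log yields one vector per log, which is the input the clustering step expects.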
- multiple target logs are clustered based on the sentence vector corresponding to each target log, and multiple target clusters are obtained, including:
- the deduplicated log text can be clustered by a clustering algorithm model based on sentence vectors, so that similar expressions with the same function are clustered into the same cluster, and anomaly analysis of the parsing codes is then performed within the different clusters.
- the text is first clustered, and a comparative analysis of the parsing codes is then performed for each cluster, which can quickly locate error logs in different clusters, reveal errors (bugs) in the system, and reduce the manpower and time costs of log analysts.
- density clustering can be performed on multiple target logs based on the sentence vector corresponding to each target log, thereby obtaining multiple candidate clusters. Then every candidate cluster whose target logs call the same function is determined as a target cluster.
- the sentence vectors output based on the BERT model can be input into the DBSCAN density clustering algorithm model to divide the background log text into several clusters.
- the clustering effect is better for some functional types of logs, such as the two functional categories of music resource on-demand and home appliance control.
- multiple target logs can be density clustered based on the sentence vector corresponding to each target log to obtain multiple candidate clusters.
- the clusters corresponding to the preset functions among the multiple candidate clusters can be determined as multiple target clusters. For example, all candidate clusters in the two functional categories of music resource on-demand and home appliance control can be determined as target clusters.
- the function called by the target log in each candidate cluster can be determined, and then all candidate clusters that call the same function are determined as target clusters.
- Each candidate cluster can be taken in turn as the current candidate cluster, and the following steps performed to obtain a set of target skills: determine the set of functions called by the target logs in the current candidate cluster; determine the function in that set with the largest number of corresponding target logs as the candidate skill; when the proportion of target logs corresponding to the candidate skill to the total number of target logs in the current candidate cluster reaches the target proportion threshold, determine the current candidate cluster as a target cluster.
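One reading of the selection rule above is: within each candidate cluster, find the function called by the most logs, and keep the cluster only if that function's share of the cluster reaches a threshold. The sketch below follows that reading, which is an interpretation of the translated text; the function name `select_target_clusters`, the dictionary layout, and the threshold value 0.8 are hypothetical.

```python
from collections import Counter

def select_target_clusters(candidate_clusters, threshold):
    """Map cluster id -> dominant function for clusters that pass the threshold.

    candidate_clusters: dict of cluster id -> list with the function name
    called by each log in that cluster.
    """
    targets = {}
    for cluster_id, functions in candidate_clusters.items():
        if not functions:
            continue  # an empty cluster has no dominant function
        candidate_skill, count = Counter(functions).most_common(1)[0]
        if count / len(functions) >= threshold:
            targets[cluster_id] = candidate_skill
    return targets

# Hypothetical candidate clusters produced by density clustering.
candidates = {
    0: ["music on demand"] * 9 + ["home appliance control"],
    1: ["home appliance control"] * 4 + ["music on demand"] * 3 + ["weather"] * 3,
    2: ["home appliance control"] * 5,
}
target_clusters = select_target_clusters(candidates, threshold=0.8)
```

Cluster 1 is dropped because its most common function covers only 40% of its logs, so it does not cleanly correspond to one function.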
- the density clustering algorithm is applied to the log processing process.
- the density clustering algorithm based on sentence vectors can cluster texts with similar sentence patterns into the same cluster; for example, chats for music resource on-demand, home appliance control, and the like can largely be clustered into their own clusters. Log analysis by parsing code verification is then performed on these two skill-type chat clusters, which can quickly locate the cause of error logs, help log analysts quickly find and locate problems, and greatly reduce labor costs.
- multiple target logs to be processed are obtained, including:
- the call logs in the log data set are generally processed by screening them individually. This often cannot cover all chat interaction logs, so its contribution to chat generalization is very limited. Screening massive interaction data one by one by hand wastes a lot of labor and does little to generalize the question and answer system.
- other methods can be used to perform deduplication operations on the call logs in the log data set, which are not limited here.
- a deduplication operation is performed on the call logs in the log data set to obtain multiple target logs, including:
- each call log in the log data set is taken in turn as the current call log, and the following step is performed:
- S42: Perform a matching operation on the current call log and each target log in the multiple target logs in sequence until the matching stop condition is met, where the matching stop condition includes: all target logs in the multiple target logs have been matched, or it is determined that a target log matching the current call log exists among the multiple target logs.
- a matching operation can be performed between the call log to be processed and each target log that has already been determined, in order to decide whether the call log to be processed is added to the multiple target logs, until a matching stop condition is met.
- the matching stop conditions here may include: all target logs in the multiple target logs have been matched, or it is determined that a target log matching the current call log exists among the multiple target logs.
- a match may require that the interaction result of the interaction statement recorded in the current call log is the same as the interaction result of the interaction statement recorded in the current target log.
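The sequential matching loop with its two stop conditions can be sketched as below. The matching rule is passed in as a function because the text leaves it partly open; the rule used in the example (identical statement text and identical interaction result) is an assumption, as are the names `deduplicate_sequential` and `same_statement_and_result`.

```python
def deduplicate_sequential(call_logs, matches):
    """Keep a call log only if no already-kept target log matches it."""
    target_logs = []
    for current in call_logs:
        found = False
        for target in target_logs:
            if matches(current, target):
                found = True
                break  # stop condition: a matching target log exists
        # stop condition: every target log was compared without a match
        if not found:
            target_logs.append(current)
    return target_logs

def same_statement_and_result(current, target):
    """Hypothetical rule: logs are (statement, interaction_result) tuples."""
    return current[0] == target[0] and current[1] == target[1]

call_logs = [
    ("turn on the air conditioner", "T"),
    ("play song 1", "F"),
    ("turn on the air conditioner", "T"),
    ("play song 1", "T"),
]
deduplicated = deduplicate_sequential(call_logs, same_statement_and_result)
```

The comparison cost grows with the number of retained target logs, so for pure exact-text matching a hash-based lookup is usually cheaper than this pairwise loop.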
- a deduplication operation is performed on the call logs in the log data set to obtain multiple target logs, including:
- the strong text matching deduplication operation is a deduplication operation performed on call logs in the log data set that record the same control statement.
- a strong text matching deduplication operation can be performed on the call logs in the log data set to obtain multiple target logs.
- log analysts can take the call logs for all voice chats and use strong text matching to count the number of interactions for each text. For example, the voice chat "turn on the air conditioner" has 10,000 interactions. The analyst then no longer needs to screen that chat text 10,000 times; it only needs to be screened and verified once to check whether the question and answer system supports parsing it.
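Exact-text deduplication with interaction counts can be sketched with the standard library's `collections.Counter`. The function name and the sample statements are hypothetical; "strong text matching" is read here as exact string equality.

```python
from collections import Counter

def deduplicate_with_counts(statements):
    """Return (statement, interaction_count) pairs in first-seen order."""
    counts = Counter(statements)  # dict subclass, so insertion order is kept
    return list(counts.items())

# Hypothetical control statements extracted from call logs.
statements = (
    ["turn on the air conditioner"] * 3
    + ["play song 1"]
    + ["turn on the air conditioner"]
)
unique_with_counts = deduplicate_with_counts(statements)
```

Each distinct statement then needs to be screened only once, while its count still shows how often users issued it.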
- analysis operations are performed on the target logs in each target cluster to obtain analysis results corresponding to each target cluster, including:
- abnormal logs whose recorded interaction result is an interaction exception, for example target logs whose parsing result is F, can be filtered out from each target cluster, and these abnormal logs are analyzed to obtain the analysis result of the target cluster. For example, the parsing codes of the abnormal logs are analyzed to determine the causes of the exceptions, and statistical analysis of those causes yields the analysis result of the target cluster.
- For example, code 0000 indicates that the parsing is correct, code 0001 indicates that the music resource cannot be called, and code 0002 indicates that the device status is abnormal. If the control statement contained in a log in the music on-demand cluster is "I want to listen to singer B's song 1", the corresponding parsing result is F, and the parsing code is 0001, then the log indicates a parsing exception whose cause is the lack of music resources for singer B.
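The per-cluster analysis (filter logs whose parsing result is F, then tally their parsing codes) could look like the sketch below. The tuple layout, the function name `abnormal_code_counts`, and the sample logs are hypothetical; the codes follow the examples given above.

```python
from collections import Counter

def abnormal_code_counts(logs):
    """Count parsing codes over the logs whose parsing result is "F".

    logs: iterable of (control_statement, parsing_result, parsing_code).
    """
    return Counter(code for _, result, code in logs if result == "F")

# Hypothetical target clusters after clustering.
clusters = {
    "music on demand": [
        ("I want to listen to singer B's song 1", "F", "0001"),
        ("Play song 2", "T", "0000"),
        ("I want to listen to singer B's song 3", "F", "0001"),
    ],
    "home appliance control": [
        ("Turn on the air conditioner", "T", "0000"),
        ("Turn on the water heater", "F", "0002"),
    ],
}
report = {name: abnormal_code_counts(logs) for name, logs in clusters.items()}
```

A tally like this lets an analyst see the dominant failure cause of each cluster at a glance instead of reading every abnormal log.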
- the call log is a voice interaction log, and the processing screens the voice interactions in the voice interaction log for interaction anomalies.
- In addition to using conventional black-box and white-box testing to forward-test the system's degree of chat generalization, testers usually designate personnel to observe high-frequency interactive statements in the background in order to discover erroneous interactions in reverse, troubleshoot system bugs, and screen for errors caused by insufficient chat generalization. However, manually screening tens of thousands of interaction logs is unrealistic. Even if the interaction logs are first deduplicated through strong text matching and then screened manually, the volume of voice chat data remaining after strong matching is still large, so this approach still consumes too much labor and has no significant effect.
- This optional example provides a solution for generalizing Q&A system chat through log screening based on sentence vector clustering.
- the sentence vector clustering algorithm is applied to log analysis: first, the massive amount of actual interaction data in the background is clustered and divided into different clusters; then, parsing code screening is performed on the music on-demand cluster and the home appliance control cluster, which can, to a certain extent, quickly locate the cause of parsing errors and support the chat generalization work of the Q&A system.
- the flow of the log processing method in this optional example may include the following steps:
- Step S302 Perform strong text matching and deduplication on the background log data set.
- the background log text chat data set is deduplicated through strong text matching to obtain a set of target logs to be processed.
- Step S304 the BERT model outputs sentence vectors.
- Input a set of target logs after deduplication into the BERT model to obtain the sentence vector corresponding to the interactive sentence text in each target log.
- Step S306 DBSCAN text clustering.
- the clustering algorithm based on sentence vectors can group texts with similar sentence patterns into the same cluster. For example, chat utterances for music resource on-demand or for home appliance control can each, to a large extent, be grouped into one cluster.
- the obtained sentence vectors are input into the DBSCAN density clustering algorithm to perform text clustering and distinguish the music on-demand language chat and the home appliance control language chat to form two different clusters.
- Step S308, perform analysis operation.
- for the chat clusters among the different clusters obtained, for example the two skill clusters of music resource on-demand and home appliance control, log analysis is performed by checking the parsing codes: observe whether the corresponding parse is flagged as correct (T or F) and what the corresponding parsing code is.
- code 0000 means that the parsing is correct
- code 0001 means that the music resources cannot be called
- code 0002 means that the device status is abnormal
- the clustering algorithm based on sentence vectors is applied to text log analysis to assist log analysts in backend log screening, so that they can quickly locate error logs, discover system interaction errors, and quickly locate the cause of the error logs.
- this helps log analysts quickly discover and locate problems, greatly reduces labor costs, and solves the problem that traditional methods consume too much labor with insignificant results.
- FIG. 4 is a structural block diagram of an optional log processing device according to an embodiment of the present disclosure. As shown in Figure 4, the device may include:
- the first acquisition unit 402 is configured to acquire multiple target logs to be processed, where each target log is used to record a call to a function through a control statement;
- the second acquisition unit 404 is connected to the first acquisition unit 402 and is configured to acquire the sentence vector of the control statement recorded in each target log, and obtain the sentence vector corresponding to each target log;
- the clustering unit 406 is connected to the second acquisition unit 404 and is configured to cluster multiple target logs based on the sentence vector corresponding to each target log to obtain multiple target clusters, wherein each target cluster corresponds to one function;
- the execution unit 408 is connected to the clustering unit 406 and is configured to perform analysis operations on the target logs in each target cluster to obtain analysis results corresponding to each target cluster.
- the first acquisition unit 402 in this embodiment can be configured to perform the above step S202
- the second acquisition unit 404 in this embodiment can be configured to perform the above step S204
- the clustering unit 406 in this embodiment can be configured to perform the above step S206, and the execution unit 408 in this embodiment can be configured to perform the above step S208.
- through the above modules, multiple target logs to be processed are acquired, where each target log records a call to one function through one control statement; the sentence vector of the control statement recorded in each target log is obtained, giving the sentence vector corresponding to each target log; the multiple target logs are clustered based on the sentence vector corresponding to each target log to obtain multiple target clusters, where each target cluster corresponds to one function; and analysis operations are performed on the target logs in each target cluster to obtain the analysis result corresponding to each target cluster. This solves the problem that log processing methods in the related art are inefficient because logs must be analyzed one by one, and improves the efficiency of log processing.
- the second acquisition unit includes:
- the input module is configured to input the control statement recorded in each target log into the pre-trained model to obtain the sentence vector corresponding to each target log.
- the pre-trained model determines the sentence vector corresponding to each target log from the average of the vectors of the words contained in the input control statement.
- the clustering unit includes:
- the clustering module is configured to perform density clustering on multiple target logs based on the sentence vector corresponding to each target log, and obtain multiple candidate clusters;
- the determination module is configured to determine all candidate clusters whose target logs call the same function among multiple candidate clusters as target clusters.
- the first acquisition unit includes:
- the execution module is configured to perform a deduplication operation on the call logs in the log data set to obtain multiple target logs, where each call log is used to record a call to a function through a control statement.
- the execution module includes:
- the first execution sub-module is configured to perform the matching steps of the method embodiment on each call log separately to obtain multiple target logs, where each call log in turn serves as the current call log.
- in another exemplary embodiment, the execution module includes:
- the second execution sub-module is configured to perform a strong text matching deduplication operation on the call logs in the log data set to obtain multiple target logs, where the strong text matching deduplication operation is a deduplication operation performed on call logs in the log data set whose recorded control statements are identical.
- the execution unit includes:
- the filtering module is configured to filter out of each target cluster, based on the interaction results recorded in the target logs of that cluster, the abnormal logs whose recorded interaction result is an abnormal interaction, and thereby obtain the analysis result of each target cluster, where the analysis result of each target cluster includes the abnormal logs in that cluster.
- the above module as part of the device, can run in the hardware environment as shown in Figure 1, and can be implemented by software or hardware, where the hardware environment includes a network environment.
- a storage medium is also provided.
- the above-mentioned storage medium can hold the program code for executing any of the above-mentioned log processing methods in the embodiments of the present disclosure.
- the above storage medium may be located on at least one network device among multiple network devices in the network shown in the above embodiment.
- the storage medium is configured to store program codes for performing the following steps:
- the above-mentioned storage medium may include but is not limited to: U disk, ROM, RAM, mobile hard disk, magnetic disk or optical disk and other various media that can store program codes.
- an electronic device for implementing the above log processing method is also provided.
- the electronic device may be a server, a terminal, or a combination thereof.
- Figure 5 is a structural block diagram of an optional electronic device according to an embodiment of the present disclosure. As shown in Figure 5, it includes a processor 502, a communication interface 504, a memory 506 and a communication bus 508. The processor 502, the communication interface 504 and the memory 506 communicate with each other through the communication bus 508, where:
- the memory 506 is configured to store computer programs
- the processor 502 is configured to implement the following steps when executing the computer program stored in the memory 506:
- the communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, etc.
- the communication bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 5, but it does not mean that there is only one bus or one type of bus.
- the communication interface is used for communication between the above-mentioned electronic device and other equipment.
- the memory may include RAM or non-volatile memory, such as at least one disk memory.
- the memory may also be at least one storage device located remotely from the aforementioned processor.
- the memory 506 may include, but is not limited to, the first acquisition unit 402, the second acquisition unit 404, the clustering unit 406, and the execution unit 408 in the log processing device. In addition, it may also include but is not limited to other module units in the above log processing device, which will not be described again in this example.
- the above-mentioned processor can be a general-purpose processor, which can include but is not limited to a CPU (Central Processing Unit) or an NP (Network Processor); it can also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the device that implements the above log processing method can be a terminal device, such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), or a PAD.
- FIG. 5 does not limit the structure of the above-mentioned electronic device.
- the electronic device may also include more or fewer components (such as network interfaces, display devices, etc.) than shown in FIG. 5 , or have a different configuration than that shown in FIG. 5 .
- the program can be stored in a computer-readable storage medium, and the storage medium can include a flash disk, ROM, RAM, a magnetic disk, an optical disk, etc.
- if the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they can be stored in the above computer-readable storage medium.
- the technical solution of the present disclosure in essence, or the part of it that contributes over the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause one or more computer devices (which can be personal computers, servers, network devices, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution provided in this embodiment.
- each functional unit in various embodiments of the present disclosure may be integrated into one processing unit, each unit may physically exist independently, or at least two units may be integrated into one unit.
- the above integrated units can be implemented in the form of hardware or software functional units.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Computer Hardware Design (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Debugging And Monitoring (AREA)
Abstract
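The present disclosure discloses a log processing method and device, a storage medium, and an electronic device, and relates to the field of smart homes. The log processing method includes: acquiring multiple target logs to be processed, where each target log records a call to one function through one control statement; obtaining the sentence vector of the control statement recorded in each target log, to obtain the sentence vector corresponding to each target log; clustering the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple target clusters, where each target cluster corresponds to one function; and performing an analysis operation on the target logs in each target cluster, to obtain the analysis result corresponding to each target cluster.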
Description
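The present disclosure claims priority to Chinese patent application No. 202210467513.8, filed with the China Patent Office on April 29, 2022 and entitled "Log processing method and device, storage medium, and electronic device", the entire contents of which are incorporated into the present disclosure by reference.
The present disclosure relates to the field of smart homes, and in particular to a log processing method and device, a storage medium, and an electronic device.
At present, a user can call a function provided by a system through a control statement in order to perform the corresponding operation. Such a system may be an interactive system equipped with intelligent question answering (for example, an in-vehicle question-answering assistant, a smart speaker, or a mobile phone voice assistant), that is, an intelligent question-answering system.
To improve the accuracy with which the system calls functions, taking an intelligent question-answering system as an example, the chat data in the system (that is, the logs) can be screened. When screening chat data, in addition to using conventional black-box and white-box testing to forward-test how well the system generalizes over chat utterances, analysts can also discover system errors in reverse by observing the control statements in the backend chat data.
However, this way of processing logs requires analysts to inspect the logs one by one to determine whether an error exists and to analyze its cause. Because the number of logs is huge, this takes a great deal of labor and time. Log processing methods in the related art therefore suffer from low processing efficiency because logs must be analyzed one by one.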
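Summary of the Invention
Embodiments of the present disclosure provide a log processing method and device, a storage medium, and an electronic device, to at least solve the problem that log processing methods in the related art have low processing efficiency because logs must be analyzed one by one.
According to one aspect of the embodiments of the present disclosure, a log processing method is provided, including: acquiring multiple target logs to be processed, where each target log records a call to one function through one control statement; obtaining the sentence vector of the control statement recorded in each target log, to obtain the sentence vector corresponding to each target log; clustering the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple target clusters, where each target cluster corresponds to one function; and performing an analysis operation on the target logs in each target cluster, to obtain the analysis result corresponding to each target cluster.
According to another aspect of the embodiments of the present disclosure, a log processing device is also provided, including: a first acquisition unit, configured to acquire multiple target logs to be processed, where each target log records a call to one function through one control statement; a second acquisition unit, configured to obtain the sentence vector of the control statement recorded in each target log, to obtain the sentence vector corresponding to each target log; a clustering unit, configured to cluster the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple target clusters, where each target cluster corresponds to one function; and an execution unit, configured to perform an analysis operation on the target logs in each target cluster, to obtain the analysis result corresponding to each target cluster.
According to yet another aspect of the embodiments of the present disclosure, a computer-readable storage medium is also provided. The computer-readable storage medium stores a computer program, where the computer program is configured to execute the above log processing method when run.
According to yet another aspect of the embodiments of the present disclosure, an electronic device is also provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor executes the above log processing method through the computer program.
In the embodiments of the present disclosure, multiple target logs to be processed are acquired, where each target log records a call to one function through one control statement; the sentence vector of the control statement recorded in each target log is obtained, giving the sentence vector corresponding to each target log; the multiple target logs are clustered based on the sentence vector corresponding to each target log to obtain multiple target clusters, where each target cluster corresponds to one function; and an analysis operation is performed on the target logs in each target cluster to obtain the analysis result corresponding to each target cluster. Because the target logs are first clustered by sentence vector into multiple target clusters, and the analysis operation is then performed per cluster to obtain the analysis result for each cluster, the amount of log processing can be reduced and the technical effect of improving log processing efficiency is achieved. This solves the problem that log processing methods in the related art have low processing efficiency because logs must be analyzed one by one.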
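The accompanying drawings here are incorporated into and form part of this specification, illustrate embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the present disclosure.
To explain the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of the hardware environment of an optional log processing method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of an optional log processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of another optional log processing method according to an embodiment of the present disclosure;
FIG. 4 is a structural block diagram of an optional log processing device according to an embodiment of the present disclosure;
FIG. 5 is a structural block diagram of an optional electronic device according to an embodiment of the present disclosure.
To help those skilled in the art better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort shall fall within the scope of protection of the present disclosure.
It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of the present disclosure are used to distinguish similar objects and not necessarily to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.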
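According to one aspect of the embodiments of the present disclosure, a log processing method is provided. The method is widely applicable to whole-house intelligent digital control scenarios such as the smart family (Smart Home), smart homes, smart household device ecosystems, and smart residence (Intelligence House) ecosystems. Optionally, in this embodiment, the log processing method can be applied in a hardware environment formed by a terminal device 102 and a server 104, as shown in FIG. 1. As shown in FIG. 1, the server 104 is connected to the terminal device 102 through a network and can provide services (such as application services) for the terminal or for a client installed on the terminal. A database may be set up on the server or independently of it to provide data storage services for the server 104, and cloud computing and/or edge computing services may be configured on the server or independently of it to provide data computing services for the server 104.
The network may include, but is not limited to, at least one of a wired network and a wireless network. The wired network may include, but is not limited to, at least one of a wide area network, a metropolitan area network, and a local area network; the wireless network may include, but is not limited to, at least one of WIFI (Wireless Fidelity) and Bluetooth. The terminal device 102 may be, but is not limited to, a PC, mobile phone, tablet computer, smart air conditioner, smart range hood, smart refrigerator, smart oven, smart stove, smart washing machine, smart water heater, smart washing equipment, smart dishwasher, smart projection device, smart TV, smart clothes drying rack, smart curtain, smart audio-visual equipment, smart socket, smart audio system, smart speaker, smart fresh-air equipment, smart kitchen and bathroom equipment, smart bathroom equipment, smart sweeping robot, smart window-cleaning robot, smart mopping robot, smart air purification equipment, smart steam oven, smart microwave oven, smart kitchen water heater, smart purifier, smart water dispenser, smart door lock, and so on.
The log processing method of the embodiments of the present disclosure may be executed by the server 104, by the terminal 102, or jointly by the server 104 and the terminal 102. When the terminal 102 executes the method, it may do so through a client installed on it.
Taking execution by the server 104 as an example, FIG. 2 is a schematic flowchart of an optional log processing method according to an embodiment of the present disclosure. As shown in FIG. 2, the flow of the method may include the following steps: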
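Step S202: acquire multiple target logs to be processed, where each target log records a call to one function through one control statement.
The log processing method in this embodiment can be applied to scenarios in which logs on a terminal device are processed. The terminal device may be a device carrying an intelligent question-answering system, or an intelligent hardware product containing other functional modules. The intelligent question-answering system may be a question-answering assistant, a smart speaker, a mobile phone voice assistant, and so on, which is not limited here. The logs may be call logs related to voice interaction or call logs related to other function calls, which is likewise not limited here. Some examples in this embodiment take the terminal device to be a device carrying an intelligent question-answering system and the call logs to be interaction logs related to voice interaction.
An intelligent question-answering system generates tens of thousands of interaction logs every day. To improve the stability and interaction accuracy of the system in reverse, these backend logs can be observed and analyzed, and the degree of generalization of the system can be improved step by step based on user interaction data. In the related art, log analysts generally inspect logs one by one, separately for each intent domain, to determine whether an interaction error occurred and to analyze its cause; because the interaction volume is huge, this approach is labor-intensive.
In this embodiment, before the call logs are processed, multiple target logs to be processed can first be acquired from a target skill system configured with one or more skills (which may be a target interaction system, for example a voice interaction system). The target skill system may contain call logs corresponding to many kinds of control statements. Each target log may record a call to one function through one control statement. The called function may be a music on-demand skill, a home appliance control skill, or a skill implementing some other function. For a music on-demand skill, the corresponding control statement may be "I want to listen to xx" or "play xx"; for a home appliance control skill, the corresponding control statement may be "turn on the air conditioner" or "switch on the air conditioner".
The multiple target logs to be processed may be acquired by reading the call logs within a period of time from the backend log data set to obtain a group of target logs. The group may contain call logs that call the same function as well as call logs that call different functions, and the control statements used to call the same function may be the same or different.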
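Step S204: obtain the sentence vector of the control statement recorded in each target log, to obtain the sentence vector corresponding to each target log.
In this embodiment, after the multiple target logs to be processed have been acquired, the sentence vector of the control statement recorded in each target log can be extracted, giving the sentence vector corresponding to each target log. For example, the sentence vector corresponding to each target log can be generated from the words contained in the control statement, or a BERT model (Bidirectional Encoder Representation from Transformers, a pre-trained language representation model) can generate the sentence vector of the text of the control statement recorded in the target log. The generated sentence vector accurately represents both the semantics and the information of the control statement, and the similarity between different control statements can be expressed as the distance between their sentence vectors.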
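The statement that similarity can be expressed as a distance between sentence vectors can be illustrated with a small sketch. The disclosure does not say which distance is used; cosine distance is assumed here purely for illustration, and the three-dimensional vectors are made-up stand-ins for real sentence vectors.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy vectors; real BERT sentence vectors have hundreds of dimensions.
turn_on_ac = [0.9, 0.1, 0.0]     # "turn on the air conditioner"
switch_on_ac = [0.8, 0.2, 0.0]   # "switch on the air conditioner"
play_song = [0.0, 0.1, 0.9]      # "play xx"

# Two phrasings of the same function should be closer than unrelated statements.
print(cosine_distance(turn_on_ac, switch_on_ac))
print(cosine_distance(turn_on_ac, play_song))
```

With these toy values the first distance is much smaller than the second, which is the property the clustering step relies on.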
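Step S206: cluster the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple target clusters, where each target cluster corresponds to one function.
In this embodiment, to reduce the complexity of the subsequent processing of the call logs, the multiple target logs can be clustered based on the sentence vector corresponding to each target log to obtain multiple target clusters, each corresponding to one function. That is, the target logs in each target cluster are call logs that call the same function, or mostly are.
Optionally, one or more clustering algorithms may be used to cluster the multiple target logs; the algorithm may be one that requires the number of clusters to be specified or one that does not. For example, the target logs can be clustered by DBSCAN (Density-Based Spatial Clustering of Applications with Noise) based on the sentence vector corresponding to each target log. DBSCAN is a typical density clustering algorithm that works well for text clustering and can group similar texts into the same cluster. The algorithm divides regions of sufficient density into clusters, finds clusters of arbitrary shape in a spatial database with noise, and defines a cluster as the largest set of density-connected points.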
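To make the density-clustering idea concrete, the following is a minimal, unoptimized DBSCAN sketch over Euclidean distance. It is not the implementation used in the disclosure; the parameter names `eps` and `min_samples`, the two-dimensional points, and the quadratic neighbour search are all illustrative assumptions, and a production system would more likely use a library implementation.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is also a core point
        cluster_id += 1
    return labels


# Hypothetical 2-D stand-ins for sentence vectors: two dense groups and one outlier.
points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
print(dbscan(points, eps=0.3, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The number of clusters is not given in advance: it falls out of the density parameters, which matches the remark above that the algorithm need not specify the number of clusters.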
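For example, for music on-demand functions, high-frequency trigger utterances such as "I want to listen to xx" and "play xx" are all clustered into one cluster; for home appliance control functions, high-frequency utterances such as "turn on the air conditioner" and "switch on the air conditioner", which all belong to the control category, are likewise clustered into one cluster.
Here, depending on the clustering algorithm used, the clusters obtained by clustering the multiple target logs may all be target clusters, or only some of them may be. Target clusters may be determined from the clusters according to a received selection instruction or according to a preset cluster screening condition, which is not limited in this embodiment.
Step S208: perform an analysis operation on the target logs in each target cluster, to obtain the analysis result corresponding to each target cluster.
After the multiple target clusters have been determined from the multiple target logs, the target logs in each target cluster can be processed and analyzed. In this way, log analysts can compare the error logs within each cluster, screen for the causes of errors, and support the utterance-generalization work of the question-answering system in reverse, from the perspective of actual log data.
In this embodiment, the analysis operation is performed on the target logs in each target cluster obtained by clustering. A target log may contain the parsing result and the parsing code of its control statement. The parsing result may be a flag, T or F, where T means the statement was parsed correctly and F means it was parsed incorrectly; the parsing code indicates why the control statement was parsed incorrectly. By analyzing the parsing result and parsing code of each control statement, the analysis result corresponding to each target cluster can be obtained.
For example, suppose the target clusters are a music on-demand cluster and a home appliance control cluster. When the analysis operation is performed on the target logs of these two clusters, for each utterance in a cluster one can check whether its parse is flagged as correct and what its parsing code is: code 0000 means the parse is correct, code 0001 means the music resource cannot be called, and code 0002 means the device state is abnormal. If the control statement in a log in the music on-demand cluster is "I want to listen to song 1 by singer B", the parsing result is F, and the parsing code is 0001, then the analysis result for that log is a parsing exception caused by the missing music resources of singer B. Comparing the similar utterances within each cluster makes it possible to quickly locate the cause of the error logs in that cluster.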
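The reading of a single log described above can be sketched as a lookup on the parsing result and parsing code. The three codes and their meanings come from the text; the dictionary field names (`statement`, `parse_result`, `parse_code`) and the function name are hypothetical, since the disclosure does not define a log schema.

```python
# Code meanings as given in the description.
PARSE_CODES = {
    "0000": "parsed correctly",
    "0001": "music resource cannot be called",
    "0002": "device state is abnormal",
}


def analyse_log(log):
    """Turn one target log into a small analysis record (hypothetical schema)."""
    ok = log["parse_result"] == "T"
    return {
        "statement": log["statement"],
        "status": "ok" if ok else "parsing exception",
        # The cause is only meaningful when the parse failed.
        "cause": None if ok else PARSE_CODES.get(log["parse_code"], "unknown code"),
    }


example = {
    "statement": "I want to listen to song 1 by singer B",
    "parse_result": "F",
    "parse_code": "0001",
}
print(analyse_log(example))
```

For the example log this yields a parsing exception whose cause is that the music resource cannot be called, matching the worked case in the text.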
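Through steps S202 to S208 above, multiple target logs to be processed are acquired, where each target log records a call to one function through one control statement; the sentence vector of the control statement recorded in each target log is obtained, giving the sentence vector corresponding to each target log; the multiple target logs are clustered based on the sentence vector corresponding to each target log to obtain multiple target clusters, where each target cluster corresponds to one function; and an analysis operation is performed on the target logs in each target cluster to obtain the analysis result corresponding to each target cluster. This solves the problem that log processing methods in the related art have low processing efficiency because logs must be analyzed one by one, and improves the efficiency of log processing.
In an exemplary embodiment, obtaining the sentence vector of the control statement recorded in each target log, to obtain the sentence vector corresponding to each target log, includes:
S11, inputting the control statement recorded in each target log into a pre-trained model to obtain the sentence vector corresponding to each target log, where the pre-trained model is used to determine the sentence vector corresponding to each target log from the average of the vectors of the words contained in the input control statement.
Because the volume of call logs is huge, applying algorithmic models to log analysis to help testers screen for error logs becomes a feasible approach. In this embodiment, to obtain the sentence vector of the control statement recorded in each target log, the control statement recorded in each target log can be input into a pre-trained model, giving the sentence vector corresponding to each target log. The pre-trained model here may be a BERT model or another pre-trained model.
Taking a BERT model as the pre-trained model, to obtain the sentence vector corresponding to each target log, the control statement recorded in each target log can be input into the BERT model, which determines the word vectors corresponding to the input control statement. Averaging all the word vectors of a control statement gives the sentence vector of that control statement, and repeating the operation gives the sentence vector corresponding to each target log. Here, the BERT model can extract sentence vectors accurately, representing at the semantic level the meaning expressed by the text.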
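The averaging step can be shown in isolation. In the sketch below the word vectors are hard-coded two-dimensional stand-ins for the per-word outputs of a pre-trained model; obtaining real word vectors from BERT is outside the scope of this example, and the function name `mean_pool` is an assumption.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length word vectors into one sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one word vector")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all word vectors must have the same dimension")
    count = len(token_vectors)
    # Component-wise mean over all words of the control statement.
    return [sum(v[k] for v in token_vectors) / count for k in range(dim)]


# Hypothetical word vectors for a three-word control statement.
token_vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
sentence_vector = mean_pool(token_vectors)
print(sentence_vector)  # [1.0, 1.0]
```

One design consequence of plain averaging is that every word contributes equally, so padding or special tokens, if present in a real model's output, would normally be excluded before pooling.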
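With this embodiment, using a pre-trained model to extract the sentence vector of a control statement can improve the accuracy of sentence vector generation.
In an exemplary embodiment, clustering the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple target clusters, includes:
S21, performing density clustering on the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple candidate clusters;
S22, determining as target clusters all candidate clusters, among the multiple candidate clusters, whose target logs call the same function.
In the related art, faced with massive backend logs, log analysts usually filter them manually one by one, or screen only specific high-frequency sentence patterns. Both approaches tend to give incomplete coverage and are likely to miss some error logs.
To reduce the workload of log analysis, a clustering algorithm model based on sentence vectors can be used during log analysis to cluster the deduplicated log texts, so that similar phrasings for the same function fall into the same cluster, and anomaly analysis on the parsing codes is then carried out within each cluster. Compared with manually filtering the backend logs one by one, or filtering only high-frequency utterances, clustering the texts first and then comparing parsing codes within each cluster makes it possible to locate error logs quickly in the different clusters and to identify errors (bugs) in the system, reducing the labor and time cost of log analysts.
Optionally, density clustering can be performed on the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple candidate clusters. All candidate clusters whose target logs call the same function are then determined as target clusters.
For example, the sentence vectors output by the BERT model (that is, the sentence vector corresponding to each target log) can be input into the DBSCAN density clustering algorithm model to divide the backend log texts into several clusters.
Experimental results show that clustering works well for logs of some function types, such as the music resource on-demand and home appliance control categories. On this basis, density clustering can be performed on the multiple target logs based on the sentence vector corresponding to each target log to obtain multiple candidate clusters. After the candidate clusters are obtained, those corresponding to preset functions can be determined as the target clusters. For example, all candidate clusters of the music resource on-demand and home appliance control categories can be determined as target clusters.
Optionally, after the multiple candidate clusters are obtained, the function called by the target logs in each candidate cluster can be determined, and all candidate clusters that call the same function can then be determined as target clusters. Here, each candidate cluster can in turn be taken as the current candidate cluster and the following steps performed, to obtain a set of target skills: determine the set of functions called by the target logs in the current candidate cluster; determine as the candidate skill the skill in that set with the largest number of corresponding target logs; and, when the number of target logs corresponding to the candidate skill reaches a target ratio threshold as a proportion of the total number of target logs in the current candidate cluster, determine the current candidate cluster as a target cluster.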
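The ratio test for promoting a candidate cluster to a target cluster can be sketched as follows. The `function` field on each log, the helper name, and the threshold value of 0.8 are hypothetical; the disclosure only states that a target ratio threshold exists.

```python
from collections import Counter


def dominant_function(cluster_logs, ratio_threshold):
    """Return the dominant function of a candidate cluster, or None.

    A cluster qualifies as a target cluster when its most frequently called
    function accounts for at least ratio_threshold of its logs.
    """
    if not cluster_logs:
        return None
    counts = Counter(log["function"] for log in cluster_logs)
    function, count = counts.most_common(1)[0]  # the candidate skill
    if count / len(cluster_logs) >= ratio_threshold:
        return function
    return None


# Hypothetical candidate cluster: 4 of 5 logs call the music on-demand function.
candidate = [
    {"function": "music_on_demand"},
    {"function": "music_on_demand"},
    {"function": "music_on_demand"},
    {"function": "music_on_demand"},
    {"function": "appliance_control"},
]
print(dominant_function(candidate, ratio_threshold=0.8))  # music_on_demand
```

Candidate clusters for which the function returns `None` are mixed clusters and would not be treated as target clusters under this reading.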
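With this embodiment, the density clustering algorithm is applied to log processing. A density clustering algorithm based on sentence vectors can group texts with similar sentence patterns into the same cluster; for example, chat utterances for music resource on-demand or for home appliance control can each, to a large extent, be grouped into one cluster. For these two skill clusters, log analysis is then performed by checking the parsing codes, which quickly locates the cause of the error logs, helps log analysts quickly discover and locate problems, and greatly reduces labor costs.
In an exemplary embodiment, acquiring multiple target logs to be processed includes:
S31, performing a deduplication operation on the call logs in a log data set to obtain the multiple target logs, where each call log records a call to one function through one control statement.
In the related art, the call logs in the log data set are generally screened by filtering them individually one by one. This approach often cannot cover all chat interaction logs and has very limited ability to find utterances that are insufficiently generalized. Faced with massive interaction data, screening manually one by one wastes a great deal of labor while contributing very little to the utterance generalization of the question-answering system.
In this embodiment, a deduplication operation can first be performed on the call logs in the log data set, removing duplicate call logs that contain the same control statement and the same parsing result, to obtain the set of deduplicated target logs, that is, the multiple target logs. Optionally, the deduplication operation may be performed in other ways, which is not limited here.
With this embodiment, acquiring the call logs to be processed after deduplicating the call logs in the log data set can effectively reduce the amount of log processing and improve the efficiency of log processing.
In an exemplary embodiment, performing a deduplication operation on the call logs in the log data set to obtain the multiple target logs includes:
S41, performing the following steps on each call log separately to obtain the multiple target logs, where each call log is the current call log while the steps are performed:
S42, performing a matching operation between the current call log and each of the multiple target logs in turn until a matching stop condition is met, where the matching stop condition includes either of the following: all of the multiple target logs have been matched, or it is determined that a target log matching the current call log exists among the multiple target logs:
determining the statement similarity between the control statement recorded in the current call log and the control statement recorded in the current target log, where the current target log is the target log, among the multiple target logs, on which the matching operation is being performed;
when the statement similarity is greater than or equal to a target similarity threshold, and the interaction result of the control statement recorded in the current call log is the same as the interaction result of the control statement recorded in the current target log, determining that the current target log is a target log matching the current call log;
when no target log matching the current call log exists among the multiple target logs, adding the current call log to the multiple target logs.
Filtering by strong text matching can reduce the size of the data set to some extent, and also avoids incomplete analysis caused by mistakenly removing call logs. However, real online user utterances vary enormously, so this approach usually covers only some high-frequency utterances, handles the massive variety of music on-demand utterances poorly, and still requires a great deal of manual effort to check and filter logs one by one. In this embodiment, when deduplicating the call logs in the log data set, call logs whose statement similarity reaches a certain threshold and whose interaction results are the same can be treated as duplicates and removed, keeping only one of them for subsequent analysis.
Optionally, when deduplicating the call logs in the log data set, the call log to be processed can be matched against each of the target logs already determined, to decide whether to add it to the multiple target logs, until the matching stop condition is met. The matching stop condition here may include: all of the multiple target logs have been matched, or it is determined that a target log matching the current call log exists among the multiple target logs.
When matching a call log to be processed against the target logs, the call log currently being processed can be taken as the current call log and the target log currently being matched as the current target log, and the following operations performed: determine the statement similarity between the control statement recorded in the current call log and the control statement recorded in the current target log; when the statement similarity is greater than or equal to the target similarity threshold, and the interaction result of the interaction statement recorded in the current call log is the same as that of the interaction statement recorded in the current target log, determine that the current target log is a target log matching the current call log. At this point, matching for the current call log ends.
Optionally, when the statement similarity is less than the target similarity threshold, or when it is greater than or equal to the threshold but the interaction result of the interaction statement recorded in the current call log differs from that recorded in the current target log, the next target log can be taken as the current target log and matched against the current call log, until the matching stop condition is met. When matching stops, if no target log matching the current call log exists among the multiple target logs, the current call log is added to the multiple target logs.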
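The matching loop described in S41 and S42 can be sketched as below. The disclosure does not specify how statement similarity is computed; the character-level ratio from Python's standard `difflib.SequenceMatcher` is used here only as a stand-in, and the field names `statement` and `result` as well as the threshold 0.9 are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in statement similarity in [0, 1] (assumption, not the patent's metric)."""
    return SequenceMatcher(None, a, b).ratio()


def deduplicate(call_logs, threshold):
    """Keep one representative per group of similar logs with the same result."""
    targets = []
    for current in call_logs:
        matched = False
        for target in targets:
            same_result = current["result"] == target["result"]
            if same_result and similarity(current["statement"], target["statement"]) >= threshold:
                matched = True  # stop condition: a matching target log exists
                break
        if not matched:
            # Stop condition: all target logs were tried without a match.
            targets.append(current)
    return targets


call_logs = [
    {"statement": "turn on the air conditioner", "result": "T"},
    {"statement": "turn on the air conditioners", "result": "T"},  # near-duplicate
    {"statement": "turn on the air conditioner", "result": "F"},   # same text, other result
    {"statement": "play song 1", "result": "T"},
]
for log in deduplicate(call_logs, threshold=0.9):
    print(log["statement"], log["result"])
```

In this run the near-duplicate second log is dropped, while the third log is kept because its interaction result differs from the first, which is the behaviour the two-part matching condition is meant to guarantee.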
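With this embodiment, treating call logs whose control statements reach a certain similarity threshold and whose interaction results are the same as duplicates, and deduplicating them, can effectively reduce the amount of log processing and improve the efficiency of log processing.
In an exemplary embodiment, performing a deduplication operation on the call logs in the log data set to obtain the multiple target logs includes:
S51, performing a strong text matching deduplication operation on the call logs in the log data set to obtain the multiple target logs, where the strong text matching deduplication operation is a deduplication operation performed on call logs in the log data set whose recorded control statements are identical.
In this embodiment, a strong text matching deduplication operation can be performed on the call logs in the log data set to obtain the multiple target logs; the strong text matching deduplication operation is a deduplication operation performed on call logs in the log data set whose recorded control statements are identical.
Optionally, log analysts can use strong text matching over all chat call logs to count how many times each text was used in an interaction. For example, if the utterance "turn on the air conditioner" occurs in 10,000 interactions, the analyst no longer needs to screen that utterance 10,000 times; checking it once is enough to verify whether the question-answering system supports parsing it.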
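Counting interactions per exact text, as described above, is a one-pass tally. The sketch below uses the standard-library `Counter`; the sample statements are made up, and a small repeat count stands in for the 10,000 interactions mentioned in the text.

```python
from collections import Counter

# Hypothetical control statements taken from call logs, in arrival order.
statements = (
    ["turn on the air conditioner"] * 3
    + ["play song 1"]
    + ["turn on the air conditioner"]
)

# Strong (exact) text matching: identical strings share one counter entry.
interaction_counts = Counter(statements)

# Each distinct statement only needs to be checked once by the analyst.
unique_statements = list(interaction_counts)

print(interaction_counts["turn on the air conditioner"])  # 4
print(unique_statements)  # ['turn on the air conditioner', 'play song 1']
```

Keeping the counts alongside the deduplicated statements preserves the frequency information, so high-volume utterances can still be prioritized during review.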
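With this embodiment, deduplicating the backend logs by strong text matching first, and then screening and analyzing the logs one by one, can reduce the size of the data set to some extent and improve the efficiency of log processing.
In an exemplary embodiment, performing an analysis operation on the target logs in each target cluster, to obtain the analysis result corresponding to each target cluster, includes:
S61, based on the interaction results recorded in the target logs of each target cluster, filtering out of each target cluster the abnormal logs whose recorded interaction result is an abnormal interaction, to obtain the analysis result of each target cluster, where the analysis result of each target cluster includes the abnormal logs in that cluster.
In this embodiment, based on the interaction results recorded in the target logs of each target cluster, the abnormal logs whose recorded interaction result is an abnormal interaction (for example, target logs whose parsing result is F) can be filtered out of each target cluster and analyzed to obtain the analysis result of the target cluster. For example, the parsing codes of the abnormal logs can be analyzed to determine the causes of the anomalies, and the causes can be analyzed statistically to obtain the analysis result of the target cluster.
For example, for the clustered texts of the music on-demand and home appliance control categories, error logs can be located quickly by looking at the status flag that indicates whether the parse was correct. With this log-assisted analysis scheme, for the music resource on-demand function, singers and songs not supported by the resource library can be screened out so that they can be added in later generalization work; for the home appliance control function, the abnormal state of each device can easily be located for later troubleshooting.
Specifically, when the analysis operation is performed on the target logs of these two clusters, one can check whether the corresponding parse is flagged as correct and what the corresponding parsing code is: code 0000 means the parse is correct, code 0001 means the music resource cannot be called, and code 0002 means the device state is abnormal. If the control statement in a log in the music on-demand cluster is "I want to listen to song 1 by singer B", the parsing result is F, and the parsing code is 0001, then the analysis result for that log is a parsing exception caused by the missing music resources of singer B.
With this embodiment, comparing the similar utterances within each cluster makes it possible to quickly locate the cause of the error logs in each cluster, which improves the efficiency of the analysis operation.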
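Step S61 and the statistical analysis of causes can be sketched per cluster as follows. The cluster names, log fields (`statement`, `parse_result`, `parse_code`), and function name are hypothetical; only the T/F flag and the codes 0001 and 0002 come from the text.

```python
from collections import Counter


def analyse_clusters(clusters):
    """For each target cluster, collect its abnormal logs and tally their parsing codes."""
    results = {}
    for name, logs in clusters.items():
        # An abnormal log is one whose parsing result is flagged F.
        abnormal = [log for log in logs if log["parse_result"] == "F"]
        results[name] = {
            "abnormal_logs": abnormal,
            "cause_counts": Counter(log["parse_code"] for log in abnormal),
        }
    return results


clusters = {
    "music_on_demand": [
        {"statement": "play song 1", "parse_result": "T", "parse_code": "0000"},
        {"statement": "I want to listen to song 1 by singer B", "parse_result": "F", "parse_code": "0001"},
        {"statement": "play a song by singer B", "parse_result": "F", "parse_code": "0001"},
    ],
    "appliance_control": [
        {"statement": "turn on the air conditioner", "parse_result": "F", "parse_code": "0002"},
    ],
}

for name, result in analyse_clusters(clusters).items():
    print(name, len(result["abnormal_logs"]), dict(result["cause_counts"]))
```

Because similar statements sit in the same cluster, a repeated code such as 0001 within the music cluster points the analyst at a single underlying cause rather than at many unrelated failures.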
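The log processing method in the embodiments of the present disclosure is explained below with reference to an optional example. In this optional example, the call logs are voice interaction logs, and the processing screens the voice interactions recorded in the voice interaction logs for interaction anomalies.
For a question-answering system, besides using conventional black-box and white-box testing to forward-test how well the system generalizes over chat utterances, testers usually assign staff to observe high-frequency interaction statements in the backend in order to discover erroneous interactions in reverse, and thereby find system bugs and cases of insufficient utterance generalization. However, manually screening tens of thousands of interaction logs is unrealistic and cannot be completed. Deduplicating the interaction logs by strong text matching first and then screening them manually still leaves a large volume of chat data, so it still consumes too much labor and has no significant effect.
This optional example provides a scheme for generalizing the chat utterances of a question-answering system through log screening based on sentence-vector clustering, applying a sentence-vector clustering algorithm to log analysis: first, the massive volume of actual interaction data in the backend is clustered and divided into different clusters; then the parsing codes in the music on-demand cluster and the home appliance control cluster are screened and analyzed, which can to some extent quickly locate the cause of a parsing error and supports the utterance-generalization work of the question-answering system.
As shown in FIG. 3, the flow of the log processing method in this optional example may include the following steps:
Step S302: perform strong text matching deduplication on the backend log data set.
The backend log chat text data set is deduplicated by strong text matching to obtain a group of target logs to be processed.
Step S304: the BERT model outputs sentence vectors.
The deduplicated group of target logs is input into the BERT model to obtain the sentence vector corresponding to the interaction statement text in each target log.
Step S306: DBSCAN text clustering.
A clustering algorithm is applied in this process. A clustering algorithm based on sentence vectors can group texts with similar sentence patterns into the same cluster; for example, chat utterances for music resource on-demand or for home appliance control can each, to a large extent, be grouped into one cluster. The sentence vectors obtained are input into the DBSCAN density clustering algorithm for text clustering, which separates music on-demand utterances from home appliance control utterances and forms two different clusters.
Step S308: perform the analysis operation.
For the chat clusters among the different clusters obtained, for example the two skill clusters of music resource on-demand and home appliance control, log analysis is performed by checking the parsing codes: observe whether the corresponding parse is flagged as correct (T or F) and what the corresponding parsing code is, where code 0000 means the parse is correct, code 0001 means the music resource cannot be called, and code 0002 means the device state is abnormal. Comparing the similar utterances within each cluster makes it possible to quickly locate the cause of the error logs in that cluster.
With this optional example, a clustering algorithm based on sentence vectors is applied to text log analysis to assist log analysts in backend log screening, so that they can quickly locate error logs, discover system interaction errors, and quickly determine the cause of the error logs. This helps log analysts quickly discover and locate problems, greatly reduces labor costs, and solves the problem that traditional methods consume too much labor with insignificant results.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions, but those skilled in the art should know that the present disclosure is not limited by the described order of actions, because according to the present disclosure some steps can be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present disclosure.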
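According to another aspect of the embodiments of the present disclosure, a log processing device for implementing the above log processing method is also provided. FIG. 4 is a structural block diagram of an optional log processing device according to an embodiment of the present disclosure. As shown in FIG. 4, the device may include:
a first acquisition unit 402, configured to acquire multiple target logs to be processed, where each target log records a call to one function through one control statement;
a second acquisition unit 404, connected to the first acquisition unit 402 and configured to obtain the sentence vector of the control statement recorded in each target log, to obtain the sentence vector corresponding to each target log;
a clustering unit 406, connected to the second acquisition unit 404 and configured to cluster the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple target clusters, where each target cluster corresponds to one function;
an execution unit 408, connected to the clustering unit 406 and configured to perform an analysis operation on the target logs in each target cluster, to obtain the analysis result corresponding to each target cluster.
It should be noted that the first acquisition unit 402 in this embodiment can be configured to perform step S202 above, the second acquisition unit 404 can be configured to perform step S204 above, the clustering unit 406 can be configured to perform step S206 above, and the execution unit 408 can be configured to perform step S208 above.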
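The chain of four connected units can be pictured as a simple composition in which each unit consumes the output of the previous one. The class below is only a structural sketch under that reading: the type aliases, the callable signatures, and the stub units in the usage example are all hypothetical, and the real units may be implemented in software or hardware.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Log = Dict[str, str]
Vector = Sequence[float]
Clusters = Dict[str, List[Log]]


@dataclass
class LogProcessingDevice:
    first_acquisition_unit: Callable[[], List[Log]]                  # unit 402, step S202
    second_acquisition_unit: Callable[[List[Log]], List[Vector]]      # unit 404, step S204
    clustering_unit: Callable[[List[Log], List[Vector]], Clusters]    # unit 406, step S206
    execution_unit: Callable[[Clusters], Dict[str, object]]           # unit 408, step S208

    def run(self) -> Dict[str, object]:
        logs = self.first_acquisition_unit()
        vectors = self.second_acquisition_unit(logs)
        clusters = self.clustering_unit(logs, vectors)
        return self.execution_unit(clusters)


# Stub units that only demonstrate the data flow between the four units.
device = LogProcessingDevice(
    first_acquisition_unit=lambda: [{"statement": "play song 1", "parse_result": "F"}],
    second_acquisition_unit=lambda logs: [[0.0, 1.0] for _ in logs],
    clustering_unit=lambda logs, vectors: {"music_on_demand": logs},
    execution_unit=lambda clusters: {
        name: [log for log in logs if log["parse_result"] == "F"]
        for name, logs in clusters.items()
    },
)
print(device.run())
```

Passing the units in as callables keeps each stage replaceable, for example swapping one deduplication strategy or clustering algorithm for another without touching the rest of the chain.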
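Through the above modules, multiple target logs to be processed are acquired, where each target log records a call to one function through one control statement; the sentence vector of the control statement recorded in each target log is obtained, giving the sentence vector corresponding to each target log; the multiple target logs are clustered based on the sentence vector corresponding to each target log to obtain multiple target clusters, where each target cluster corresponds to one function; and an analysis operation is performed on the target logs in each target cluster to obtain the analysis result corresponding to each target cluster. This solves the problem that log processing methods in the related art have low processing efficiency because logs must be analyzed one by one, and improves the efficiency of log processing.
In an exemplary embodiment, the second acquisition unit includes:
an input module, configured to input the control statement recorded in each target log into a pre-trained model to obtain the sentence vector corresponding to each target log, where the pre-trained model is used to determine the sentence vector corresponding to each target log from the average of the vectors of the words contained in the input control statement.
In an exemplary embodiment, the clustering unit includes:
a clustering module, configured to perform density clustering on the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple candidate clusters;
a determination module, configured to determine as target clusters all candidate clusters, among the multiple candidate clusters, whose target logs call the same function.
In an exemplary embodiment, the first acquisition unit includes:
an execution module, configured to perform a deduplication operation on the call logs in a log data set to obtain the multiple target logs, where each call log records a call to one function through one control statement.
In an exemplary embodiment, the execution module includes:
a first execution sub-module, configured to perform the following steps on each call log separately to obtain the multiple target logs, where each call log is the current call log while the steps are performed:
performing a matching operation between the current call log and the target logs among the multiple target logs in turn until a matching stop condition is met, where the matching stop condition includes either of the following: all of the multiple target logs have been matched, or it is determined that a target log matching the current call log exists among the multiple target logs:
determining the statement similarity between the control statement recorded in the current call log and the control statement recorded in the current target log, where the current target log is the target log, among the multiple target logs, on which the matching operation is being performed;
when the statement similarity is greater than or equal to a target similarity threshold, and the interaction result of the control statement recorded in the current call log is the same as the interaction result of the control statement recorded in the current target log, determining that the current target log is a target log matching the current call log;
when no target log matching the current call log exists among the multiple target logs, adding the current call log to the multiple target logs.
In an exemplary embodiment, the execution module includes:
a second execution sub-module, configured to perform a strong text matching deduplication operation on the call logs in the log data set to obtain the multiple target logs, where the strong text matching deduplication operation is a deduplication operation performed on call logs in the log data set whose recorded control statements are identical.
In an exemplary embodiment, the execution unit includes:
a filtering module, configured to filter out of each target cluster, based on the interaction results recorded in the target logs of that cluster, the abnormal logs whose recorded interaction result is an abnormal interaction, to obtain the analysis result of each target cluster, where the analysis result of each target cluster includes the abnormal logs in that cluster.
It should be noted here that the above modules implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in the above embodiments. It should also be noted that the above modules, as part of the device, can run in the hardware environment shown in FIG. 1 and can be implemented in software or in hardware, where the hardware environment includes a network environment.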
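According to yet another aspect of the embodiments of the present disclosure, a storage medium is also provided. Optionally, in this embodiment, the storage medium can hold the program code for executing any of the above log processing methods in the embodiments of the present disclosure.
Optionally, in this embodiment, the storage medium may be located on at least one of the multiple network devices in the network shown in the above embodiments.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
S1, acquiring multiple target logs to be processed, where each target log records a call to one function through one control statement;
S2, obtaining the sentence vector of the interaction statement recorded in each target log, to obtain the sentence vector corresponding to each target log;
S3, clustering the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple target clusters, where each target cluster corresponds to one function;
S4, performing an analysis operation on the target logs in each target cluster, to obtain the analysis result corresponding to each target cluster.
Optionally, for specific examples in this embodiment, refer to the examples described in the above embodiments, which are not repeated here.
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, ROM, RAM, a removable hard disk, a magnetic disk, an optical disk, and other media that can store program code.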
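According to yet another aspect of the embodiments of the present disclosure, an electronic device for implementing the above log processing method is also provided. The electronic device may be a server, a terminal, or a combination of the two.
FIG. 5 is a structural block diagram of an optional electronic device according to an embodiment of the present disclosure. As shown in FIG. 5, it includes a processor 502, a communication interface 504, a memory 506, and a communication bus 508, where the processor 502, the communication interface 504, and the memory 506 communicate with each other through the communication bus 508, and where:
the memory 506 is configured to store a computer program;
the processor 502 is configured to implement the following steps when executing the computer program stored in the memory 506:
S1, acquiring multiple target logs to be processed, where each target log records a call to one function through one control statement;
S2, obtaining the sentence vector of the interaction statement recorded in each target log, to obtain the sentence vector corresponding to each target log;
S3, clustering the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple target clusters, where each target cluster corresponds to one function;
S4, performing an analysis operation on the target logs in each target cluster, to obtain the analysis result corresponding to each target cluster.
Optionally, the communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in FIG. 5, but this does not mean that there is only one bus or one type of bus. The communication interface is used for communication between the electronic device and other devices.
The memory may include RAM, and may also include non-volatile memory, for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
As an example, the memory 506 may include, but is not limited to, the first acquisition unit 402, the second acquisition unit 404, the clustering unit 406, and the execution unit 408 of the log processing device. It may also include, but is not limited to, other module units of the log processing device, which are not described again in this example.
The processor may be a general-purpose processor, which may include but is not limited to a CPU (Central Processing Unit) or an NP (Network Processor); it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Optionally, for specific examples in this embodiment, refer to the examples described in the above embodiments, which are not repeated here.
A person of ordinary skill in the art can understand that the structure shown in FIG. 5 is only illustrative. The device implementing the above log processing method may be a terminal device, such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), or a PAD. FIG. 5 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components than shown in FIG. 5 (such as a network interface or a display device), or have a configuration different from that shown in FIG. 5.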
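A person of ordinary skill in the art can understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing hardware related to the terminal device. The program can be stored in a computer-readable storage medium, and the storage medium may include a flash drive, ROM, RAM, a magnetic disk, an optical disk, and so on.
The sequence numbers of the above embodiments of the present disclosure are for description only and do not indicate the relative merits of the embodiments.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they can be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause one or more computer devices (which may be personal computers, servers, network devices, and so on) to execute all or some of the steps of the methods described in the embodiments of the present disclosure.
In the above embodiments of the present disclosure, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the relevant descriptions of other embodiments.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution provided in this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or at least two units may be integrated into one unit. The integrated units can be implemented in the form of hardware or in the form of software functional units.
The above are only preferred embodiments of the present disclosure. It should be pointed out that a person of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present disclosure, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present disclosure.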
Claims (15)
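- A log processing method, comprising: acquiring multiple target logs to be processed, wherein each target log records a call to one function through one control statement; obtaining the sentence vector of the control statement recorded in each target log, to obtain the sentence vector corresponding to each target log; clustering the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple target clusters, wherein each target cluster corresponds to one function; and performing an analysis operation on the target logs in each target cluster, to obtain the analysis result corresponding to each target cluster.
- The method according to claim 1, wherein obtaining the sentence vector of the control statement recorded in each target log, to obtain the sentence vector corresponding to each target log, comprises: inputting the control statement recorded in each target log into a pre-trained model to obtain the sentence vector corresponding to each target log, wherein the pre-trained model is used to determine the sentence vector corresponding to each target log from the average of the vectors of the words contained in the input control statement.
- The method according to claim 1, wherein clustering the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple target clusters, comprises: performing density clustering on the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple candidate clusters; and determining as the target clusters all candidate clusters, among the multiple candidate clusters, whose target logs call the same function.
- The method according to claim 1, wherein acquiring multiple target logs to be processed comprises: performing a deduplication operation on the call logs in a log data set to obtain the multiple target logs, wherein each call log records a call to one function through one control statement.
- The method according to claim 4, wherein performing a deduplication operation on the call logs in the log data set to obtain the multiple target logs comprises: performing the following steps on each call log separately to obtain the multiple target logs, wherein each call log is the current call log while the steps are performed: performing a matching operation between the current call log and the target logs among the multiple target logs in turn until a matching stop condition is met, wherein the matching stop condition includes either of the following: all of the multiple target logs have been matched, or it is determined that a target log matching the current call log exists among the multiple target logs: determining the statement similarity between the control statement recorded in the current call log and the control statement recorded in the current target log, wherein the current target log is the target log, among the multiple target logs, on which the matching operation is being performed; when the statement similarity is greater than or equal to a target similarity threshold, and the interaction result of the control statement recorded in the current call log is the same as the interaction result of the control statement recorded in the current target log, determining that the current target log is a target log matching the current call log; and when no target log matching the current call log exists among the multiple target logs, adding the current call log to the multiple target logs.
- The method according to claim 4, wherein performing a deduplication operation on the call logs in the log data set to obtain the multiple target logs comprises: performing a strong text matching deduplication operation on the call logs in the log data set to obtain the multiple target logs, wherein the strong text matching deduplication operation is a deduplication operation performed on call logs in the log data set whose recorded control statements are identical.
- The method according to any one of claims 1 to 6, wherein performing an analysis operation on the target logs in each target cluster, to obtain the analysis result corresponding to each target cluster, comprises: based on the interaction results recorded in the target logs of each target cluster, filtering out of each target cluster the abnormal logs whose recorded interaction result is an abnormal interaction, to obtain the analysis result of each target cluster, wherein the analysis result of each target cluster includes the abnormal logs in that cluster.
- A log processing device, comprising: a first acquisition unit, configured to acquire multiple target logs to be processed, wherein each target log records a call to one function through one control statement; a second acquisition unit, configured to obtain the sentence vector of the control statement recorded in each target log, to obtain the sentence vector corresponding to each target log; a clustering unit, configured to cluster the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple target clusters, wherein each target cluster corresponds to one function; and an execution unit, configured to perform an analysis operation on the target logs in each target cluster, to obtain the analysis result corresponding to each target cluster.
- The device according to claim 8, wherein the second acquisition unit comprises: an input module, configured to input the control statement recorded in each target log into a pre-trained model to obtain the sentence vector corresponding to each target log, wherein the pre-trained model is used to determine the sentence vector corresponding to each target log from the average of the vectors of the words contained in the input control statement.
- The device according to claim 8, wherein the clustering unit comprises: a clustering module, configured to perform density clustering on the multiple target logs based on the sentence vector corresponding to each target log, to obtain multiple candidate clusters; and a determination module, configured to determine as the target clusters all candidate clusters, among the multiple candidate clusters, whose target logs call the same function.
- The device according to claim 8, wherein the first acquisition unit comprises: an execution module, configured to perform a deduplication operation on the call logs in a log data set to obtain the multiple target logs, wherein each call log records a call to one function through one control statement.
- The device according to claim 11, wherein the execution module comprises: a first execution sub-module, configured to perform the following steps on each call log separately to obtain the multiple target logs, wherein each call log is the current call log while the steps are performed: performing a matching operation between the current call log and the target logs among the multiple target logs in turn until a matching stop condition is met, wherein the matching stop condition includes either of the following: all of the multiple target logs have been matched, or it is determined that a target log matching the current call log exists among the multiple target logs: determining the statement similarity between the control statement recorded in the current call log and the control statement recorded in the current target log, wherein the current target log is the target log, among the multiple target logs, on which the matching operation is being performed; when the statement similarity is greater than or equal to a target similarity threshold, and the interaction result of the control statement recorded in the current call log is the same as the interaction result of the control statement recorded in the current target log, determining that the current target log is a target log matching the current call log; and when no target log matching the current call log exists among the multiple target logs, adding the current call log to the multiple target logs.
- The device according to any one of claims 8 to 12, wherein the execution unit comprises: a filtering module, configured to filter out of each target cluster, based on the interaction results recorded in the target logs of that cluster, the abnormal logs whose recorded interaction result is an abnormal interaction, to obtain the analysis result of each target cluster, wherein the analysis result of each target cluster includes the abnormal logs in that cluster.
- A computer-readable storage medium, comprising a stored program, wherein the program, when run, executes the method according to any one of claims 1 to 7.
- An electronic device, comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to execute the method according to any one of claims 1 to 7 through the computer program.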
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210467513.8 | 2022-04-29 | ||
CN202210467513.8A CN117009193A (zh) | 2022-04-29 | 2022-04-29 | Log processing method and device, storage medium, and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023206702A1 (zh) | 2023-11-02 |
Family
ID=88517110
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/096434 WO2023206702A1 (zh) | 2022-04-29 | 2022-05-31 | Log processing method and apparatus, storage medium, and electronic apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117009193A (zh) |
WO (1) | WO2023206702A1 (zh) |
- 2022
  - 2022-04-29 CN CN202210467513.8A patent/CN117009193A/zh active Pending
  - 2022-05-31 WO PCT/CN2022/096434 patent/WO2023206702A1/zh unknown
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210026890A1 (en) * | 2018-02-09 | 2021-01-28 | Nippon Telegraph And Telephone Corporation | FAQ consolidation assistance device, FAQ consolidation assistance method, and program |
CN112199480A (zh) * | 2020-09-18 | 2021-01-08 | 厦门快商通科技股份有限公司 | BERT-model-based method and system for detecting violations in online dialogue logs |
CN112632000A (zh) * | 2020-12-30 | 2021-04-09 | 北京天融信网络安全技术有限公司 | Log file clustering method and apparatus, electronic device, and readable storage medium |
CN112800219A (zh) * | 2021-01-19 | 2021-05-14 | 苏宁金融科技(南京)有限公司 | Method and system for feeding customer service logs back into a database |
CN114328903A (zh) * | 2021-04-25 | 2022-04-12 | 苏宁金融科技(南京)有限公司 | Customer service log backflow method and apparatus based on text clustering |
CN113836300A (zh) * | 2021-09-24 | 2021-12-24 | 中国电信股份有限公司 | Log analysis method, system, device, and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
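CN117978543A (zh) * | 2024-03-28 | 2024-05-03 | 贵州华谊联盛科技有限公司 | Network security early-warning method and system based on situational awareness |
CN117978543B (zh) * | 2024-03-28 | 2024-06-04 | 贵州华谊联盛科技有限公司 | Network security early-warning method and system based on situational awareness |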
Also Published As
Publication number | Publication date |
---|---|
CN117009193A (zh) | 2023-11-07 |
Similar Documents
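Publication | Title | Publication Date |
---|---|---|
US10671471B2 (en) | Topology-based feature selection for anomaly detection | |
US9734005B2 (en) | Log analytics for problem diagnosis | |
US11595415B2 (en) | Root cause analysis in multivariate unsupervised anomaly detection | |
KR102444615B1 (ko) | Technique for recognizing behavioral changes of an online service | |
CN107436844B (zh) | Method and apparatus for generating a collection of interface use cases | |
CN112148772A (zh) | Alarm root cause identification method, apparatus, device, and storage medium | |
CN107861981B (zh) | Data processing method and apparatus | |
US10832164B2 (en) | Generating streaming analytics applications using a glossary | |
TW201737072A (zh) | Method and system for performing project evaluation on an application | |
CN109815119B (zh) | Method and apparatus for testing app link channels | |
WO2023206702A1 (zh) | Log processing method and apparatus, storage medium, and electronic apparatus | |
US10713070B2 (en) | Systems and methods for capturing and visualizing user interactions across devices | |
WO2021213135A1 (zh) | Audio processing method and apparatus, electronic device, and storage medium | |
CN109754014B (zh) | Industrial model training method, apparatus, device, and medium | |
CN110909005B (zh) | Model feature analysis method, apparatus, device, and medium | |
CN115454702A (zh) | Log fault analysis method and apparatus, storage medium, and electronic device | |
US20140229923A1 (en) | Commit sensitive tests | |
CN117632710A (zh) | Test code generation method, apparatus, device, and storage medium | |
CN112416333A (zh) | Software model training method, apparatus, system, device, and storage medium | |
CN114171107B (zh) | Method, apparatus, device, and storage medium for detecting VPD information of a solid-state drive | |
WO2023086158A1 (en) | System and method for identifying performance bottlenecks | |
CN110806961A (zh) | Intelligent early-warning method and system, and recommendation system | |
WO2023206701A1 (zh) | Method and apparatus for selecting an instruction execution device, storage medium, and electronic apparatus | |
CN115171657B (zh) | Voice device testing method and apparatus, and storage medium | |
WO2023206703A1 (zh) | Event slot extraction method and apparatus, storage medium, and electronic apparatus | |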
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22939535; Country of ref document: EP; Kind code of ref document: A1 |