CN116628704A - Data processing method, device, electronic equipment, storage medium and program product - Google Patents

Data processing method, device, electronic equipment, storage medium and program product

Info

Publication number
CN116628704A
Authority
CN
China
Prior art keywords
log
target
presenter
description information
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310672887.8A
Other languages
Chinese (zh)
Inventor
泮求亮
李宫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN202310672887.8A priority Critical patent/CN116628704A/en
Publication of CN116628704A publication Critical patent/CN116628704A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577Assessing vulnerabilities and evaluating computer system security
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a data processing method, a data processing apparatus, an electronic device, a storage medium, and a program product. The method comprises the following steps: determining, according to historical log data, a mapping relationship between log submitters and the first description information they submitted; encoding the first description information according to a preset encoding algorithm to determine text features; splicing the first personal features with the text features to obtain a sample data set, and training a pre-constructed decision tree model based on the sample data set to obtain a data prediction model; determining, according to the mapping relationship, the candidate log submitters corresponding to the second description information in the current vulnerability data; encoding the second description information according to the preset encoding algorithm to determine target text features, and splicing the second personal features corresponding to the candidate log submitters with the target text features to determine a target data set; and inputting the target data set into the data prediction model to determine the target log submitter with the highest correlation to the second description information.

Description

Data processing method, device, electronic equipment, storage medium and program product
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, electronic device, storage medium, and program product.
Background
In the related art, when repairing a game bug, the responsible person who caused the game bug needs to be located first, so that the corresponding repair work can be carried out on the bug. The blame function (also referred to as the annotate function) of the game's version management tool typically relies on the error log to locate the developer of the code corresponding to the game bug.
However, besides the developer of the code, there may be other operators who did not develop the code but whose work caused the game bug to appear, and the error log may not record the bug at all, which leads to the problem that responsibility for the game bug is assigned inaccurately.
Disclosure of Invention
In view of the above, an object of the present application is to provide a data processing method, apparatus, electronic device, storage medium, and program product.
In view of the above object, in a first aspect, the present application provides a data processing method, the method comprising:
determining a mapping relation between a log presenter and first description information presented by the log presenter according to pre-acquired historical log data;
Encoding the first description information according to a preset encoding algorithm to determine text characteristics;
splicing according to the first personal characteristic corresponding to the log presenter and the text characteristic to obtain a sample data set, and training a pre-constructed decision tree model based on the sample data set to obtain a data prediction model;
determining candidate log submitters corresponding to second description information in the pre-acquired current vulnerability data according to the mapping relation;
encoding the second descriptive information according to the preset encoding algorithm to determine a target text feature, and splicing a second personal feature corresponding to the candidate log presenter and the target text feature to determine a target data set;
and inputting the target data set into the data prediction model to determine a target log presenter with the highest correlation degree with the second descriptive information.
In a second aspect, the present application provides a data processing apparatus, the apparatus comprising:
a first determining module configured to determine a mapping relationship between a log presenter and first description information presented by the log presenter according to previously acquired history log data;
The encoding module is configured to encode the first description information according to a preset encoding algorithm to determine text characteristics;
the training module is configured to splice according to the first personal characteristic corresponding to the log presenter and the text characteristic to obtain a sample data set, and train a pre-constructed decision tree model based on the sample data set to obtain a data prediction model;
the second determining module is configured to determine candidate log submitters corresponding to second description information in the pre-acquired current vulnerability data according to the mapping relation;
a third determining module configured to encode the second description information according to the preset encoding algorithm to determine a target text feature, and splice a second personal feature corresponding to the candidate log presenter and the target text feature to determine a target data set;
and a fourth determining module configured to input the target data set into the data prediction model to determine a target log presenter having the highest correlation with the second description information.
In a third aspect, the present application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the data processing method according to the first aspect when executing the program.
In a fourth aspect, the present application provides a computer-readable storage medium storing computer instructions for causing a computer to perform the data processing method of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising computer program instructions which, when run on a computer, cause the computer to perform the data processing method according to the first aspect.
From the foregoing, it can be seen that the present application provides a data processing method, apparatus, electronic device, storage medium, and program product, which determine, according to pre-acquired historical log data, a mapping relationship between log submitters and the first description information they submitted; encode the first description information according to a preset encoding algorithm to determine text features; splice the first personal features corresponding to the log submitters with the text features to obtain a sample data set, and train a pre-constructed decision tree model on the sample data set to obtain a data prediction model; determine, according to the mapping relationship, the candidate log submitters corresponding to the second description information in the pre-acquired current vulnerability data; encode the second description information according to the preset encoding algorithm to determine a target text feature, and splice the second personal features corresponding to the candidate log submitters with the target text feature to determine a target data set; and input the target data set into the data prediction model to determine the target log submitter with the highest correlation to the second description information. By determining the mapping relationship between each log submitter and the description information that submitter provided, and training a data prediction model on the text features derived from the description information together with each log submitter's personal features, every submitter who has submitted a log can be traced for responsibility, which improves the accuracy of assigning responsibility for vulnerability data to a certain extent. Further, the data prediction model is used to make predictions on newly occurring vulnerability data, so that the log submitter most strongly correlated with the vulnerability data is determined as the target log submitter to be held responsible; because the accuracy of responsibility assignment is improved, the repair speed of the vulnerability data and the efficiency of development are further improved.
Drawings
In order to more clearly illustrate the technical solutions of the present application or the related art, the drawings required in the description of the embodiments or the related art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings by those of ordinary skill in the art without inventive effort.
Fig. 1 is a schematic flow chart of an exemplary data processing method according to an embodiment of the present application.
Fig. 2 shows an exemplary flow diagram of determining a mapping relationship in an embodiment in accordance with the application.
FIG. 3 illustrates an exemplary schematic diagram of a data prediction model training process in an embodiment in accordance with the application.
FIG. 4 illustrates an exemplary diagram of a determination of candidate journal submitters in an embodiment in accordance with the present application.
FIG. 5 shows an exemplary schematic diagram of a determination process of a target journal presenter in an embodiment according to the present application.
Fig. 6 is a schematic diagram showing an exemplary structure of a data processing apparatus according to an embodiment of the present application.
Fig. 7 is a schematic diagram of an exemplary structure of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that, unless otherwise defined, technical or scientific terms used in the embodiments of the present application should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present application belongs. The terms "first", "second", and the like, as used in the embodiments of the present application, do not denote any order, quantity, or importance, but are merely used to distinguish one element from another. The word "comprising", "comprises", or the like means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected", "coupled", and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", and so on are used merely to indicate relative positional relationships, which may change accordingly when the absolute position of the described object changes.
As described in the background section, in the related art, when repairing a game bug it is necessary to first locate the responsible person who caused the bug, and then perform the corresponding repair work on the bug. The blame function (also referred to as the annotate function) of the game's version management tool typically relies on the error log to locate the developer of the code corresponding to the game bug.
The applicant found during the course of the research that, taking a game development scenario as an example, game vulnerabilities (software bugs) are difficult to avoid during game development. Deviations in understanding the requirements, unreasonable development processes, or carelessness of developers may all introduce vulnerabilities (bugs) into the project. Game products containing these vulnerabilities may produce unpredictable behavior or results after deployment, resulting in massive economic losses or even the direct termination of the game product. Therefore, the speed of bug repair has a very important effect on game products. The primary task in repairing a bug is to accurately assign the bug to the responsible person. The development cycle of a game product involves the collaboration of different functions such as planning, programming, and art, each of which in turn includes a large number of developers, and personnel also flow in and out during the development cycle, all of which makes it difficult to accurately attribute a bug to a developer.
In the related art, the log reporting the error online is first collected, then the error-reporting file and code line recorded in the log are found, and the last submitter is then located through the blame function of the version management tool and taken as the responsible person.
The applicant found that locating responsible persons through the blame function of the version management tool has several problems. First, only code developers can be found; for games, problems in planning tables and art resources also cause bugs, so full coverage cannot be achieved. Second, because the blame function depends heavily on the error log, the responsible person cannot be found from a functional bug description alone, and bugs during the development period often have no error log, so no responsible person can be located. Third, only the last submitter can be found through the blame function, but the last submitter is not necessarily the person responsible for the function, which causes misattribution.
Therefore, the related art suffers from inaccurate assignment of responsibility for game bugs.
As such, the present application provides a data processing method, apparatus, electronic device, storage medium, and program product, which determine, according to pre-acquired historical log data, a mapping relationship between log submitters and the first description information they submitted; encode the first description information according to a preset encoding algorithm to determine text features; splice the first personal features corresponding to the log submitters with the text features to obtain a sample data set, and train a pre-constructed decision tree model on the sample data set to obtain a data prediction model; determine, according to the mapping relationship, the candidate log submitters corresponding to the second description information in the pre-acquired current vulnerability data; encode the second description information according to the preset encoding algorithm to determine a target text feature, and splice the second personal features corresponding to the candidate log submitters with the target text feature to determine a target data set; and input the target data set into the data prediction model to determine the target log submitter with the highest correlation to the second description information. By determining the mapping relationship between each log submitter and the description information that submitter provided, and training a data prediction model on the text features derived from the description information together with each log submitter's personal features, every submitter who has submitted a log can be traced for responsibility, which improves the accuracy of assigning responsibility for vulnerability data to a certain extent. Further, the data prediction model is used to make predictions on newly occurring vulnerability data, so that the log submitter most strongly correlated with the vulnerability data is determined as the target log submitter to be held responsible; because the accuracy of responsibility assignment is improved, the repair speed of the vulnerability data and the efficiency of development are further improved.
In some specific application scenarios, the data processing method of the present application may be applied to various data-processing-related systems, which may run on a PC or on a mobile terminal such as a mobile phone or a tablet computer.
In some specific application scenarios, the data processing method of the present application may be run locally or in a cloud server. When it is run in a cloud server, the acquired data to be processed is sent to the cloud server over a network, the server processes the data using the data processing method, and the processing result is sent back to the local device over the network.
The data processing method according to an exemplary embodiment of the present application is described below in connection with a specific application scenario. It should be noted that the above application scenario is only shown for the convenience of understanding the spirit and principle of the present application, and the embodiments of the present application are not limited in any way. Rather, embodiments of the application may be applied to any scenario where applicable.
The data processing method provided by the embodiment of the application is specifically described by a specific embodiment.
Fig. 1 is a schematic flow chart of an exemplary data processing method according to an embodiment of the present application.
Referring to fig. 1, the data processing method provided by the embodiment of the application specifically includes the following steps:
s102: and determining a mapping relation between the log submitter and the first description information submitted by the log submitter according to the pre-acquired historical log data.
S104: and encoding the first descriptive information according to a preset encoding algorithm to determine text characteristics.
S106: and splicing according to the first personal characteristic corresponding to the log presenter and the text characteristic to obtain a sample data set, and training a pre-constructed decision tree model based on the sample data set to obtain a data prediction model.
S108: and determining candidate log submitters corresponding to the second description information in the pre-acquired current vulnerability data according to the mapping relation.
S110: and encoding the second descriptive information according to the preset encoding algorithm to determine a target text feature, and splicing the target text feature with a second personal feature corresponding to the candidate log presenter to determine a target data set.
S112: and inputting the target data set into the data prediction model to determine a target log presenter with the highest correlation degree with the second descriptive information.
In some embodiments, version control software is indispensable in the project development process. SVN (an open-source centralized version control system) and Git (an open-source distributed version control system) are the most popular version control software today, and code developers upload a commit log at each iteration. Thus, when acquiring the history log data, the history log data within a preset time period may be acquired; for example, the commit logs on the SVN for the last half year are collected, amounting to 13,000 commit records in total.
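A minimal sketch of this collection step is shown below, assuming the SVN command-line client is installed; the repository URL, the date window, and the helper name are illustrative assumptions, not part of the patented method.

# Collect (submitter, commit message) pairs from an SVN repository over a date range.
import subprocess
import xml.etree.ElementTree as ET

def collect_commit_logs(repo_url: str, start: str, end: str):
    """Return a list of (author, message) pairs for commits in [start, end]."""
    xml_out = subprocess.run(
        ["svn", "log", "--xml", "-r", f"{{{start}}}:{{{end}}}", repo_url],
        capture_output=True, text=True, check=True,
    ).stdout
    records = []
    for entry in ET.fromstring(xml_out).iter("logentry"):
        author = entry.findtext("author", default="")
        message = entry.findtext("msg", default="")
        records.append((author, message))
    return records

# Example (placeholder URL): roughly the "last half year" window mentioned above.
# logs = collect_commit_logs("https://svn.example.com/game", "2023-01-01", "2023-06-01")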
Further, the obtained history log data may be processed, and from it one can obtain the log submitter corresponding to each submission record (or each history log) and the log submission description (i.e., the first description information) made by that submitter when submitting the log.
Fig. 2 shows an exemplary flow diagram of determining a mapping relationship in an embodiment in accordance with the application.
Still further, the first description information may be segmented by jieba word segmentation so as to determine the first keywords and the first functional text description information. Referring to fig. 2, for example, in the description "[jack][role making] mobile function development", "jack" and "role making" are first keywords, and "mobile function development" is the first functional text description information characterizing the function of the history log corresponding to this first description information; it can thus be determined that this history log was used to develop the movement function.
Further, a dictionary between the log submitter and the corresponding first keywords may be constructed; for example, if the log submitter is developer A and the first keywords are "jack" and "role making", the correspondences developer A-"jack" and developer A-"role making" are recorded in the dictionary, and according to the dictionary the mapping relationship between the log submitter and the first description information can be determined.
In order to make the mapping relationship more accurate, a pre-constructed supplementary mapping relationship may also be obtained. Specifically, the supplementary mapping relationship may be a pre-made module-responsibility table, which includes a first preset mapping relationship between preset log submitters and the first description information, and a second preset mapping relationship between the log submitters and preset description information. The contents of the dictionary and the supplementary mapping relationship can then be merged to complement each other, so as to form a complete mapping relationship between the log submitters and the first description information.
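The sketch below illustrates one way this dictionary could be built and merged with the module-responsibility table; it assumes the jieba package and a simple submitter-to-keyword-set layout, which are illustrative assumptions.

# Build a submitter -> keyword-set dictionary from commit records, then merge in
# the pre-made module-responsibility table (the "supplementary mapping").
from collections import defaultdict
import jieba

def build_mapping(commit_records, supplementary_mapping=None):
    """commit_records: iterable of (submitter, description) pairs."""
    mapping = defaultdict(set)
    for submitter, description in commit_records:
        for token in jieba.lcut(description):
            token = token.strip()
            if len(token) > 1:            # drop single characters and punctuation
                mapping[submitter].add(token)
    # Complement the dictionary with the pre-built module-responsibility table.
    for submitter, keywords in (supplementary_mapping or {}).items():
        mapping[submitter].update(keywords)
    return mapping

# mapping = build_mapping([("developer A", "[jack][role making] mobile function development")],
#                         {"developer A": {"jack", "role making"}})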
In some embodiments, after the textual description of a submitted log is segmented, the first keywords of the submitted log and the submitted first functional text description information are obtained. For example, from "[jack][role making] mobile function development", the two first keywords "jack" and "role making" and the first functional text description information "mobile function development" are extracted. Further, the first functional text description information is segmented, combined with the first keywords, and then encoded according to a preset encoding algorithm, so as to determine the text features. Specifically, the first description information may be encoded by jieba segmentation and the TF-IDF algorithm, where TF-IDF is a statistical method used to evaluate the importance of a word to one document in a document set or corpus: the importance of a word increases proportionally with the number of times it appears in the document, but decreases with the frequency at which it appears in the corpus.
For the method of the present application, the first keywords and the first functional text description information may be segmented with jieba to determine a plurality of first word segments, for example "jack", "role making", "moving", "function", and "development". Further, the first target history logs containing each first word segment are determined respectively, such as a history log 1 containing the word segment "jack", a history log 2 containing the two word segments "role making" and "moving", and a history log 3 containing the two word segments "function" and "development". For each first word segment, the number of times it occurs in its first target history log, that is, the first number of occurrences, is determined; for example, the history log data includes a plurality of history logs, of which history log 1 contains the word segment "jack", so the number of times "jack" occurs in history log 1 may be determined as the first number of occurrences of "jack". Similarly, the number of occurrences of every other first word segment in its corresponding history log may be counted. Next, the first total number of word segments, i.e. the total number of all word segments in the first target history log, may be determined, so that a first word frequency of the first word segment relative to the history log data is determined from the first number of occurrences and the first total number of word segments; specifically, the algorithm is as follows:
TF(t1) = (first number of occurrences) / (first total number of word segments)
where t1 denotes a first word segment. For example, with the five first word segments "jack", "role making", "moving", "function", and "development", the first word frequency of each is calculated separately as TF1(t1), TF2(t1), TF3(t1), TF4(t1), and TF5(t1).
Further, the total number of history logs may be determined, as well as the first number of first target history logs; for example, if history log 1 and history log 2 both contain "moving", "function", and "development", the first number is 2. The first reverse file frequency (i.e., inverse document frequency) of the first word segment relative to the history log data may then be determined from the total number of history logs and the first number; the specific algorithm is as follows:
IDF(t1) = log_e(total number of history logs / first number)
It should be noted that, again for each first word segment, the first reverse file frequency of each first word segment is determined separately as IDF1(t1), IDF2(t1), IDF3(t1), IDF4(t1), and IDF5(t1).
Still further, the first segmentation word may be encoded based on a TF-IDF algorithm according to a first word frequency and a first reverse file frequency to determine a first text feature, where the specific algorithm is as follows:
TF-IDF(t1) = TF(t1) × IDF(t1)
It should be noted that, for each first word segment, the first text feature of each first word segment may be determined separately as TF-IDF1(t1), TF-IDF2(t1), TF-IDF3(t1), TF-IDF4(t1), and TF-IDF5(t1). The text feature may then be determined from the plurality of first text features.
The TF-IDF weight of each word segment can be calculated through the above steps. Assuming that the word stock formed from all the submission histories contains 10000 word segments in total, where the five first word segments "jack", "role making", "moving", "function", and "development" correspond to indices 1, 2, 3, 4, and 5 respectively, the description information "[jack][role making] mobile function development" can be converted into a 1 × 10000 vector in which columns 1 to 5 hold the TF-IDF weights of the respective word segments and all other columns are 0.
It should be noted that the text feature generated by the TF-IDF algorithm is very sparse, since its dimension equals the size of the word stock; a PCA algorithm may therefore be applied to reduce the text feature to 300 dimensions, yielding the dimension-reduced text feature.
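A minimal sketch of this encoding step is shown below. The 300-dimensional PCA target comes from the text above; using scikit-learn's TfidfVectorizer with jieba as the tokenizer (rather than the manual per-word computation described above) is an assumption made purely for illustration.

# Encode commit descriptions: jieba tokens -> TF-IDF vectors -> PCA to 300 dims.
import jieba
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

def encode_descriptions(descriptions, n_components=300):
    vectorizer = TfidfVectorizer(tokenizer=jieba.lcut, token_pattern=None)
    tfidf = vectorizer.fit_transform(descriptions).toarray()   # word-stock-sized sparse vectors
    n_components = min(n_components, min(tfidf.shape))          # guard for small corpora
    pca = PCA(n_components=n_components)
    return pca.fit_transform(tfidf), vectorizer, pca

# text_features, vectorizer, pca = encode_descriptions(list_of_commit_descriptions)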
In some embodiments, the text features may be spliced with the first personal feature corresponding to each log submitter, resulting in a sample data set that can be used to train the pre-constructed decision tree model. Referring to fig. 2, the first personal features may include each log submitter's historical bug rate, total number of log submissions, time of the most recent log submission, responsible module, and role. For example, for developer A the first personal features may include "bug rate: 10%" and "last commit time: one day ago", and corresponding first personal features are likewise recorded for developer B.
FIG. 3 illustrates an exemplary schematic diagram of a data prediction model training process in an embodiment in accordance with the application.
Further, referring to fig. 3, each text feature may be spliced with each first personal feature to obtain a plurality of first sample data, and a first sample data set is determined from the plurality of first sample data. For example, for text feature 1, if only developer A exists and developer A's first personal features are "bug rate: 10%" and "last commit time: one day ago", text feature 1 is spliced with "bug rate: 10%" and with "last commit time: one day ago" respectively, yielding the two first sample data "text feature 1 - bug rate: 10%" and "text feature 1 - last commit time: one day ago", from which a first sample data set is determined. A first sample data set is obtained in this way for each text feature, the overall sample data set is determined from all the first sample data sets, and the sample data set serves as the input of the pre-constructed decision tree model. The pre-constructed decision tree model may be a LightGBM model.
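The sketch below illustrates one way the splicing could be organized: one row per (description, candidate submitter) pair, concatenating the candidate's full personal-feature vector with the encoded description. The exact splicing granularity and all names are assumptions for illustration.

# Splice text features with personal features to form candidate sample rows.
import numpy as np

def build_sample_rows(text_features, text_owners, personal_features):
    """
    text_features:     array of shape (n_logs, 300) -- encoded commit descriptions
    text_owners:       list of submitter names, one per text feature
    personal_features: dict submitter -> 1-D personal-feature vector
    Returns (X, text_owner_per_row, candidate_per_row).
    """
    rows, row_text_owner, row_candidate = [], [], []
    for text_vec, owner in zip(text_features, text_owners):
        for candidate, person_vec in personal_features.items():
            rows.append(np.concatenate([person_vec, text_vec]))
            row_text_owner.append(owner)
            row_candidate.append(candidate)
    return np.vstack(rows), row_text_owner, row_candidate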
Still further, the sample data set may be divided, according to a preset ratio, into a training set for training the pre-constructed decision tree model and a test set for testing the data prediction model; for example, the training set and the test set are divided in a 4:1 ratio. The training set is input into the pre-constructed decision tree model to determine a training result characterizing the degree of correlation between the first description information and the log submitter. When the training result reaches a preset training result, or a preset number of training rounds is reached, the pre-constructed decision tree model is adjusted according to the test set: the test set is input, and the model parameters of the pre-constructed decision tree model are adjusted according to the resulting output, so as to obtain the data prediction model.
Still further, to ensure that the data prediction model can more accurately predict the correlation between the first description information and the log submitter, a label may be set for each piece of data in the sample data set. Specifically, it may be determined whether the first log submitter corresponding to a first personal feature in the training set is the same as the second log submitter corresponding to the text feature spliced with that first personal feature; if they are the same, a first label is set for the spliced first personal feature and text feature. For example, if the first personal feature is "bug rate: 10%" and its corresponding first log submitter is developer A, and text feature 1 spliced with the first personal feature "bug rate: 10%" was, before encoding, the first functional text description information "mobile function development", whose corresponding second log submitter is also developer A, then the spliced sample data is a positive sample and a first label, for example 1, may be set for it.
It should be noted that if the first log submitter and the second log submitter are different, a second label is set for the spliced first personal feature and text feature. For example, if the first personal feature is "bug rate: 10%" and its corresponding first log submitter is developer A, while text feature 2 spliced with the first personal feature "bug rate: 10%" was, before encoding, the first functional text description information "skill function development", whose corresponding second log submitter is developer B, then the spliced sample data is a negative sample and a second label, for example 0, may be set for it. Regression training is then performed on the LightGBM model according to the sample data set and the labels, training the correlation between the first functional text description information and the log submitter.
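A minimal sketch of this labelling and regression-training step follows, assuming the row layout from the earlier sketch; the 4:1 split matches the text above, while the LightGBM hyper-parameters are assumptions.

# Label rows (1 = matching submitter, 0 = non-matching) and fit a LightGBM regressor.
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

def train_prediction_model(X, row_text_owner, row_candidate):
    y = np.array([1.0 if owner == cand else 0.0
                  for owner, cand in zip(row_text_owner, row_candidate)])
    # 4:1 split between training set and test set, as described above.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
    model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
    return model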
FIG. 4 illustrates an exemplary diagram of a determination of candidate journal submitters in an embodiment in accordance with the present application.
In some embodiments, referring to FIG. 4, function-based filtering is employed to recall a series of developers with related functional submissions; that is, after the current vulnerability data is obtained, the candidate log submitters that may be responsible for the vulnerability data are determined. Specifically, the second description information in the pre-acquired current vulnerability data may be segmented to determine second keywords, for example "jack" and "role making". Further, according to the mapping relationship determined in the foregoing embodiment, the dictionary corresponding to the first keywords may be consulted to determine whether a first keyword identical to a second keyword exists in the dictionary. If so, the log submitter corresponding to that first keyword may be taken as a candidate log submitter corresponding to the second description information; in this way all log submitters related to the current vulnerability data can be found, yielding a candidate log submitter set (i.e., the candidate set in fig. 4).
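A minimal sketch of this recall step is shown below, reusing the submitter -> keyword dictionary from the earlier sketch; the overlap test is an assumption about how keyword matching is implemented.

# Recall candidate submitters whose dictionary keywords overlap the bug description.
import jieba

def recall_candidates(bug_description, mapping):
    """mapping: dict submitter -> set of keywords built from the history logs."""
    bug_tokens = {t for t in jieba.lcut(bug_description) if len(t.strip()) > 1}
    return [submitter for submitter, keywords in mapping.items()
            if keywords & bug_tokens]

# candidates = recall_candidates("[jack][role making] movement bug on login", mapping)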
FIG. 5 shows an exemplary schematic diagram of a determination process of a target journal presenter in an embodiment according to the present application.
In some embodiments, the final target log submitter responsible for the current vulnerability data may be determined from the candidate log submitters using the trained data prediction model, and therefore the model input that can be fed into the data prediction model is determined from the second description information. Specifically, the second keywords and the second functional text description information may be segmented with jieba to determine a plurality of second word segments, such as "moving", "function", and "development". Further, the second target history logs containing each second word segment are determined respectively, such as a history log 5 containing the word segment "moving" and a history log 6 containing the two word segments "function" and "development". For each second word segment, the number of times it occurs in its second target history log, that is, the second number of occurrences, may be determined; for example, the history log data includes a plurality of history logs, of which history log 5 contains the word segment "moving", so the number of times "moving" occurs in history log 5 may be determined as the second number of occurrences of "moving". Similarly, the number of occurrences of every other second word segment in its corresponding history log may be counted. Then, the second total number of word segments, i.e. the total number of all word segments in the second target history log, may be determined, so that a second word frequency of the second word segment relative to the history log data is determined from the second number of occurrences and the second total number of word segments; specifically, the algorithm is as follows:
TF(t2) = (second number of occurrences) / (second total number of word segments)
where t2 denotes a second word segment. For example, with the three second word segments "moving", "function", and "development", the second word frequency of each is calculated separately as TF1(t2), TF2(t2), and TF3(t2).
Further, the total number of history logs may be determined, as well as the second number of second target history logs; for example, if history log 1 and history log 2 both contain "moving", "function", and "development", the second number is 2. The second reverse file frequency of the second word segment relative to the history log data may then be determined from the total number of history logs and the second number; the specific algorithm is as follows:
IDF(t2) = log_e(total number of history logs / second number)
It should be noted that, again for each second word segment, the second reverse file frequency of each second word segment is determined separately as IDF1(t2), IDF2(t2), and IDF3(t2).
Still further, the second word may be encoded based on a TF-IDF algorithm based on the second word frequency and the second reverse file frequency to determine the second text feature, the specific algorithm being as follows:
TF-IDF(t2) = TF(t2) × IDF(t2)
It should be noted that, for each second word segment, the second text feature of each second word segment may be determined separately as TF-IDF1(t2), TF-IDF2(t2), and TF-IDF3(t2). The target text feature may then be determined from the plurality of second text features.
Still further, the obtained target text feature may be spliced with the second personal feature corresponding to each candidate log submitter, so as to determine the target data set. The second personal features may likewise include each log submitter's historical bug rate, total number of log submissions, time of the most recent log submission, responsible module, and role; for example, for developer A the second personal features may include "bug rate: 10%" and "last commit time: one day ago", and corresponding second personal features are likewise recorded for developer B.
The target data set is input into the data prediction model to determine the degree of correlation between each candidate log submitter and the target text feature, and the target correlation between each candidate log submitter and the second description information of the current vulnerability data is determined from that degree of correlation. The candidate log submitter with the highest target correlation may then be selected, in descending order of target correlation, as the target log submitter responsible for the current vulnerability data. Alternatively, a target number of target log submitters may be determined in descending order of target correlation; for example, the three candidate log submitters with the highest target correlation are taken as the target log submitters responsible for the current vulnerability data, and the current vulnerability data is repaired by all the target log submitters together.
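A minimal sketch of this final prediction and ranking step is shown below. It reuses the vectorizer, PCA transform, personal-feature dictionary, and model from the earlier sketches; all names and the top-k cutoff are assumptions for illustration, not the patented implementation.

# Encode the bug description, splice it with each candidate's personal features,
# and rank candidates by the model's predicted relevance.
import numpy as np

def rank_candidates(bug_description, candidates, personal_features,
                    vectorizer, pca, model, top_k=3):
    text_vec = pca.transform(vectorizer.transform([bug_description]).toarray())[0]
    rows = [np.concatenate([personal_features[c], text_vec]) for c in candidates]
    scores = model.predict(np.vstack(rows))
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]   # the top-k target log submitters to be held responsible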
From the foregoing, it can be seen that the present application provides a data processing method, apparatus, electronic device, storage medium, and program product, which determine, according to pre-acquired historical log data, a mapping relationship between log submitters and the first description information they submitted; encode the first description information according to a preset encoding algorithm to determine text features; splice the first personal features corresponding to the log submitters with the text features to obtain a sample data set, and train a pre-constructed decision tree model on the sample data set to obtain a data prediction model; determine, according to the mapping relationship, the candidate log submitters corresponding to the second description information in the pre-acquired current vulnerability data; encode the second description information according to the preset encoding algorithm to determine a target text feature, and splice the second personal features corresponding to the candidate log submitters with the target text feature to determine a target data set; and input the target data set into the data prediction model to determine the target log submitter with the highest correlation to the second description information. By determining the mapping relationship between each log submitter and the description information that submitter provided, and training a data prediction model on the text features derived from the description information together with each log submitter's personal features, every submitter who has submitted a log can be traced for responsibility, which improves the accuracy of assigning responsibility for vulnerability data to a certain extent. Further, the data prediction model is used to make predictions on newly occurring vulnerability data, so that the log submitter most strongly correlated with the vulnerability data is determined as the target log submitter to be held responsible; because the accuracy of responsibility assignment is improved, the repair speed of the vulnerability data and the efficiency of development are further improved.
It should be noted that, the method of the embodiment of the present application may be performed by a single device, for example, a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the method of an embodiment of the present application, the devices interacting with each other to accomplish the method.
It should be noted that the foregoing describes some embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Fig. 6 is a schematic diagram showing an exemplary structure of a data processing apparatus according to an embodiment of the present application.
Based on the same inventive concept, the application also provides a data processing device corresponding to the method of any embodiment.
Referring to fig. 6, the data processing apparatus includes: a first determining module, an encoding module, a training module, a second determining module, a third determining module, and a fourth determining module; wherein:
a first determining module configured to determine a mapping relationship between a log presenter and first description information presented by the log presenter according to previously acquired history log data;
the encoding module is configured to encode the first description information according to a preset encoding algorithm to determine text characteristics;
the training module is configured to splice according to the first personal characteristic corresponding to the log presenter and the text characteristic to obtain a sample data set, and train a pre-constructed decision tree model based on the sample data set to obtain a data prediction model;
the second determining module is configured to determine candidate log submitters corresponding to second description information in the pre-acquired current vulnerability data according to the mapping relation;
a third determining module configured to encode the second description information according to the preset encoding algorithm to determine a target text feature, and splice a second personal feature corresponding to the candidate log presenter and the target text feature to determine a target data set;
And a fourth determining module configured to input the target data set into the data prediction model to determine a target log presenter having the highest correlation with the second description information.
In one possible implementation manner, the first description information includes: the first keywords and the first function text description information;
the first determination module is further configured to:
acquiring historical log data in preset time, and determining the log submitter and first description information submitted by the log submitter according to the historical log data;
segmenting the first description information to determine the first keyword and the first function text description information;
and constructing a dictionary corresponding to the first keyword between the log submitter and the first keyword, and determining a mapping relation between the log submitter and the first description information according to the dictionary.
In one possible implementation, the first determining module is further configured to:
constructing a dictionary between the log presenter and the corresponding first keyword;
acquiring a pre-constructed supplementary mapping relation; wherein the supplementary mapping relationship includes: a first preset mapping relation between a preset log submitter and the first description information and a second preset mapping relation between the log submitter and the preset description information;
And determining a mapping relation between the log submitter and the first description information according to the dictionary and the supplementary mapping relation.
In one possible implementation, the history log data includes: a plurality of history logs;
the encoding module is further configured to:
the first keywords and the first function text description information are segmented to determine a plurality of first segmented words, and a plurality of first target history logs containing each first segmented word are respectively determined;
for each of the first partial words,
determining a first occurrence number of the first word segment in the corresponding first target history log and a first word segment total number of all word segments in the first target history log,
determining a first word frequency of the first word segment relative to the history log data according to the first occurrence number and the first word segment total number,
determining a total number of the history logs and a first number of the first target history logs,
determining a first reverse file frequency of the first word segment relative to the history log data based on the total number of history logs and the first number,
encoding the first word segment based on the preset encoding algorithm according to the first word frequency and the first reverse file frequency to determine a first text feature;
The text feature is determined from a plurality of the first text features.
In one possible implementation, the training module is further configured to:
for each of the features of the text,
respectively splicing the first personal characteristics to obtain a plurality of first sample data, and determining a first sample data set according to the plurality of first sample data;
the sample data set is determined from a plurality of the first sample data sets.
In one possible implementation, the training module is further configured to:
dividing the sample data set into a training set for training the pre-constructed decision tree model and a testing set for testing the data prediction model according to a preset proportion;
inputting the training set into the pre-built decision tree model to determine training results for characterizing a correlation between the first descriptive information and the log presenter;
and responding to the training result to reach a preset training result, and adjusting the pre-constructed decision tree model according to the test set to obtain the data prediction model.
In one possible implementation, the training module is further configured to:
Determining whether a first log presenter corresponding to a first personal feature in the training set is the same as a second log presenter corresponding to a text feature spliced with the first personal feature;
setting a first tag for the first personal feature and the text feature spliced in response to the first log presenter and the second log presenter being the same;
inputting the training set and the first label into the pre-constructed decision tree model to determine the training result.
In one possible implementation, the training module is further configured to:
setting a second tag for the concatenated first personal feature and the text feature in response to the first log presenter and the second log presenter being different;
inputting the training set and the second label into the pre-constructed decision tree model to determine the training result.
In one possible implementation manner, the second description information includes: a second keyword;
the second determination module is further configured to:
the second description information in the current vulnerability data obtained in advance is segmented to determine the second keywords;
Determining a dictionary corresponding to the first keyword between the log presenter and the first keyword according to the mapping relation, and determining whether a second keyword identical to the first keyword exists or not;
and in response to the existence of the second keyword which is the same as the first keyword, taking the log submitter corresponding to the first keyword as a candidate log submitter corresponding to the second descriptive information.
In one possible implementation, the history log data includes: a plurality of history logs; the second description information further includes: second keywords and second function text description information;
the third determination module is configured to:
the second keywords and the second function text description information are segmented to determine a plurality of second segmented words, and a plurality of second target history logs containing each second segmented word are respectively determined;
for each of the second partial words,
determining a second occurrence number of the second word in the corresponding second target history log and a second word total number of all the words in the second target history log,
determining a second word frequency of the second word relative to the history log data based on the second number of occurrences and the second word total,
Determining a total number of the history logs and a second number of the second target history logs, determining a second reverse file frequency of the second word relative to the history log data based on the total number of the history logs and the second number,
encoding the second word according to the second word frequency and the second reverse file frequency based on the preset encoding algorithm to determine a second text feature;
and determining the target text characteristic according to the second text characteristics.
In one possible implementation, the third determining module is further configured to:
and respectively splicing the target text features with the second personal features corresponding to each candidate log presenter to determine the target data set.
In one possible implementation, the fourth determining module is further configured to:
inputting the target data set into the data prediction model to determine a degree of correlation between each of the candidate log submitters and the target text feature to determine a target degree of correlation between each of the candidate log submitters and the second descriptive information;
and determining target number of target log submitters from high to low according to the target relevance.
For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, the functions of each module may be implemented in the same piece or pieces of software and/or hardware when implementing the present application.
The device of the foregoing embodiment is configured to implement the corresponding data processing method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Fig. 7 is a schematic diagram of an exemplary structure of an electronic device according to an embodiment of the present application.
Based on the same inventive concept, the application also provides an electronic device corresponding to the method of any embodiment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the data processing method of any embodiment when executing the program. Fig. 7 is a schematic diagram of a hardware structure of an electronic device according to the embodiment, where the device may include: processor 710, memory 720, input/output interface 730, communication interface 740, and bus 750. Wherein processor 710, memory 720, input/output interface 730, and communication interface 740 implement a communication connection among each other within the device via bus 750.
The processor 710 may be implemented in a general-purpose CPU (Central Processing Unit ), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 720 may be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 720 may store an operating system and other application programs. When the technical solutions provided by the embodiments of the present specification are implemented in software or firmware, the relevant program codes are stored in the memory 720 and invoked for execution by the processor 710.
The input/output interface 730 is used to connect with an input/output module to realize information input and output. The input/output module may be configured as a component in the device (not shown in the figure) or may be external to the device to provide corresponding functionality. The input devices may include a keyboard, a mouse, a touch screen, a microphone, and various types of sensors, and the output devices may include a display, a speaker, a vibrator, indicator lights, and the like.
The communication interface 740 is used to connect with a communication module (not shown) to enable communication interaction between this device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 750 includes a path to transfer information between elements of the device (e.g., processor 710, memory 720, input/output interface 730, and communication interface 740).
It should be noted that although the above-described device only shows the processor 710, the memory 720, the input/output interface 730, the communication interface 740, and the bus 750, in particular implementations the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present specification, and not all the components shown in the drawings.
The electronic device of the foregoing embodiment is configured to implement the corresponding data processing method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present application also provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the data processing method according to any of the above embodiments.
The computer readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device.
The storage medium of the foregoing embodiments stores computer instructions for causing the computer to perform the data processing method according to any of the foregoing embodiments, and has the advantages of the corresponding method embodiments, which are not described herein.
Based on the same inventive concept, and corresponding to the data processing method described in any of the above embodiments, the present disclosure also provides a computer program product comprising computer program instructions. In some embodiments, the computer program instructions may be executed by one or more processors of a computer to cause the computer and/or the processor to perform the described data processing method. The processor executing a given step may belong to the execution subject corresponding to that step in the respective embodiments of the data processing method.
The computer program product of the above embodiment is configured to enable the computer and/or the processor to perform the data processing method according to any one of the above embodiments, and has the advantages of corresponding method embodiments, which are not described herein again.
Those of ordinary skill in the art will appreciate that the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the application (including the claims) is limited to these examples. Within the idea of the application, the technical features of the above embodiments or of different embodiments may also be combined, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the application as described above, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion and so as not to obscure the embodiments of the present application. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present application, and also in view of the fact that specifics with respect to the implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the present application are to be implemented (i.e., such specifics should be well within the purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that the embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, the embodiments discussed may be used with other memory architectures (e.g., dynamic RAM (DRAM)).
The present embodiments are intended to embrace all such alternatives, modifications, and variations which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, and the like, which are within the spirit and principles of the embodiments of the application, are intended to be included within the scope of the application.

Claims (16)

1. A method of data processing, the method comprising:
determining a mapping relation between a log presenter and first description information presented by the log presenter according to pre-acquired historical log data;
encoding the first description information according to a preset encoding algorithm to determine text features;
splicing the first personal feature corresponding to the log presenter and the text feature to obtain a sample data set, and training a pre-constructed decision tree model based on the sample data set to obtain a data prediction model;
determining candidate log submitters corresponding to second description information in the pre-acquired current vulnerability data according to the mapping relationship;
encoding the second description information according to the preset encoding algorithm to determine a target text feature, and splicing a second personal feature corresponding to the candidate log presenter with the target text feature to determine a target data set;
and inputting the target data set into the data prediction model to determine a target log presenter having the highest degree of correlation with the second description information.
2. The method of claim 1, wherein the first description information comprises: the first keywords and the first function text description information;
the determining, according to the pre-acquired history log data, a mapping relationship between a log presenter and first description information submitted by the log presenter, includes:
acquiring history log data within a preset time period, and determining the log submitter and the first description information submitted by the log submitter according to the history log data;
segmenting the first description information to determine the first keyword and the first function text description information;
and constructing a correspondence dictionary between the log submitter and the first keyword, and determining the mapping relationship between the log submitter and the first description information according to the dictionary.
3. The method of claim 2, wherein the constructing a correspondence dictionary between the log submitter and the first keyword, and determining the mapping relationship between the log submitter and the first description information according to the dictionary, comprises:
constructing the correspondence dictionary between the log submitter and the first keyword;
acquiring a pre-constructed supplementary mapping relationship; wherein the supplementary mapping relationship includes: a first preset mapping relationship between a preset log submitter and the first description information and a second preset mapping relationship between the log submitter and preset description information;
and determining the mapping relationship between the log submitter and the first description information according to the dictionary and the supplementary mapping relationship.
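By way of illustration only, and not as part of the claim language, the correspondence dictionary and the supplementary mapping relationship recited in claims 2 and 3 could be assembled roughly as follows; every name here is hypothetical, and the flattening of the supplementary mapping into (keyword, submitter) pairs is an assumption.

```python
from collections import defaultdict

def build_keyword_dictionary(submitter_keyword_pairs):
    """submitter_keyword_pairs: iterable of (log_submitter, first_keywords)
    extracted from the history log data within the preset time window."""
    dictionary = defaultdict(set)
    for submitter, keywords in submitter_keyword_pairs:
        for keyword in keywords:
            dictionary[keyword].add(submitter)
    return dictionary

def apply_supplementary_mapping(dictionary, supplementary_pairs):
    """supplementary_pairs: pre-constructed (keyword, submitter) pairs covering
    both the first and the second preset mapping relationships."""
    for keyword, submitter in supplementary_pairs:
        dictionary[keyword].add(submitter)
    return dictionary
```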
4. The method of claim 2, wherein the history log data comprises: a plurality of history logs;
the encoding the first description information according to a preset encoding algorithm to determine text features includes:
segmenting the first keywords and the first function text description information to determine a plurality of first word segments, and respectively determining a plurality of first target history logs containing each first word segment;
for each of the first word segments,
determining a first occurrence number of the first word segment in the corresponding first target history logs and a first word segment total number of all word segments in the first target history logs,
determining a first word frequency of the first word segment relative to the history log data according to the first occurrence number and the first word segment total number,
determining a total number of the history logs and a first number of the first target history logs,
determining a first reverse file frequency of the first word segment relative to the history log data according to the total number of the history logs and the first number,
encoding the first word segment based on the preset encoding algorithm according to the first word frequency and the first reverse file frequency to determine a first text feature;
and determining the text feature according to a plurality of the first text features.
5. The method of claim 1, wherein the splicing according to the first personal feature corresponding to the log presenter and the text feature to obtain a sample data set comprises:
for each of the text features,
splicing the text feature with the first personal features respectively to obtain a plurality of first sample data, and determining a first sample data set according to the plurality of first sample data;
and determining the sample data set according to a plurality of the first sample data sets.
6. The method of claim 5, wherein the training a pre-constructed decision tree model based on the sample data set to obtain a data prediction model comprises:
dividing the sample data set, according to a preset proportion, into a training set for training the pre-constructed decision tree model and a test set for testing the data prediction model;
inputting the training set into the pre-constructed decision tree model to determine a training result for characterizing a correlation between the first description information and the log presenter;
and in response to the training result reaching a preset training result, adjusting the pre-constructed decision tree model according to the test set to obtain the data prediction model.
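As a non-limiting sketch of claim 6, assuming scikit-learn (which the claim does not require), the preset-proportion split and the decision tree training could look like the following; all names are hypothetical.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def train_data_prediction_model(sample_features, labels, test_ratio=0.2):
    """Split the sample data set by a preset proportion, fit a decision tree,
    and score it on the held-out test set."""
    x_train, x_test, y_train, y_test = train_test_split(
        sample_features, labels, test_size=test_ratio, random_state=0)
    model = DecisionTreeClassifier()
    model.fit(x_train, y_train)
    # the test score is what would drive any subsequent adjustment of the model
    test_accuracy = model.score(x_test, y_test)
    return model, test_accuracy
```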
7. The method of claim 6, wherein the inputting the training set into the pre-constructed decision tree model to determine a training result for characterizing a correlation between the first description information and the log presenter comprises:
determining whether a first log presenter corresponding to a first personal feature in the training set is the same as a second log presenter corresponding to the text feature spliced with the first personal feature;
setting a first label for the spliced first personal feature and text feature in response to the first log presenter and the second log presenter being the same;
and inputting the training set and the first label into the pre-constructed decision tree model to determine the training result.
8. The method of claim 7, wherein after the determining whether a first log presenter corresponding to a first personal feature in the training set and a second log presenter corresponding to the text feature spliced with the first personal feature are the same, the method further comprises:
setting a second label for the spliced first personal feature and text feature in response to the first log presenter and the second log presenter being different;
and inputting the training set and the second label into the pre-constructed decision tree model to determine the training result.
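For illustration of claims 7 and 8 only, the first and second labels can be assigned by comparing the two presenters behind each spliced sample; the numeric encoding of the labels as 1/0 below is an assumption, not claim language.

```python
def assign_labels(presenter_pairs):
    """presenter_pairs: iterable of (first_log_presenter, second_log_presenter)
    tuples, one per spliced (personal feature, text feature) sample."""
    FIRST_LABEL, SECOND_LABEL = 1, 0   # hypothetical numeric encoding
    return [FIRST_LABEL if first == second else SECOND_LABEL
            for first, second in presenter_pairs]
```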
9. The method of claim 2, wherein the second description information includes: a second keyword;
the determining, according to the mapping relationship, a candidate log presenter corresponding to the second description information in the pre-acquired current vulnerability data includes:
segmenting the second description information in the pre-acquired current vulnerability data to determine the second keyword;
determining, according to the mapping relationship, the correspondence dictionary between the log submitter and the first keyword, and determining whether a second keyword identical to the first keyword exists;
and in response to the existence of the second keyword which is the same as the first keyword, taking the log submitter corresponding to the first keyword as a candidate log submitter corresponding to the second description information.
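Purely as a sketch of claim 9, matching the second keywords against the correspondence dictionary to obtain candidate log submitters might be written as follows; the names are hypothetical.

```python
def find_candidate_submitters(second_keywords, keyword_dictionary):
    """Return the candidate log submitters whose first keywords match any of
    the second keywords segmented from the current vulnerability data."""
    candidates = set()
    for keyword in second_keywords:
        # a second keyword identical to a first keyword exists in the dictionary
        candidates.update(keyword_dictionary.get(keyword, set()))
    return candidates
```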
10. The method of claim 1, wherein the history log data comprises: a plurality of history logs; the second description information further includes: second keywords and second function text description information;
the encoding the second description information according to the preset encoding algorithm to determine a target text feature includes:
the second keywords and the second function text description information are segmented to determine a plurality of second segmented words, and a plurality of second target history logs containing each second segmented word are respectively determined;
For each of the second partial words,
determining a second occurrence number of the second word in the corresponding second target history log and a second word total number of all the words in the second target history log,
determining a second word frequency of the second word relative to the history log data based on the second number of occurrences and the second word total,
determining a total number of the history logs and a second number of the second target history logs,
determining a second reverse file frequency of the second word relative to the history log data based on the total number of history logs and the second number,
encoding the second word according to the second word frequency and the second reverse file frequency based on the preset encoding algorithm to determine a second text feature;
and determining the target text characteristic according to the second text characteristics.
11. The method of claim 1, wherein the splicing a second personal feature corresponding to the candidate log presenter and the target text feature to determine a target data set comprises:
and splicing the target text feature with the second personal feature corresponding to each candidate log presenter, respectively, to determine the target data set.
12. The method of claim 1, wherein the inputting the target data set into the data prediction model to determine a target log presenter having the highest degree of correlation with the second description information comprises:
inputting the target data set into the data prediction model to determine a degree of correlation between each of the candidate log submitters and the target text feature, so as to determine a target degree of correlation between each of the candidate log submitters and the second description information;
and determining a target number of target log submitters in descending order of the target degree of correlation.
13. A data processing apparatus, the apparatus comprising:
a first determining module configured to determine a mapping relationship between a log presenter and first description information presented by the log presenter according to pre-acquired history log data;
an encoding module configured to encode the first description information according to a preset encoding algorithm to determine text features;
a training module configured to splice the first personal feature corresponding to the log presenter and the text feature to obtain a sample data set, and to train a pre-constructed decision tree model based on the sample data set to obtain a data prediction model;
a second determining module configured to determine candidate log submitters corresponding to second description information in the pre-acquired current vulnerability data according to the mapping relationship;
a third determining module configured to encode the second description information according to the preset encoding algorithm to determine a target text feature, and to splice a second personal feature corresponding to the candidate log presenter with the target text feature to determine a target data set;
and a fourth determining module configured to input the target data set into the data prediction model to determine a target log presenter having the highest degree of correlation with the second description information.
14. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 12 when executing the program.
15. A computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 12.
16. A computer program product comprising computer program instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 12.
CN202310672887.8A 2023-06-07 2023-06-07 Data processing method, device, electronic equipment, storage medium and program product Pending CN116628704A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310672887.8A CN116628704A (en) 2023-06-07 2023-06-07 Data processing method, device, electronic equipment, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310672887.8A CN116628704A (en) 2023-06-07 2023-06-07 Data processing method, device, electronic equipment, storage medium and program product

Publications (1)

Publication Number Publication Date
CN116628704A true CN116628704A (en) 2023-08-22

Family

ID=87602490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310672887.8A Pending CN116628704A (en) 2023-06-07 2023-06-07 Data processing method, device, electronic equipment, storage medium and program product

Country Status (1)

Country Link
CN (1) CN116628704A (en)

Similar Documents

Publication Publication Date Title
US9582271B2 (en) Systems and methods for identifying software performance influencers
US10346294B2 (en) Comparing software projects having been analyzed using different criteria
CN111159016A (en) Standard detection method and device
CN108932320A (en) Article search method, apparatus and electronic equipment
CN110955659A (en) Method and system for processing data table
CN114116496A (en) Automatic testing method, device, equipment and medium
EP2951680B1 (en) Acquiring identification of an application lifecycle management entity associated with similar code
CN114168565A (en) Backtracking test method, device and system of business rule model and decision engine
US12001823B2 (en) Systems and methods for building and deploying machine learning applications
CN112363814A (en) Task scheduling method and device, computer equipment and storage medium
CN107533544A (en) Component identifier generates
CN116860311A (en) Script analysis method, script analysis device, computer equipment and storage medium
CN116628704A (en) Data processing method, device, electronic equipment, storage medium and program product
US20220405065A1 (en) Model Document Creation in Source Code Development Environments using Semantic-aware Detectable Action Impacts
CN115168575A (en) Subject supplement method applied to audit field and related equipment
US9471569B1 (en) Integrating information sources to create context-specific documents
CN111078574A (en) Method and device for generating influence analysis report
CN111914868A (en) Model training method, abnormal data detection method and device and electronic equipment
CN116483735B (en) Method, device, storage medium and equipment for analyzing influence of code change
US11893381B1 (en) Digital processing systems and methods for reducing file bundle sizes
CN115168607A (en) Entity relationship extraction method and related equipment
CN117971675A (en) Evaluation method and device for code large model, computer equipment and storage medium
US20220100631A1 (en) Microservices graph generation
CN115712577A (en) Test method, device, equipment and storage medium based on behavior-driven development
CN116521987A (en) Association recommendation method and related equipment based on audit knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination