CN113448861A - Method and device for detecting repeated forms - Google Patents

Publication number
CN113448861A
Authority
CN
China
Prior art keywords
detected
repeated
similar
history
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110779913.8A
Other languages
Chinese (zh)
Inventor
党娜
刘洋
李�昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Abstract

The invention discloses a method and a device for detecting repeated forms. The method comprises: acquiring first characteristic data of a form to be detected; performing word segmentation on the description content of the form to be detected, and screening out from the historical forms, according to the word segmentation results of the form to be detected and of the historical forms, the similar historical forms whose similarity to the form to be detected is greater than a preset threshold; acquiring second characteristic data corresponding to each similar historical form and, for each similar historical form, using a pre-trained form detection model to determine from the first characteristic data and the corresponding second characteristic data whether the form to be detected duplicates that similar historical form; and deleting the form to be detected when it duplicates any similar historical form. The invention relates to the technical field of big data; it can screen out repeated forms, spare developers from handling the same problem repeatedly, and improve problem-processing efficiency.

Description

Method and device for detecting repeated forms
Technical Field
The invention relates to the technical field of big data, in particular to a method and a device for detecting a repeated form.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Before an application or function is put into use, it must be tested. A tester records each problem found during testing in a form and submits the form to a developer, who examines and corrects the problems in the form. Different testers may file forms for the same problem, so repeated forms appear. Developers cannot easily identify the repeated forms when processing them and may handle the same problem more than once, which lowers problem-processing efficiency.
Disclosure of Invention
An embodiment of the invention provides a method for detecting a repeated form. It addresses the prior-art problem that developers cannot easily identify repeated forms when processing forms and may therefore handle the same problem more than once, which lowers problem-processing efficiency. The method comprises:
acquiring first characteristic data of a form to be detected, where the first characteristic data is the data corresponding to each reference feature of the form to be detected, and the form to be detected contains description content describing the form;
performing word segmentation on the description content of the form to be detected, and screening out from the historical forms, according to the word segmentation result of the form to be detected and the word segmentation results of the historical forms, the similar historical forms whose similarity to the form to be detected is greater than a preset threshold, where the historical forms are previously acquired forms that were not themselves found to be repeated forms;
acquiring second characteristic data corresponding to each similar historical form and, for each similar historical form, using a pre-trained form detection model to determine, from the first characteristic data and the second characteristic data of that similar historical form, whether the form to be detected duplicates it, where the form detection model is a machine-learned model that judges from the reference features whether two forms are repeated, and the first and second characteristic data correspond to the same reference features;
and deleting the form to be detected when it duplicates any similar historical form.
An embodiment of the invention also provides a device for detecting a repeated form, which addresses the same prior-art problem that developers cannot easily identify repeated forms and may handle the same problem more than once. The device comprises:
an acquisition module, used for acquiring first characteristic data of a form to be detected, where the first characteristic data is the data corresponding to each reference feature of the form to be detected, and the form to be detected contains description content describing the form;
a first processing module, used for performing word segmentation on the description content of the form to be detected and screening out from the historical forms, according to the word segmentation result of the form to be detected and the word segmentation results of the historical forms, the similar historical forms whose similarity to the form to be detected is greater than a preset threshold, where the historical forms are previously acquired forms that were not themselves found to be repeated forms;
a second processing module, used for acquiring second characteristic data corresponding to each similar historical form and, for each similar historical form, using a pre-trained form detection model to determine, from the first characteristic data and the second characteristic data of that similar historical form, whether the form to be detected duplicates it, where the form detection model is a machine-learned model that judges from the reference features whether two forms are repeated, and the first and second characteristic data correspond to the same reference features;
and a deleting module, used for deleting the form to be detected when it duplicates any similar historical form.
An embodiment of the invention also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above method for detecting a repeated form when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program for executing the above method for detecting a repeated form is stored in the computer-readable storage medium.
In the embodiments of the invention, the historical forms are first screened by comparing their word segmentation results with that of the form to be detected, so that historical forms with low similarity are filtered out and the load on the subsequent form detection model is reduced. The pre-trained form detection model then judges, from the first characteristic data and the second characteristic data (which correspond to the same reference features), whether the form to be detected duplicates each remaining similar historical form, and the form to be detected is deleted if it duplicates any of them. The model can thus accurately judge whether a highly similar historical form is in fact duplicated by the form to be detected; repeated forms are screened out, developers are spared from handling the same problem repeatedly, and problem-processing efficiency improves.
Drawings
To illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort. In the drawings:
FIG. 1 is a flowchart of a method for detecting a repeated form according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method, according to an embodiment of the present invention, for performing word segmentation on the description content of the form to be detected and screening out from the historical forms, according to the word segmentation results of the form to be detected and of the historical forms, the similar historical forms whose similarity is greater than a preset threshold;
FIG. 3 is a flowchart of another method, according to an embodiment of the present invention, for performing the same word segmentation and screening;
FIG. 4 is a flowchart of a method for training a form detection model according to an embodiment of the present invention;
FIG. 5 is an exemplary diagram of a device for detecting a repeated form according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
The term "and/or" herein merely describes an associative relationship and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the group consisting of A, B, and C.
In the description of the present specification, the terms "comprising," "including," "having," "containing," and the like are used in an open-ended fashion, i.e., to mean including, but not limited to. Reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the embodiments is for illustrative purposes to illustrate the implementation of the present application, and the sequence of steps is not limited and can be adjusted as needed.
Research shows that before an application or function is put into use, it must be tested: a tester records each problem found in a form and submits the form to a developer, who examines and corrects the problems in the form. Different testers may file forms for the same problem, so repeated forms appear. Developers check and resolve the problems form by form. When there are many forms and some of them are repeated, a developer cannot accurately judge which forms are repeated; and because different developers process different forms, a developer cannot tell whether a repeated form has already been handled. The same problem may therefore be examined and corrected more than once, which lowers problem-processing efficiency.
In view of the above research, an embodiment of the present invention provides a method for detecting a repeated form, as shown in fig. 1, including:
S101: Acquire first characteristic data of a form to be detected, where the first characteristic data is the data corresponding to each reference feature of the form to be detected, and the form to be detected contains description content describing the form;
S102: Perform word segmentation on the description content of the form to be detected, and screen out from the historical forms, according to the word segmentation result of the form to be detected and the word segmentation results of the historical forms, the similar historical forms whose similarity to the form to be detected is greater than a preset threshold, where the historical forms are previously acquired forms that were not themselves found to be repeated forms;
S103: Acquire second characteristic data corresponding to each similar historical form and, for each similar historical form, use a pre-trained form detection model to determine, from the first characteristic data and the second characteristic data of that similar historical form, whether the form to be detected duplicates it, where the form detection model is a machine-learned model that judges from the reference features whether two forms are repeated, and the first and second characteristic data correspond to the same reference features;
S104: Delete the form to be detected when it duplicates any similar historical form.
In this embodiment, the screening in S102 filters out historical forms with low similarity, which reduces the load on the form detection model used in S103. The form detection model then judges accurately, from the first and second characteristic data, whether each highly similar historical form is duplicated by the form to be detected, and S104 deletes the form to be detected when it is. Repeated forms are thereby screened out, developers are spared from handling the same problem repeatedly, and problem-processing efficiency improves.
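The flow of S101 to S104 can be sketched as a two-stage filter. This is a minimal illustration, not the patented implementation: the dictionary keys `description` and `features`, the function names, and the choice to append a non-repeated form to the historical forms are all assumptions made for the example, and the similarity measure and the form detection model are passed in as plain callables.

```python
from typing import Callable, List, Mapping, Sequence

Form = Mapping[str, object]  # hypothetical shape: {"description": str, "features": ...}


def detect_duplicate(
    candidate: Form,
    history: Sequence[Form],
    similarity: Callable[[str, str], float],
    is_duplicate: Callable[[object, object], bool],
    threshold: float = 0.75,
) -> bool:
    """Return True if the form to be detected duplicates any similar historical form."""
    first_features = candidate["features"]  # S101: first characteristic data
    # S102: keep only historical forms whose text similarity exceeds the threshold.
    similar = [
        h for h in history
        if similarity(candidate["description"], h["description"]) > threshold
    ]
    # S103: ask the (stand-in) form detection model about each similar historical form.
    return any(is_duplicate(first_features, h["features"]) for h in similar)


def submit(candidate, history, similarity, is_duplicate, threshold=0.75) -> List[Form]:
    """S104: drop the candidate if it is repeated.

    Keeping a non-repeated candidate as a new historical form is an assumption.
    """
    if detect_duplicate(candidate, history, similarity, is_duplicate, threshold):
        return list(history)
    return [*history, candidate]
```

Passing the similarity measure and the model as parameters keeps the cheap text filter and the more expensive model decision independent, which mirrors the stated reason for screening first.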
Some terms in the embodiments of the present invention are explained below:
The first characteristic data and the second characteristic data correspond to the same reference features: the first characteristic data includes, for example, the data corresponding to each reference feature in the form to be detected, and the second characteristic data includes, for example, the data corresponding to each reference feature in a similar historical form.
A historical form in the embodiments of the invention is, for example, a previously acquired form that was not found to be a repeated form; a similar historical form is, for example, a historical form determined to have high similarity to the form to be detected.
The following describes the details of S101 to S104.
For S101, the form to be detected includes description content that describes its main content. The form to be detected is, for example, a service form of a bank or a form containing a test problem submitted by a tester in a test scenario.
Taking as an example a form to be detected that contains a test problem submitted by a tester in a test scenario, the first characteristic data includes at least one of the following: priority data of the form to be detected, severity data of the test problem in the form to be detected, the number of test cases affected by the test problem in the form to be detected, and the service identifier corresponding to the form to be detected. The corresponding second characteristic data includes, for example, at least one of the following: priority data of the similar historical form, severity data of the test problem in the similar historical form, the number of test cases affected by the test problem in the similar historical form, and the service identifier corresponding to the similar historical form.
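As an illustration of characteristic data that shares the same reference features on both sides, the sketch below stores the four example features in one record type and combines a first and a second record into a single model input. The class name, field names, integer encodings, and the difference-based pairing are assumptions for this example; the source does not say how the two sets of characteristic data are presented to the model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FormFeatures:
    """Characteristic data for one form (hypothetical field names and encodings)."""
    priority: int              # priority data of the form
    severity: int              # severity data of the test problem
    affected_test_cases: int   # number of test cases affected by the test problem
    service_id: str            # service identifier corresponding to the form


def pair_vector(first: FormFeatures, second: FormFeatures) -> List[int]:
    """Combine first and second characteristic data into one input vector.

    Assumed encoding: absolute difference for numeric features,
    1/0 for whether the service identifiers are equal.
    """
    return [
        abs(first.priority - second.priority),
        abs(first.severity - second.severity),
        abs(first.affected_test_cases - second.affected_test_cases),
        1 if first.service_id == second.service_id else 0,
    ]
```

Because both records use the same type, the requirement that the first and second characteristic data correspond to the same reference features holds by construction.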
For S102, the word segmentation and screening may use, for example, the method shown in FIG. 2, which includes:
S201: Perform word segmentation on the description content of the form to be detected with a word segmentation tool to obtain the first word segments corresponding to the form to be detected.
Specifically, for example but not limited to, at least one of a unigram language model and a bigram language model may be used to segment the description content of the form to be detected, yielding at least one first word segment corresponding to the form to be detected.
S202: Acquire the second word segments corresponding to each historical form.
Here, when each historical form was itself acquired and checked for repetition, its description content was segmented by a method similar to that used for the form to be detected, yielding at least one second word segment for that historical form. The second word segments obtained at that time can therefore be retrieved directly.
S203: For each historical form, compare the first word segments of the form to be detected with the second word segments of the historical form, and count the second word segments that are consistent with a first word segment. When the proportion of these matching second word segments to the total number of second word segments in the historical form is greater than the preset threshold, the historical form is determined to be a similar historical form.
For example, if the preset threshold is 75%, a historical form is a similar historical form when more than 75% of its second word segments are consistent with first word segments.
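S201 to S203 can be sketched as follows. The `segment` function here is only a stand-in that splits on word characters; it is not the unigram or bigram language-model segmenter mentioned above and would not segment unspaced Chinese text. All function and parameter names are assumptions for the example.

```python
import re
from typing import Dict, List


def segment(text: str) -> List[str]:
    """Stand-in word segmenter: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def overlap_ratio(first_segments: List[str], second_segments: List[str]) -> float:
    """Share of a historical form's word segments that also occur in the form to be detected."""
    if not second_segments:
        return 0.0
    first = set(first_segments)
    matched = sum(1 for seg in second_segments if seg in first)
    return matched / len(second_segments)


def similar_history(
    description: str,
    history_segments: Dict[str, List[str]],
    threshold: float = 0.75,
) -> List[str]:
    """Return ids of historical forms whose overlap ratio exceeds the threshold."""
    first_segments = segment(description)                 # S201
    return [
        form_id
        for form_id, second in history_segments.items()   # S202: cached second word segments
        if overlap_ratio(first_segments, second) > threshold  # S203
    ]
```

Note that the ratio is taken over the historical form's segments, as in S203, so a short historical form contained in a long new description scores high while the reverse does not.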
In another embodiment of the present invention, the word segmentation and screening may also use, for example, the method shown in FIG. 3, which includes:
S301: Perform word segmentation on the description content of the form to be detected and on the description content of the historical forms, and obtain, from the word segmentation results, the similarity between the form to be detected and each historical form.
Specifically, for example but not limited to, any of the following methods (1) to (3) may be used to perform the word segmentation:
(1) Perform word segmentation on the description content of the form to be detected and the description content of the historical forms using a pre-trained word segmentation model, and obtain the similarity between the form to be detected and each historical form.
The word segmentation model includes, for example, a Hidden Markov Model (HMM), a Structured Perceptron (SP), or a Conditional Random Field (CRF).
(2) Perform word segmentation on the description content of the form to be detected and the description content of the historical forms using a word segmentation method based on character string matching, and obtain the similarity between the form to be detected and each historical form.
Word segmentation methods based on character string matching include forward maximum matching, reverse maximum matching, and bidirectional maximum matching.
Taking forward maximum matching as an example, for one historical form: take a preset number of characters, from left to right, from the description content of the form to be detected (the first characters); then take a preset number of characters, from left to right, starting at the first character of the description content of the historical form (the second characters); and compare the first characters with the second characters. If they differ, reselect the second characters starting from the second character of the historical form's description content and compare again, and so on, until the second characters match the first characters or no further second characters can be selected from the historical form. Then select new first characters, starting at the character that follows the last of the previously selected first characters in the description content of the form to be detected, and begin a new round of comparison against the second characters of the historical form. Forward maximum matching stops when no new first characters can be selected from the form to be detected. The similarity between the form to be detected and the historical form is then calculated from the number of matches between first and second characters and the number of comparisons.
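The character-window comparison described above can be sketched as below. The text does not give the similarity formula, so this sketch assumes similarity is the fraction of first-character windows that find a match, and it returns the comparison count alongside. It also assumes both windows have the same size. Note that this follows the procedure as described here, which differs from the textbook dictionary-based forward maximum matching segmenter.

```python
from typing import Tuple


def window_match_similarity(query: str, history: str, size: int = 2) -> Tuple[float, int]:
    """Compare fixed-size character windows of the query against a historical description.

    Returns (similarity, comparisons), where similarity is the assumed ratio
    matched windows / total windows.
    """
    matches = 0
    comparisons = 0
    windows = 0
    # First characters: consecutive, non-overlapping windows taken left to right.
    for start in range(0, len(query) - size + 1, size):
        first = query[start:start + size]
        windows += 1
        # Second characters: slide one character at a time over the historical text.
        for pos in range(len(history) - size + 1):
            comparisons += 1
            if history[pos:pos + size] == first:
                matches += 1
                break  # stop at the first match and move to the next window
    similarity = matches / windows if windows else 0.0
    return similarity, comparisons
```

The resulting similarity can then be compared with the preset threshold in the same way as any other similarity measure.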
(3) Perform word segmentation on the description content of the form to be detected and the description content of the historical forms using an understanding-based word segmentation method, and obtain the similarity between the form to be detected and each historical form.
Understanding-based word segmentation performs syntactic and semantic analysis while segmenting, and uses the syntactic and semantic information to resolve ambiguity.
S302: Determine each historical form whose similarity to the form to be detected is greater than the preset threshold to be a similar historical form.
In this way, historical forms with low similarity are filtered out, which reduces the load on the subsequent form detection model when it determines whether the form to be detected duplicates a historical form, and improves the efficiency of repeated form detection.
For S103, the form detection model is a model, obtained by machine learning, that judges from the reference features whether two forms are repeated. Taking as an example a form to be detected that contains a test problem submitted by a tester in a test scenario, FIG. 4 shows a flowchart of a method for training the form detection model according to an embodiment of the present invention, which includes:
S401: Acquire the forms historically submitted by testers, and extract a plurality of reference features from them.
The reference features include, for example, at least one of the following: the priority of the form, the severity of the test problem in the form, the test cases affected by the test problem in the form, and the service identifier corresponding to the form.
S402: Extract the feature data corresponding to each reference feature from each form historically submitted by testers.
S403: Obtain training samples from the feature data of the forms historically submitted by testers.
Specifically, for example, every pair of repeated forms among the forms submitted by the testers is taken as a positive sample, with the feature data of the two forms as its positive sample data; every pair of non-repeated forms is taken as a negative sample, with the feature data of the two forms as its negative sample data; and the training samples are obtained from the positive samples with their positive sample data and the negative samples with their negative sample data.
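The pairing in S403 can be sketched as follows. The input format is an assumption: feature data keyed by a form id, plus groups of form ids already known to be repeats of each other. The function name and the 1/0 label encoding are likewise illustrative.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Sample = Tuple[Sequence[int], Sequence[int], int]  # (features A, features B, label)


def build_training_samples(
    features: Dict[str, Sequence[int]],
    duplicate_groups: List[Set[str]],
) -> List[Sample]:
    """Label every pair of forms: 1 for a repeated pair, 0 for a non-repeated pair."""
    # Map each form id to the index of the duplicate group it belongs to, if any.
    group_of = {fid: idx for idx, group in enumerate(duplicate_groups) for fid in group}
    samples: List[Sample] = []
    for a, b in combinations(sorted(features), 2):
        repeated = a in group_of and group_of.get(a) == group_of.get(b)
        samples.append((features[a], features[b], 1 if repeated else 0))
    return samples
```

With n forms this produces n(n-1)/2 pairs, most of them negative, so a practical pipeline would probably subsample the negative pairs; the source does not discuss this.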
S404: and constructing a plurality of decision trees according to the plurality of reference characteristics.
Specifically, for example, for each decision tree, at least one reference feature is selected from the multiple reference features, and a corresponding node is generated according to each reference feature to form the decision tree.
S405: training the plurality of decision trees by using the training samples, and selecting, from the plurality of decision trees, a target decision tree whose detection result meets expectations.
Specifically, for example, supervised training is performed with the training samples. If a decision tree outputs "repeated" when the positive sample data of a positive sample is input, and outputs "not repeated" when the negative sample data of a negative sample is input, the decision tree is one whose detection result meets expectations. Each decision tree is trained in this supervised manner with the training samples, and a target decision tree whose detection result meets expectations is selected from all the decision trees.
S406: pruning the target decision tree according to a preset recursion depth to obtain the form detection model.
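The depth-based pruning of S406 might look like the sketch below. The text does not say how pruning is carried out, so this assumes the simplest reading: every internal node at the preset recursion depth is replaced by a leaf that returns the majority label stored at that node. The `Node` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: int                     # majority label of the samples reaching this node
    feature: Optional[int] = None  # index of the tested feature; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def prune_to_depth(node, max_depth, depth=0):
    """Return a copy of the tree in which no decision lies deeper than max_depth."""
    if node.feature is None:
        return node
    if depth >= max_depth:
        return Node(label=node.label)  # collapse the subtree into a leaf
    return Node(node.label, node.feature, node.threshold,
                prune_to_depth(node.left, max_depth, depth + 1),
                prune_to_depth(node.right, max_depth, depth + 1))

def predict(node, x):
    """Walk from the root to a leaf; 1 means repeated, 0 means not repeated."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

def tree_depth(node):
    if node.feature is None:
        return 0
    return 1 + max(tree_depth(node.left), tree_depth(node.right))

tree = Node(0, feature=0, threshold=0.5,
            left=Node(0),
            right=Node(1, feature=1, threshold=2.0, left=Node(1), right=Node(0)))
pruned = prune_to_depth(tree, max_depth=1)
```

Pruning trades some fit to the training samples for a smaller tree; in this example the sample `[1.0, 5.0]` is classified differently before and after pruning.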
When S103 above is performed, determining, for each similar history form, whether the form to be detected and the similar history form are repeated by using the pre-trained form detection model according to the first feature data and the second feature data corresponding to the similar history form may, for example, proceed as follows: for each similar history form, each decision tree in the form detection model determines, according to the first feature data and the second feature data corresponding to the similar history form, whether the form to be detected is repeated with the similar history form; whether the form to be detected is repeated with the similar history form is then determined according to the number of first decision trees, namely the decision trees in the form detection model that determine that the two forms are repeated, and the number of second decision trees, namely the decision trees in the form detection model that determine that the two forms are not repeated.
Specifically, when determining whether the form to be detected and the similar history form are repeated according to the number of first decision trees and the number of second decision trees, either of the following methods A and B may be used, without limitation:
A: when the number of first decision trees is greater than the number of second decision trees, it is determined that the form to be detected is repeated with the similar history form; when the number of first decision trees is less than or equal to the number of second decision trees, it is determined that the form to be detected is not repeated with the similar history form.
B: a weight parameter is set for each decision tree according to the recursion depth of each decision tree in the form detection model; for each recursion depth, the number of first decision trees of that depth is multiplied by the weight parameter corresponding to that depth to obtain a first calculation result for that depth; the first calculation results of all recursion depths are added to obtain a repeated predicted value for the form to be detected and the similar history form; for each recursion depth, the number of second decision trees of that depth is multiplied by the weight parameter corresponding to that depth to obtain a second calculation result for that depth; the second calculation results of all recursion depths are added to obtain a non-repeated predicted value for the form to be detected and the similar history form; when the repeated predicted value is greater than the non-repeated predicted value, the form to be detected is repeated with the similar history form; when the repeated predicted value is less than or equal to the non-repeated predicted value, the form to be detected is not repeated with the similar history form.
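Methods A and B can be sketched side by side as follows. The function names and the example weights are hypothetical; the text states only that the weight of a tree depends on its recursion depth, not how.

```python
from collections import Counter

def majority_vote(votes):
    """Method A. votes: one boolean per decision tree, True meaning 'repeated'."""
    first = sum(1 for v in votes if v)   # number of first decision trees
    second = len(votes) - first          # number of second decision trees
    return first > second                # a tie counts as not repeated

def depth_weighted_vote(votes_with_depth, weight_by_depth):
    """Method B. votes_with_depth: list of (is_repeated, recursion_depth) per tree."""
    first_counts = Counter(d for v, d in votes_with_depth if v)
    second_counts = Counter(d for v, d in votes_with_depth if not v)
    # Per depth: number of trees times the weight of that depth, then summed.
    repeated = sum(n * weight_by_depth[d] for d, n in first_counts.items())
    not_repeated = sum(n * weight_by_depth[d] for d, n in second_counts.items())
    return repeated > not_repeated       # a tie counts as not repeated

votes = [(True, 4), (True, 3), (False, 2), (False, 2), (False, 2)]
weights = {2: 0.5, 3: 1.0, 4: 2.0}       # assumed values: deeper trees weigh more
result_a = majority_vote([v for v, _ in votes])
result_b = depth_weighted_vote(votes, weights)
```

In this example method A reports "not repeated" (2 trees against 3), while method B reports "repeated" (2.0 + 1.0 = 3.0 against 3 × 0.5 = 1.5), which shows how depth weighting can overturn a plain majority.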
For S104 above, when the form to be detected is repeated with any similar history form, the form to be detected is a repeated form, and it is deleted so that no repeated form is retained. Taking as an example a form to be detected that contains a test problem submitted by a tester in a test scenario, this avoids collecting repeated forms, so that developers do not process the same test problem repeatedly and problem-processing efficiency is improved.
In addition, in another embodiment of the invention, when the form to be detected and all similar historical forms are not repeated, the test problems contained in the form to be detected are checked and corrected, and the form to be detected is stored in the database for storing the historical forms.
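The two outcomes described above (delete a repeated form, otherwise check it and store it) can be sketched as below. All names are hypothetical: `is_repeated` stands in for the form detection model, `review` for the checking and correcting step, and a plain list for the database of history forms.

```python
def keep_as_is(form):
    """Placeholder for checking and correcting the test problem in a form."""
    return form

def handle_form(pending, similar_history, is_repeated, database, review=keep_as_is):
    """Delete the pending form if it repeats any similar history form, else store it."""
    if any(is_repeated(pending, hist) for hist in similar_history):
        return "deleted"
    database.append(review(pending))
    return "stored"

def same_issue(a, b):
    # Stand-in for the trained form detection model.
    return a["issue"] == b["issue"]

database = [{"id": "H1", "issue": "login-timeout"}]
# For brevity the whole database is passed as the screened similar forms.
r1 = handle_form({"id": "N1", "issue": "login-timeout"}, list(database), same_issue, database)
r2 = handle_form({"id": "N2", "issue": "report-export"}, list(database), same_issue, database)
```

Storing only non-repeated forms keeps the history database free of duplicates for later comparisons.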
The embodiment of the invention also provides a device for detecting the repeated forms, which is described in the following embodiment. Because the principle of the device for solving the problems is similar to the detection method of the repeated form, the implementation of the device can refer to the implementation of the detection method of the repeated form, and repeated parts are not described again.
As shown in fig. 5, which is an exemplary diagram of an apparatus for detecting a repeated form according to an embodiment of the present invention, the apparatus includes: an acquisition module 501, a first processing module 502, a second processing module 503, and a deletion module 504; wherein:
the acquisition module 501 is configured to obtain first feature data of a form to be detected; the first feature data are data corresponding to each reference feature of the form to be detected; the form to be detected contains description content for describing the form;
the first processing module 502 is used for performing word segmentation processing on the description content of the form to be detected, and screening out a similar history form with the similarity greater than a preset threshold value with the form to be detected from the history form according to the word segmentation result of the form to be detected and the word segmentation result of the history form; the historical form is a form which is obtained in a historical mode and is not repeated in the form to be detected;
the second processing module 503 is configured to obtain second feature data corresponding to each similar history form, and determine, for each similar history form, whether the form to be detected is repeated with the similar history form by using a pre-trained form detection model according to the first feature data and the second feature data corresponding to the similar history form; the form detection model is a model for judging whether two forms are repeated or not according to each reference characteristic through machine learning; the first characteristic data and the reference characteristic corresponding to the second characteristic data are the same;
and the deleting module 504 is used for deleting the form to be detected when the form to be detected is repeated with any similar history form.
In one possible embodiment, the first feature data comprises at least one of: priority data of the form to be detected, severity data of the test problem in the form to be detected, the number of test cases affected by the test problem in the form to be detected, and a service identifier corresponding to the form to be detected; the second feature data comprises at least one of: priority data of the similar history form, severity data of the test problem in the similar history form, the number of test cases affected by the test problem in the similar history form, and a service identifier corresponding to the similar history form.
In a possible implementation manner, the first processing module is specifically configured to perform word segmentation processing on the description content of the to-be-detected form by using a word segmentation tool to obtain a first word segmentation corresponding to the to-be-detected form; acquiring second participles corresponding to the historical forms respectively; for each history form: comparing the first participles of the form to be detected with the second participles of the historical form respectively, and determining the number of the second participles which are consistent with the first participles in the historical form; and when the proportion of the number of the second participles consistent with the first participles in the history form to the total number of the second participles in the history form is greater than a preset threshold value, determining that the history form is a similar history form.
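The screening performed by the first processing module can be sketched as follows. The word segmentation tool itself is not reproduced; a whitespace split stands in for it, and the threshold value of 0.6 and the form identifiers are hypothetical.

```python
def is_similar(first_tokens, second_tokens, threshold):
    """Compare segmented words of the pending form with those of one history form."""
    if not second_tokens:
        return False
    first = set(first_tokens)
    # Count the history form's words that also occur in the pending form.
    matched = sum(1 for token in second_tokens if token in first)
    # The ratio is taken over the history form's own word count, as in the text.
    return matched / len(second_tokens) > threshold

def screen_similar(pending_tokens, history, threshold=0.6):
    """Return the ids of history forms that pass the similarity threshold."""
    return [form_id for form_id, tokens in history.items()
            if is_similar(pending_tokens, tokens, threshold)]

pending = "login page times out after submit".split()
history = {
    "H1": "login page times out".split(),
    "H2": "report export fails on submit".split(),
}
similar_ids = screen_similar(pending, history)
```

Here `H1` matches on 4 of its 4 words and is kept, while `H2` matches on 1 of 5 and is filtered out, so only `H1` is passed on to the form detection model.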
In one possible embodiment, the apparatus further comprises: a third processing module, configured to acquire forms historically submitted by testers and extract a plurality of reference features from those forms; extract the feature data corresponding to each reference feature in the forms historically submitted by each tester; obtain training samples according to the feature data corresponding to the forms historically submitted by each tester; construct a plurality of decision trees according to the plurality of reference features; train the plurality of decision trees by using the training samples, and select, from the plurality of decision trees, a target decision tree whose detection result meets expectations; and prune the target decision tree according to a preset recursion depth to obtain the form detection model.
In a possible implementation manner, the third processing module is specifically configured to use every two repeated forms in the forms submitted by each tester as a positive sample, and use the feature data of the two repeated forms in the positive sample as the positive sample data of the positive sample;
taking every two non-repeated forms in the forms submitted by each tester as a negative sample, and taking the characteristic data of the two non-repeated forms in the negative sample as the negative sample data of the negative sample;
and obtaining a training sample according to the positive sample, the positive sample data of the positive sample, the negative sample and the negative sample data of the negative sample.
In a possible implementation manner, the second processing module is specifically configured to determine, for each similar history form, whether the form to be detected and the similar history form are repeated by using each decision tree in the form detection model according to the first feature data and the second feature data corresponding to the similar history form; and determining whether the form to be detected is repeated with the similar historical form or not according to the number of the first decision trees which are used for determining that the form to be detected is repeated with the similar historical form in the form detection model and the number of the second decision trees which are used for determining that the form to be detected is not repeated with the similar historical form in the form detection model.
In a possible implementation manner, the second processing module is specifically configured to determine that the form to be detected is repeated with the similar history form when the number of the first decision trees is greater than the number of the second decision trees; and when the number of the first decision trees is less than or equal to that of the second decision trees, determining that the form to be detected is not repeated with the similar historical form.
In a possible implementation manner, the third processing module is further configured to set a weight parameter for each decision tree according to a recursion depth of each decision tree in the form detection model; the second processing module is specifically used for multiplying the number of the first decision trees of each recursion depth by the weight parameter corresponding to the first decision tree of the recursion depth to obtain a first calculation result of the recursion depth; adding the first calculation results of all recursion depths to obtain a repeated predicted value of the form to be detected and the similar historical form; multiplying the number of the second decision trees of each recursion depth by the weight parameter corresponding to the second decision tree of the recursion depth to obtain a second calculation result of the recursion depth; adding the second calculation results of all recursion depths to obtain a predicted value of the form to be detected, which is not repeated with the similar historical form; when the repeated predicted value of the form to be detected and the similar history form is larger than the non-repeated predicted value of the form to be detected and the similar history form, the form to be detected and the similar history form are repeated; and when the repeated predicted value of the form to be detected and the similar history form is less than or equal to the non-repeated predicted value of the form to be detected and the similar history form, the form to be detected and the similar history form are not repeated.
In one possible embodiment, the apparatus further comprises: a fourth processing module, configured to, when the form to be detected is not repeated with any of the similar history forms, check and correct the test problem contained in the form to be detected and store the form to be detected in the database for storing history forms.
The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the detection method of the repeated form when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program for executing the above method for detecting a repeated form is stored in the computer-readable storage medium.
In the embodiment of the invention, first feature data of a form to be detected is obtained; the first feature data are data corresponding to each reference feature of the form to be detected; the form to be detected contains description content for describing the form. Word segmentation is performed on the description content of the form to be detected, and similar history forms whose similarity with the form to be detected is greater than a preset threshold value are screened out from the history forms according to the word segmentation result of the form to be detected and the word segmentation results of the history forms; the history forms are forms that were obtained historically and are not repeated. In this way, history forms with low similarity are filtered out, which reduces the load on the subsequent form detection model when it determines whether the form to be detected and the history forms are repeated. Second feature data corresponding to each similar history form is then obtained, and for each similar history form, whether the form to be detected is repeated with the similar history form is determined by using a pre-trained form detection model according to the first feature data and the second feature data corresponding to the similar history form; the form detection model is a model, obtained through machine learning, for judging whether two forms are repeated according to each reference feature of the forms; the first feature data and the second feature data correspond to the same reference features. The form to be detected is deleted when it is repeated with any similar history form. The form detection model can therefore accurately judge whether a history form that is highly similar to the form to be detected is repeated with it, so that repeated forms are screened out, developers are prevented from repeatedly processing the same problem, and problem-processing efficiency is improved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (20)

1. A method for detecting a duplicate form, comprising:
acquiring first characteristic data of a form to be detected; the first characteristic data are data corresponding to all reference characteristics of the form to be detected; the form to be detected contains description contents for describing the form;
performing word segmentation on the description content of the form to be detected, and screening out a similar historical form with the similarity greater than a preset threshold value with the form to be detected from the historical form according to the word segmentation result of the form to be detected and the word segmentation result of the historical form; the historical form is a form which is obtained in a historical mode and is not repeated in the form to be detected;
acquiring second characteristic data corresponding to each similar historical form, and determining whether the form to be detected is repeated with the similar historical form or not by utilizing a pre-trained form detection model according to the first characteristic data and the second characteristic data corresponding to the similar historical form aiming at each similar historical form; the form detection model is a model for judging whether two forms are repeated or not according to each reference characteristic through machine learning; the first characteristic data and the reference characteristic corresponding to the second characteristic data are the same;
and deleting the form to be detected when the form to be detected is repeated with any similar historical form.
2. The detection method according to claim 1, wherein the first characteristic data includes at least one of:
priority data of the form to be detected, severity data of the test problem in the form to be detected, the number of test cases affected by the test problem in the form to be detected, and a service identifier corresponding to the form to be detected;
the second characteristic data includes at least one of:
priority data of the similar history form, severity data of the test problem in the similar history form, the number of test cases affected by the test problem in the similar history form, and a service identifier corresponding to the similar history form.
3. The detection method according to claim 1, wherein performing word segmentation on the description content of the form to be detected, and screening out a similar history form having a similarity greater than a preset threshold with the form to be detected from the history form according to the word segmentation result of the form to be detected and the word segmentation result of the history form, comprises:
utilizing a word segmentation tool to perform word segmentation processing on the description content of the form to be detected to obtain a first word segmentation corresponding to the form to be detected;
acquiring second participles corresponding to the historical forms respectively;
for each history form:
comparing the first participles of the form to be detected with the second participles of the historical form respectively, and determining the number of the second participles which are consistent with the first participles in the historical form; and when the proportion of the number of the second participles consistent with the first participles in the history form to the total number of the second participles in the history form is greater than a preset threshold value, determining that the history form is a similar history form.
4. The detection method according to claim 1, further comprising:
obtaining a form submitted by a tester in history, and extracting a plurality of reference features from the form submitted by the tester in history;
extracting feature data corresponding to each reference feature in a form submitted by each tester in history;
obtaining a training sample according to characteristic data corresponding to a form submitted by each tester in history;
constructing a plurality of decision trees according to the plurality of reference characteristics;
training a plurality of decision trees by using a training sample, and selecting a target decision tree of which the detection result accords with expectation from the plurality of decision trees;
and pruning the target decision tree according to the preset recursion depth to obtain a form detection model.
5. The detection method according to claim 4, wherein obtaining the training sample according to the feature data corresponding to the form submitted by each tester in history comprises:
taking every two repeated forms in the forms submitted by each tester as a positive sample, and taking the characteristic data of the two repeated forms in the positive sample as the positive sample data of the positive sample;
taking every two non-repeated forms in the forms submitted by each tester as a negative sample, and taking the characteristic data of the two non-repeated forms in the negative sample as the negative sample data of the negative sample;
and obtaining a training sample according to the positive sample, the positive sample data of the positive sample, the negative sample and the negative sample data of the negative sample.
6. The detection method according to claim 4 or 5, wherein for each similar history form, determining whether the form to be detected and the similar history form are repeated by using a pre-trained form detection model according to the first feature data and the second feature data corresponding to the similar history form comprises:
aiming at each similar historical form, determining whether the form to be detected is repeated with the similar historical form or not by utilizing each decision tree in the form detection model according to the first characteristic data and the second characteristic data corresponding to the similar historical form;
and determining whether the form to be detected is repeated with the similar historical form or not according to the number of the first decision trees which are used for determining that the form to be detected is repeated with the similar historical form in the form detection model and the number of the second decision trees which are used for determining that the form to be detected is not repeated with the similar historical form in the form detection model.
7. The detection method according to claim 6, wherein determining whether the form to be detected is repeated with the similar historical form according to the number of first decision trees in the form detection model that determine that the form to be detected is repeated with the similar historical form and the number of second decision trees in the form detection model that determine that the form to be detected is not repeated with the similar historical form comprises:
when the number of the first decision trees is larger than that of the second decision trees, determining that the form to be detected is repeated with the similar historical form;
and when the number of the first decision trees is less than or equal to that of the second decision trees, determining that the form to be detected is not repeated with the similar historical form.
8. The detection method according to claim 6, further comprising:
setting a weight parameter for each decision tree according to the recursion depth of each decision tree in the form detection model;
determining whether the form to be detected is repeated with the similar historical form according to the number of first decision trees for determining that the form to be detected is repeated with the similar historical form in the form detection model and the number of second decision trees for determining that the form to be detected is not repeated with the similar historical form in the form detection model, wherein the determining comprises the following steps:
multiplying the number of the first decision trees of each recursion depth by the weight parameter corresponding to the first decision tree of the recursion depth to obtain a first calculation result of the recursion depth; adding the first calculation results of all recursion depths to obtain a repeated predicted value of the form to be detected and the similar historical form;
multiplying the number of the second decision trees of each recursion depth by the weight parameter corresponding to the second decision tree of the recursion depth to obtain a second calculation result of the recursion depth; adding the second calculation results of all recursion depths to obtain a predicted value of the form to be detected, which is not repeated with the similar historical form;
when the repeated predicted value of the form to be detected and the similar history form is larger than the non-repeated predicted value of the form to be detected and the similar history form, the form to be detected and the similar history form are repeated;
and when the repeated predicted value of the form to be detected and the similar history form is less than or equal to the non-repeated predicted value of the form to be detected and the similar history form, the form to be detected and the similar history form are not repeated.
9. The detection method according to claim 1, further comprising:
when the form to be detected is not repeated with all similar historical forms, the test problems contained in the form to be detected are checked and corrected, and the form to be detected is stored in a database for storing the historical forms.
10. An apparatus for detecting a duplicate form, comprising:
the acquisition module is used for acquiring first characteristic data of the form to be detected; the first characteristic data are data corresponding to all reference characteristics of the form to be detected; the form to be detected contains description contents for describing the form;
the first processing module is used for performing word segmentation processing on the description content of the form to be detected and screening out a similar historical form with the similarity larger than a preset threshold value with the form to be detected from the historical form according to the word segmentation result of the form to be detected and the word segmentation result of the historical form; the historical form is a form which is obtained in a historical mode and is not repeated in the form to be detected;
the second processing module is used for acquiring second characteristic data corresponding to each similar historical form, and determining whether the form to be detected is repeated with the similar historical form or not by utilizing a pre-trained form detection model according to the first characteristic data and the second characteristic data corresponding to the similar historical form aiming at each similar historical form; the form detection model is a model for judging whether two forms are repeated or not according to each reference characteristic through machine learning; the first characteristic data and the reference characteristic corresponding to the second characteristic data are the same;
and the deleting module is used for deleting the form to be detected when the form to be detected is repeated with any similar historical form.
11. The detection apparatus according to claim 10, wherein the first characteristic data includes at least one of:
priority data of the form to be detected, severity data of the test problem in the form to be detected, the number of test cases affected by the test problem in the form to be detected, and a service identifier corresponding to the form to be detected;
the second characteristic data includes at least one of:
priority data of the similar history form, severity data of the test problem in the similar history form, the number of test cases affected by the test problem in the similar history form, and a service identifier corresponding to the similar history form.
12. The detection apparatus according to claim 10, wherein the first processing module is specifically configured to perform word segmentation processing on the description content of the to-be-detected form by using a word segmentation tool, so as to obtain a first word segmentation corresponding to the to-be-detected form;
acquiring second participles corresponding to the historical forms respectively;
for each history form:
comparing the first participles of the form to be detected with the second participles of the historical form respectively, and determining the number of the second participles which are consistent with the first participles in the historical form; and when the proportion of the number of the second participles consistent with the first participles in the history form to the total number of the second participles in the history form is greater than a preset threshold value, determining that the history form is a similar history form.
13. The detection device of claim 10, further comprising:
a third processing module, configured to acquire forms historically submitted by testers and to extract a plurality of reference features from those forms;
extracting, from each historically submitted form, feature data corresponding to each reference feature;
obtaining training samples according to the feature data corresponding to each historically submitted form;
constructing a plurality of decision trees according to the plurality of reference features;
training the plurality of decision trees by using the training samples, and selecting, from the plurality of decision trees, a target decision tree whose detection result meets expectations;
and pruning the target decision tree according to a preset recursion depth to obtain the form detection model.
14. The detection apparatus according to claim 13, wherein the third processing module is specifically configured to use every two duplicate forms in the forms submitted by each tester as a positive sample, and use the feature data of the two duplicate forms in the positive sample as the positive sample data of the positive sample;
taking every two non-repeated forms in the forms submitted by each tester as a negative sample, and taking the characteristic data of the two non-repeated forms in the negative sample as the negative sample data of the negative sample;
and obtaining a training sample according to the positive sample, the positive sample data of the positive sample, the negative sample and the negative sample data of the negative sample.
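The pairing of forms into positive and negative samples in claim 14 can be sketched as follows. This is an illustration under assumptions rather than the applicant's implementation: the names `Features` and `build_training_samples`, the four-field feature tuple, and the representation of known repeated pairs as a set of `frozenset` form identifiers are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical feature tuple: (priority, severity, affected test cases, service identifier).
Features = Tuple[int, int, int, str]
Sample = Tuple[Features, Features, int]


def build_training_samples(features_by_form: Dict[str, Features],
                           repeated_pairs: Set[FrozenSet[str]]) -> List[Sample]:
    """Turn every pair of forms into one labelled training sample.

    A pair listed in repeated_pairs becomes a positive sample (label 1);
    every other pair becomes a negative sample (label 0).
    """
    samples: List[Sample] = []
    # Sort the identifiers so the pair order is deterministic.
    for form_a, form_b in combinations(sorted(features_by_form), 2):
        label = 1 if frozenset((form_a, form_b)) in repeated_pairs else 0
        samples.append((features_by_form[form_a], features_by_form[form_b], label))
    return samples
```

Each sample carries the feature data of both forms in the pair, which matches the requirement that the model judge two forms against the same reference features.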
15. The detection apparatus according to claim 13 or 14, wherein the second processing module is specifically configured to determine, for each similar history form, whether the form to be detected and the similar history form are repeated by using each decision tree in the form detection model according to the first feature data and the second feature data corresponding to the similar history form;
and determining whether the form to be detected is repeated with the similar historical form according to the number of first decision trees, being the decision trees in the form detection model that determine that the form to be detected is repeated with the similar historical form, and the number of second decision trees, being the decision trees in the form detection model that determine that the form to be detected is not repeated with the similar historical form.
16. The detection apparatus according to claim 15, wherein the second processing module is specifically configured to determine that the form to be detected is repeated with the similar history form when the number of the first decision trees is greater than the number of the second decision trees;
and when the number of the first decision trees is less than or equal to that of the second decision trees, determining that the form to be detected is not repeated with the similar historical form.
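The counting rule of claims 15 and 16 amounts to a majority vote in which a tie is resolved as not repeated. A minimal sketch, assuming each decision tree's result is already available as a boolean (the name `majority_vote` is hypothetical):

```python
from typing import Sequence


def majority_vote(tree_votes: Sequence[bool]) -> bool:
    """Combine per-tree results; True means a tree judged the pair as repeated."""
    # First decision trees: those that determine the forms are repeated.
    first_count = sum(1 for vote in tree_votes if vote)
    # Second decision trees: those that determine the forms are not repeated.
    second_count = len(tree_votes) - first_count
    # Repeated only on a strict majority; a tie counts as not repeated.
    return first_count > second_count
```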
17. The detection apparatus according to claim 15, wherein the third processing module is further configured to set a weight parameter for each decision tree in the form detection model according to a recursion depth of each decision tree;
the second processing module is specifically used for multiplying the number of the first decision trees of each recursion depth by the weight parameter corresponding to the first decision tree of the recursion depth to obtain a first calculation result of the recursion depth; adding the first calculation results of all recursion depths to obtain a repeated predicted value of the form to be detected and the similar historical form;
multiplying the number of the second decision trees of each recursion depth by the weight parameter corresponding to the second decision tree of the recursion depth to obtain a second calculation result of the recursion depth; adding the second calculation results of all recursion depths to obtain a non-repeated predicted value of the form to be detected and the similar historical form;
when the repeated predicted value of the form to be detected and the similar historical form is greater than the non-repeated predicted value of the form to be detected and the similar historical form, determining that the form to be detected is repeated with the similar historical form;
and when the repeated predicted value of the form to be detected and the similar historical form is less than or equal to the non-repeated predicted value of the form to be detected and the similar historical form, determining that the form to be detected is not repeated with the similar historical form.
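The depth-weighted variant of claim 17 can be sketched in the same way. The names `weighted_vote` and `weight_by_depth` are hypothetical, and the claims do not say how the weight parameter is derived from the recursion depth, so the mapping is simply passed in.

```python
from collections import Counter
from typing import Mapping, Sequence, Tuple


def weighted_vote(votes: Sequence[Tuple[int, bool]],
                  weight_by_depth: Mapping[int, float]) -> bool:
    """Combine per-tree results weighted by recursion depth.

    votes: one (recursion_depth, judged_repeated) pair per decision tree.
    weight_by_depth: weight parameter assigned to trees of each recursion depth.
    """
    # Number of first (repeated) and second (not repeated) trees per depth.
    first_counts = Counter(depth for depth, repeated in votes if repeated)
    second_counts = Counter(depth for depth, repeated in votes if not repeated)
    # Per-depth count times weight, summed over all recursion depths.
    repeated_value = sum(n * weight_by_depth[d] for d, n in first_counts.items())
    non_repeated_value = sum(n * weight_by_depth[d] for d, n in second_counts.items())
    # Less than or equal resolves to not repeated, as in the claim.
    return repeated_value > non_repeated_value
```

With these semantics a single tree at a heavily weighted depth can outvote several trees at lightly weighted depths, which is the difference from the plain count of claim 16.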
18. The detection device of claim 10, further comprising:
and the fourth processing module is used for checking and correcting the test problems contained in the form to be detected when the form to be detected is not repeated with all similar historical forms, and storing the form to be detected in a database for storing the historical forms.
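Taken together, the deleting module of claim 10 and the fourth processing module of claim 18 describe a simple control flow, sketched below under assumptions. The function `handle_form` and its callable parameters `find_similar` and `is_repeated` are hypothetical stand-ins for the first and second processing modules, and a plain list stands in for the database of historical forms.

```python
from typing import Callable, List, TypeVar

F = TypeVar("F")


def handle_form(form: F,
                history: List[F],
                find_similar: Callable[[F, List[F]], List[F]],
                is_repeated: Callable[[F, F], bool]) -> bool:
    """Return True if the form was stored, False if it was discarded as repeated."""
    for similar in find_similar(form, history):
        if is_repeated(form, similar):
            # Repeated with at least one similar historical form: delete (discard) it.
            return False
    # Not repeated with any similar historical form. The checking and correcting
    # of the test problem is outside this sketch; the form is then stored.
    history.append(form)
    return True
```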
19. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of detecting a duplicate form of any of claims 1 to 9 when executing the computer program.
20. A computer-readable storage medium storing a computer program for executing the method for detecting a repetitive form according to any one of claims 1 to 9.
CN202110779913.8A 2021-07-09 2021-07-09 Method and device for detecting repeated forms Pending CN113448861A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110779913.8A CN113448861A (en) 2021-07-09 2021-07-09 Method and device for detecting repeated forms

Publications (1)

Publication Number Publication Date
CN113448861A true CN113448861A (en) 2021-09-28

Family

ID=77815743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110779913.8A Pending CN113448861A (en) 2021-07-09 2021-07-09 Method and device for detecting repeated forms

Country Status (1)

Country Link
CN (1) CN113448861A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803096A (en) * 2016-12-27 2017-06-06 上海大汉三通通信股份有限公司 A kind of short message type recognition methods, system and short message managing platform
US20180260719A1 (en) * 2017-03-10 2018-09-13 Microsoft Technology Licensing, Llc Cascaded random decision trees using clusters
CN111552767A (en) * 2019-02-11 2020-08-18 阿里巴巴集团控股有限公司 Search method, search device and computer equipment
CN112163409A (en) * 2020-09-23 2021-01-01 平安直通咨询有限公司上海分公司 Similar document detection method, system, terminal device and computer readable storage medium
US20210133558A1 (en) * 2019-10-31 2021-05-06 International Business Machines Corporation Deep-learning model creation recommendations

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴善鹏;李萍;: "一种基于核心词相似度的重复数据检测框架构建", 信息系统工程, no. 05, 20 May 2020 (2020-05-20) *

Similar Documents

Publication Publication Date Title
CN111309912B (en) Text classification method, apparatus, computer device and storage medium
CN108154198B (en) Knowledge base entity normalization method, system, terminal and computer readable storage medium
CN112866292B (en) Attack behavior prediction method and device for multi-sample combination attack
CN111338692A (en) Vulnerability classification method and device based on vulnerability codes and electronic equipment
CN114936158A (en) Software defect positioning method based on graph convolution neural network
CN114547318A (en) Fault information acquisition method, device, equipment and computer storage medium
CN113722719A (en) Information generation method and artificial intelligence system for security interception big data analysis
CN110825642B (en) Software code line-level defect detection method based on deep learning
CN113434685A (en) Information classification processing method and system
CN114443331A (en) Time series data abnormity detection method and device
CN116756041A (en) Code defect prediction and positioning method and device, storage medium and computer equipment
CN108875810B (en) Method and device for sampling negative examples from word frequency table aiming at training corpus
CN111767546B (en) Deep learning-based input structure inference method and device
CN116167336B (en) Sensor data processing method based on cloud computing, cloud server and medium
CN110808947B (en) Automatic vulnerability quantitative evaluation method and system
CN111901330A (en) Ensemble learning model construction method, ensemble learning model identification device, server and medium
CN113448861A (en) Method and device for detecting repeated forms
CN114707507B (en) List information detection method and device based on artificial intelligence algorithm
CN116467171A (en) Automatic test case construction device, method, electronic equipment and storage medium
CN112395280B (en) Data quality detection method and system
CN112822220B (en) Multi-sample combination attack-oriented tracing method and device
CN114861858A (en) Method, device and equipment for detecting road surface abnormal data and readable storage medium
CN114610576A (en) Log generation monitoring method and device
CN109739950B (en) Method and device for screening applicable legal provision
CN112183622A (en) Method, device, equipment and medium for detecting cheating in mobile application bots installation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination