CN117390185A - Defect judging method, device and equipment for mass-measurement product and storage medium - Google Patents

Defect judging method, device and equipment for mass-measurement product and storage medium

Publication number
CN117390185A
Authority
CN
China
Prior art keywords
defect
defect description
determining
word
mass
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311290554.5A
Other languages
Chinese (zh)
Inventor
王笑笑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Merchants Bank Co Ltd
Original Assignee
China Merchants Bank Co Ltd
Application filed by China Merchants Bank Co Ltd
Priority to CN202311290554.5A
Publication of CN117390185A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Abstract

The application discloses a defect determination method, apparatus, and device for a crowd-tested product, and a storage medium, and belongs to the technical field of computers. A plurality of defect description texts for a crowd-tested product are acquired, a core word set is determined for each defect description text, and text clustering is performed on the core word sets to obtain the types of the defect description texts, which correspond to the types of defects of the crowd-tested product. A manager therefore does not need to check the disordered defect description texts one by one, but can summarize the defects of the crowd-tested product based on the types obtained by clustering, which improves the defect determination efficiency for the crowd-tested product.

Description

Defect judging method, device and equipment for mass-measurement product and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular to a defect determination method, apparatus, and device for a crowd-tested product, and a storage medium.
Background
Typically, an enterprise uploads a product to be tested to a crowdtesting platform. Crowd testers test the product within a certain period and upload defect description texts for the defects they find to the platform; a manager then collects these texts and summarizes the defects of the product.
However, in this process the defects found by crowd testers are highly repetitive and the submitted defect description texts come in many formats. This makes it harder for the manager to summarize the product's defects and lowers the defect determination efficiency for the crowd-tested product.
Disclosure of Invention
The main objective of the present application is to provide a defect determination method, apparatus, and device for a crowd-tested product, and a storage medium, which aim to solve the technical problem of low defect determination efficiency for crowd-tested products.
In order to achieve the above object, the present application provides a defect determination method for a crowd-tested product, which includes the following steps:
acquiring a plurality of defect description texts for a crowd-tested product;
determining a core word set of each defect description text;
and performing text clustering processing on the core word sets to obtain the types of the defect description texts, so that a manager can determine the defects of the crowd-tested product based on the types of the defect description texts.
Optionally, the step of determining the core word set of each defect description text includes:
performing word segmentation processing on each defect description text to obtain word segmentation processing results;
based on the word segmentation processing result, determining the topic feature words and the high-frequency words of each defect description text;
and determining a core word set of each defect description text based on the topic feature words and the high-frequency words corresponding to each defect description text.
Optionally, after the step of determining the core word sets of the plurality of defect description texts based on the topic feature words and the high-frequency words corresponding to the defect description texts, the method further includes:
determining the similarity between the corresponding core words in the core word set and word segmentation words in the corresponding word segmentation processing result;
the word segmentation vocabulary with the similarity larger than a preset similarity threshold value is used as an expansion word;
and adding the expansion word to the corresponding core word set.
Optionally, the step of determining the topic feature words and the high-frequency words of each defect description text based on the word segmentation processing result includes:
based on the word segmentation processing result, determining the topic feature words of each defect description text through an LDA model;
calculating the occurrence frequency of the word segmentation vocabulary corresponding to each defect description text in each word segmentation processing result;
and taking the vocabulary whose occurrence frequency is greater than a preset frequency threshold as the high-frequency words.
Optionally, before the step of determining the topic feature words and the high-frequency words of each defect description text based on the word segmentation processing result, the method further includes:
cleaning the word segmentation vocabulary in the word segmentation processing result to obtain a cleaned word segmentation processing result;
and performing part-of-speech filtering on the cleaned word segmentation processing result to obtain a filtered word segmentation processing result, so that the topic feature words and the high-frequency words of each defect description text are determined based on the filtered word segmentation processing result, wherein the part-of-speech filtering retains nouns, verbs, and adjectives.
Optionally, before the step of performing text clustering processing on the core word set to obtain a text clustering result, the method further includes:
calculating mutual information and left and right information entropy between the core words in the core word set;
screening compound words based on the mutual information and the left and right information entropy;
and determining a new core word set based on the compound words and the core words that are not combined into compound words.
Optionally, after the step of performing text clustering processing on the core word set to obtain the types of the defect description texts, the method further includes at least one of the following:
determining a defect description text set corresponding to each type of defect description text, determining the proportion of each type of defect description text based on the number of defect description texts in each defect description text set and the total number of the plurality of defect description texts, and determining the serious defects of the crowd-tested product based on the proportion of each type of defect description text;
and acquiring the upload time of each defect description text in each defect description text set, determining, based on the upload time, how frequently the defect description texts in each set were submitted within a preset period, and determining whether to send an early warning to the manager based on that submission frequency.
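The two optional post-clustering steps above can be illustrated with a minimal sketch. The record layout (a cluster label plus an upload time), the sample labels, and the rule "warn when a type is submitted more than `max_submissions` times within the period" are assumptions made for illustration; the application does not specify them.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical clustering output: (type label, upload time) for each text.
clustered = [
    ("login_crash", datetime(2023, 10, 7, 9, 0)),
    ("login_crash", datetime(2023, 10, 7, 9, 20)),
    ("login_crash", datetime(2023, 10, 7, 9, 45)),
    ("slow_page", datetime(2023, 10, 5, 16, 0)),
]

def type_proportions(clustered):
    """Share of each type among all defect description texts."""
    counts = Counter(label for label, _ in clustered)
    total = len(clustered)
    return {label: count / total for label, count in counts.items()}

def types_needing_warning(clustered, now, period, max_submissions):
    """Types submitted more than max_submissions times in the preset period."""
    start = now - period
    recent = Counter(label for label, t in clustered if start <= t <= now)
    return sorted(label for label, count in recent.items() if count > max_submissions)

proportions = type_proportions(clustered)
# Assumption: the type with the largest share is a candidate serious defect.
most_common_type = max(proportions, key=proportions.get)
warnings = types_needing_warning(
    clustered, datetime(2023, 10, 7, 10, 0), timedelta(hours=1), max_submissions=2
)
```

In practice the thresholds and the definition of a serious defect would be set by the manager of the test project.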
In addition, in order to achieve the above object, the present application further provides a defect determination apparatus for a crowd-tested product, which includes:
a text acquisition module, configured to acquire a plurality of defect description texts for a crowd-tested product;
a core word set determining module, configured to determine the core word set of each defect description text;
and a text clustering module, configured to perform text clustering processing on the core word sets to obtain the types of the defect description texts, so that a manager can determine the defects of the crowd-tested product based on the types of the defect description texts.
In addition, to achieve the above object, the present application further provides a device, including: a memory, a processor, and a defect determination program for a crowd-tested product that is stored in the memory and executable on the processor, wherein the program is configured to implement the steps of the defect determination method for a crowd-tested product described above.
In addition, in order to achieve the above object, the present application further provides a computer-readable storage medium storing a defect determination program for a crowd-tested product, which, when executed by a processor, implements the steps of the defect determination method for a crowd-tested product described above.
In the present application, a plurality of defect description texts for a crowd-tested product are acquired, a core word set is determined for each defect description text, and text clustering is performed on the core word sets to obtain the types of the defect description texts, which correspond to the types of defects of the crowd-tested product. A manager therefore does not need to check the disordered defect description texts one by one, but can summarize the defects of the crowd-tested product based on the types obtained by clustering, which improves the defect determination efficiency for the crowd-tested product.
Drawings
FIG. 1 is a schematic flowchart of a first embodiment of the defect determination method for a crowd-tested product according to the present application;
FIG. 2 is a second schematic flowchart of the first embodiment of the defect determination method for a crowd-tested product according to the present application;
FIG. 3 is a schematic diagram of a first scenario of the first embodiment of the defect determination method for a crowd-tested product according to the present application;
FIG. 4 is a schematic flowchart of a second embodiment of the defect determination method for a crowd-tested product according to the present application;
FIG. 5 is a schematic flowchart of a third embodiment of the defect determination method for a crowd-tested product according to the present application;
FIG. 6 is a schematic diagram of a second scenario of the third embodiment of the defect determination method for a crowd-tested product according to the present application;
FIG. 7 is a block diagram of an embodiment of the defect determination apparatus for a crowd-tested product according to the present application;
FIG. 8 is a schematic structural diagram of a device in the hardware operating environment according to an embodiment of the present application.
The implementation, functional features, and advantages of the present application are further described below with reference to the embodiments and the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a first embodiment of the defect determination method for a crowd-tested product according to the present application.
In the first embodiment, the defect determination method for a crowd-tested product includes the following steps:
S10: Acquire a plurality of defect description texts for a crowd-tested product.
It should be noted that the method of this embodiment is executed by a defect determination apparatus for a crowd-tested product, which may be a plug-in, an application program, or the like; this application does not specifically limit it.
It can be understood that the defect determination method for a crowd-tested product can be implemented by installing or invoking the defect determination apparatus in a crowdtesting platform.
It should be noted that, typically, an enterprise uploads a product to be tested to a crowdtesting platform; crowd testers test the product within a certain period and upload defect description texts to the platform, and a manager then collects these texts and summarizes the defects of the product. Specifically, a crowd-tested product is a product to be tested for which a test has been initiated on the crowdtesting platform; it may be application software, electronic equipment, a household appliance, a skin care product, or the like. After independently testing the product, crowd testers feed the test results back to the crowdtesting platform (generally only the defects of the product need to be reported), and the manager responsible for summarizing the product's defects periodically collects the defect description texts submitted by the crowd testers and summarizes the defects.
It will be appreciated that the defect description texts uploaded by crowd testers may follow a text format, for example with a subject, an abstract, and a body. Because crowd testers cannot be controlled, however, an uploaded defect description text may also have no format at all, so the defect description texts received by the crowdtesting platform vary widely in format. Moreover, because many crowd testers test the same crowd-tested product and some defects are easy to find, multiple testers may repeatedly submit known defects and describe them in arbitrary ways. This makes it harder for the manager to summarize the product's defects and lowers the defect determination efficiency for the crowd-tested product.
To improve the defect determination efficiency for the crowd-tested product, this embodiment processes the acquired defect description texts for the crowd-tested product to assist the manager in determining the product's defects.
Specifically, acquiring the plurality of defect description texts for the crowd-tested product may mean acquiring the defect description texts uploaded by crowd testers within a preset history period (for example, one day or one week), so that the texts are collected periodically. Alternatively, it may mean acquiring all defect description texts about the crowd-tested product, so that all of them are collected directly.
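As a minimal sketch of the periodic collection described above, the following selects the texts uploaded within a preset history period. The record layout (upload time plus text) and the sample reports are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical records: (upload time, defect description text).
reports = [
    (datetime(2023, 10, 1, 9, 0), "Login button does not respond"),
    (datetime(2023, 10, 6, 14, 30), "App crashes when opening settings"),
    (datetime(2023, 10, 7, 8, 15), "Settings page crashes on open"),
]

def texts_in_period(reports, now, period):
    """Return the texts uploaded within the history period ending at `now`."""
    start = now - period
    return [text for uploaded, text in reports if start <= uploaded <= now]

now = datetime(2023, 10, 7, 12, 0)
last_day = texts_in_period(reports, now, timedelta(days=1))
last_week = texts_in_period(reports, now, timedelta(weeks=1))
```

Collecting all texts for the product corresponds to skipping the filter altogether.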
S20: Determine a core word set of each defect description text.
It can be understood that a large number of crowd testers test the same crowd-tested product, and their uploaded contents and ways of describing defects differ. To improve the defect determination efficiency for the crowd-tested product, the key content of each defect description text is extracted by determining its core word set, which reduces the influence of other content on defect determination.
Specifically, the key content of each defect description text may be extracted by scanning the text against a preset defect description vocabulary library to obtain the core word set of that text.
In this embodiment, as shown in FIG. 2, the step of determining the core word set of each defect description text includes:
A1: Perform word segmentation processing on each defect description text to obtain word segmentation processing results.
It should be noted that, to extract the key content of each defect description text more quickly, word segmentation may be performed on each text to obtain a word segmentation result. Compared with scanning the whole defect description text for vocabulary, segmenting the text into its words improves the efficiency of extracting its key content.
Specifically, the defect description texts may be divided by text format into those that have a defect abstract and those that do not. For a defect description text with a defect abstract, a word segmentation tool performs word segmentation to obtain its word segmentation result. For a defect description text without a defect abstract, word segmentation is performed directly on the text to obtain its word segmentation result. This reduces redundant processing of the defect description texts that have a defect abstract and improves the word segmentation rate.
In this embodiment, word segmentation may be performed with the Jieba library to obtain the word segmentation processing result. In particular, because the Jieba library supports a user-defined dictionary, uncommon vocabulary can be added to the user-defined dictionary before segmentation, so that a more accurate word segmentation result is obtained quickly. Specifically, the word segmentation processing result is the vocabulary that makes up the defect description text.
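To illustrate why a user-defined dictionary matters, here is a minimal sketch that uses forward maximum matching as a simplified stand-in for a real segmenter. It is not Jieba's algorithm, and the dictionaries and sample text are assumptions chosen for illustration.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

base_dictionary = {"登录", "页面", "无法", "打开"}  # hypothetical general vocabulary
user_dictionary = {"闪退"}  # hypothetical domain term (an app crashing on launch)

text = "登录页面闪退"
without_user_dict = segment(text, base_dictionary)
with_user_dict = segment(text, base_dictionary | user_dictionary)
```

Without the domain term the segmenter splits it into two meaningless single characters; with the term added, it is kept as one word that can later serve as a core word.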
A2: Based on the word segmentation processing result, determine the topic feature words and the high-frequency words of each defect description text.
Specifically, the word segmentation processing result may be analyzed with an NLP (Natural Language Processing) model to obtain the topic feature words and the high-frequency words of each defect description text.
It is understood that topic feature words are words that express the core content of a defect description text; a text may have one topic feature word or several. High-frequency words improve the efficiency of comparing texts.
Specifically, one implementation of determining the topic feature words and the high-frequency words of each defect description text based on the word segmentation processing result is: determining the topic feature words of each defect description text through an LDA model based on the word segmentation processing result; calculating the importance of the word segmentation vocabulary corresponding to each defect description text in each word segmentation processing result; and taking the vocabulary whose importance is greater than a preset threshold as the high-frequency words.
It should be noted that, to determine the topic feature words more accurately, this embodiment inputs the word segmentation result into an LDA (Latent Dirichlet Allocation) model. The LDA model treats each word segmentation result as a word frequency vector, converting text information into numerical information that is easy to model. Because the order in which words appear is not considered, the complexity of extracting topic feature words is reduced. Each word segmentation result corresponds to a probability distribution over topics, and each topic corresponds to a probability distribution over words; FIG. 3 shows the probability distributions of topic 1 and topic 2 corresponding to the word segmentation result of one defect description text. These topic distributions facilitate subsequent text clustering. The topic feature words obtained by the LDA model are therefore expressed as probability distributions.
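The word frequency vector that serves as input to a topic model can be sketched as follows. This shows only the bag-of-words conversion, not LDA itself; the helper names and sample texts are assumptions made for illustration.

```python
from collections import Counter

def build_vocabulary(segmented_texts):
    """Give every distinct word a fixed index (sorted for reproducibility)."""
    distinct = sorted({word for words in segmented_texts for word in words})
    return {word: index for index, word in enumerate(distinct)}

def to_frequency_vector(words, vocabulary):
    """Count each word's occurrences; word order is deliberately discarded."""
    vector = [0] * len(vocabulary)
    for word, count in Counter(words).items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector

segmented_texts = [
    ["login", "page", "crash", "crash"],
    ["crash", "login", "page", "crash"],  # same words in a different order
    ["payment", "button", "unresponsive"],
]
vocabulary = build_vocabulary(segmented_texts)
vectors = [to_frequency_vector(words, vocabulary) for words in segmented_texts]
```

The first two texts map to the same vector, which is the sense in which the order of words is not considered.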
In this embodiment, because the topic feature words obtained by the LDA model involve some randomness, the high-frequency words are also included in the core word set of the defect description text, which improves how accurately the core content of the text is described.
Specifically, the occurrence frequency of the word segmentation vocabulary corresponding to each defect description text in each word segmentation processing result may be calculated, and the vocabulary whose occurrence frequency is greater than a preset frequency threshold is taken as the high-frequency words. Alternatively, the importance of the word segmentation vocabulary corresponding to each defect description text in each word segmentation processing result may be evaluated with a TF-IDF (term frequency-inverse document frequency) model.
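Both ways of selecting high-frequency words can be sketched in a few lines. The TF-IDF variant below uses tf = count / text length and idf = log(N / document frequency), which is one common formulation and an assumption here; the thresholds and sample texts are also illustrative.

```python
import math
from collections import Counter

def high_frequency_words(words, min_count):
    """Words whose count in one text exceeds the preset frequency threshold."""
    return {word for word, count in Counter(words).items() if count > min_count}

def tf_idf(segmented_texts):
    """Per-text TF-IDF scores: (count / text length) * log(N / document frequency)."""
    n_texts = len(segmented_texts)
    doc_freq = Counter(word for words in segmented_texts for word in set(words))
    scores = []
    for words in segmented_texts:
        counts = Counter(words)
        scores.append({
            word: (count / len(words)) * math.log(n_texts / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

segmented_texts = [
    ["login", "crash", "crash", "page"],
    ["login", "payment", "button"],
    ["login", "crash", "settings"],
]
frequent = high_frequency_words(segmented_texts[0], min_count=1)
scores = tf_idf(segmented_texts)
```

A word such as "login" that appears in every text gets a TF-IDF score of zero, so the importance-based variant filters out vocabulary that does not distinguish one defect description from another.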
A3: Determine a core word set of each defect description text based on the topic feature words and the high-frequency words corresponding to each defect description text.
It can be understood that taking the topic feature words and the high-frequency words of each defect description text as its core word set describes the core content of the text more accurately, which makes it easier to compare the defect description texts later.
Specifically, before the step of determining the topic feature words and the high-frequency words of each defect description text based on the word segmentation processing result, the word segmentation vocabulary in the word segmentation processing result may be cleaned to obtain a cleaned word segmentation processing result. Part-of-speech filtering is then performed on the cleaned result to obtain a filtered word segmentation processing result, and the topic feature words and the high-frequency words of each defect description text are determined based on the filtered result. The part-of-speech filtering retains nouns, verbs, and adjectives.
It will be appreciated that defect description texts are short texts and inherently suffer from feature sparsity, so processing them directly makes it difficult to achieve a good defect determination result. Therefore, before the topic feature words and the high-frequency words of each defect description text are determined, useless interference information and words of little use are removed through cleaning and part-of-speech filtering, which improves the efficiency of subsequently determining the topic feature words and the high-frequency words.
Specifically, the cleaning may remove preset Chinese stop words, profanity, and the like. The part-of-speech filtering may remove words such as modal particles and adverbs and retain the more useful nouns, verbs, and adjectives.
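A minimal sketch of cleaning followed by part-of-speech filtering is shown below. The stop-word list and the tag scheme (tags beginning with n, v, or a for nouns, verbs, and adjectives) are assumptions in the style of common Chinese taggers; the input is assumed to be already tagged.

```python
STOP_WORDS = {"的", "了", "很"}        # hypothetical stop-word list
KEPT_POS_PREFIXES = ("n", "v", "a")   # assumed tags: noun, verb, adjective

def clean_and_filter(tagged_words):
    """Drop stop words, then keep only nouns, verbs, and adjectives."""
    kept = []
    for word, pos in tagged_words:
        if word in STOP_WORDS:
            continue
        if pos.startswith(KEPT_POS_PREFIXES):
            kept.append(word)
    return kept

# Hypothetical tagged segmentation of a report saying the page loads slowly.
tagged = [("页面", "n"), ("的", "uj"), ("加载", "v"), ("很", "d"), ("慢", "a"), ("了", "ul")]
filtered = clean_and_filter(tagged)
```

Only the noun, verb, and adjective survive, which is the reduced vocabulary that the topic feature word and high-frequency word steps then operate on.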
To avoid meaningless repeated determination in subsequent defect determination, after the core word sets are obtained, the plurality of defect description texts may be screened against related words predetermined by the person in charge of the test project, such as the functional modules of the crowd-tested product and known defects. Defect description texts that contain these related words are deleted together with their core word sets.
S30: Perform text clustering processing on the core word sets to obtain the types of the defect description texts, so that a manager can determine the defects of the crowd-tested product based on the types of the defect description texts.
It should be noted that text clustering may be performed on the core word sets with the x-means algorithm to obtain the types of the plurality of defect description texts. Specifically, the category number parameter of the x-means algorithm may be set before text clustering, and the BIC (Bayesian information criterion) score of the clustering is obtained after each round of iterative computation, so that the x-means algorithm can complete unsupervised text clustering based on a minimum classification tolerance. The k-means algorithm, by contrast, can only take a specific value for the category number parameter. This embodiment therefore preferably uses the x-means algorithm for text clustering of the core word sets, which helps avoid local minima and improves the accuracy of the types obtained for the defect description texts.
It can be understood that once the types of the defect description texts are obtained, a large number of defect description texts are divided by type into a small number of defect description text sets. When analyzing the defects of the crowd-tested product, the manager can analyze this small number of sets directly, which reduces the manager's workload and difficulty and improves the defect determination efficiency for the crowd-tested product.
In this embodiment, a plurality of defect description texts for a crowd-tested product are acquired, the core word set of each defect description text is determined, and text clustering is performed on the core word sets to obtain the types of the defect description texts, which correspond to the types of defects of the crowd-tested product. The manager therefore does not need to check the disordered defect description texts one by one, but can summarize the product's defects based on the types obtained by clustering, which improves defect determination efficiency. In addition, determining the topic feature words and the high-frequency words of each defect description text improves the text clustering of the core word sets and thus the accuracy of the types obtained for the defect description texts.
As shown in FIG. 4, a second embodiment of the defect determination method for a crowd-tested product according to the present application is provided based on the first embodiment. In this embodiment, after step A3, the method further includes:
B1: Determine the similarity between the core words in each core word set and the word segmentation vocabulary in the corresponding word segmentation processing result.
It should be noted that, so that the extracted core word set of each defect description text describes the core content more accurately, this embodiment expands the extracted core word set to reduce the chance of missing words.
Specifically, to calculate this similarity, the core words in the core word set and the word segmentation vocabulary in the word segmentation processing result may both be converted into word vectors by an NLP model. Cosine similarity values between the word vectors are then calculated with the cosine similarity formula, and the similarity between the core words and the word segmentation vocabulary is determined from these values.
To speed up the similarity calculation, the core words in the core word set and the word segmentation vocabulary in the corresponding word segmentation processing result may be input directly into a word2vec (word to vector) model, which computes the similarity between them.
B2: Take the word segmentation vocabulary whose similarity is greater than a preset similarity threshold as expansion words.
B3: Add the expansion words to the corresponding core word set.
Specifically, word segmentation vocabulary whose cosine similarity value is greater than a preset cosine similarity threshold may be taken as expansion words, or word segmentation vocabulary whose similarity is greater than the preset similarity threshold may be taken as expansion words. The preset similarity threshold may be 90% or more, 95% or more, or the like.
To keep too many expansion words from slowing down subsequent analysis, this embodiment may take only the top K word segmentation vocabulary items whose similarity is greater than the preset similarity threshold as expansion words. The expansion words are added to the corresponding core word set, so that each defect description text corresponds to one expanded core word set.
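Steps B1 to B3 can be sketched with plain cosine similarity over word vectors. The two-dimensional vectors below are made up for illustration; in practice they would come from a trained embedding model such as word2vec, and the threshold and K are assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expansion_words(core_words, candidates, vectors, threshold, top_k):
    """Top-K candidates whose best similarity to any core word exceeds the threshold."""
    scored = []
    for candidate in candidates:
        if candidate in core_words:
            continue
        best = max(cosine_similarity(vectors[candidate], vectors[core])
                   for core in core_words)
        if best > threshold:
            scored.append((best, candidate))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_k]]

# Hypothetical word vectors.
vectors = {
    "crash": [1.0, 0.0],
    "freeze": [0.98, 0.2],
    "hang": [0.95, 0.31],
    "payment": [0.0, 1.0],
}
core = {"crash"}
candidates = ["freeze", "hang", "payment"]
expanded = expansion_words(core, candidates, vectors, threshold=0.9, top_k=1)
expanded_core_set = core | set(expanded)
```

With K set to 1, only the closest candidate is added even though two candidates pass the threshold, which is the cap on expansion words described above.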
Specifically, before the step of performing text clustering processing on the core word set to obtain a text clustering result, the mutual information and the left and right information entropy of the core words in the core word set may be calculated; compound words are screened out based on the mutual information and the left and right information entropy; and a new core word set is determined from the compound words and the core words that were not merged into compound words.
It should be noted that the left and right information entropy represents the richness of a core word's context: the higher it is, the more information the word's context carries, the higher the semantic uncertainty, and the more likely the word is to stand alone as a phrase (compound word). The mutual information represents the degree of correlation between core words: the higher it is, the less separable the core words are, and the more they should be treated as a single compound word.
Therefore, this embodiment calculates the mutual information and the left and right information entropy of the core words in the core word set, screens out compound words based on them, and determines the compound words together with the core words that were not merged into compound words as a new core word set. Text clustering processing is then performed on the new core word set to obtain the types of the defect description texts. This increases the semantic readability of the core words in the core word set, enriches the features, and improves the classification produced by text clustering.
In this embodiment, expanding the core word set enriches its semantics, so that the types of the plurality of defect description texts obtained by text clustering processing on the new core word set are more accurate.
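The compound-word screening can be sketched as follows. The sketch makes several assumptions that the application does not state: candidates are adjacent word pairs, mutual information is computed as pointwise mutual information (PMI), sentence boundaries count as context tokens, and the thresholds `min_pmi` and `min_entropy` are arbitrary illustrative values.

```python
import math
from collections import Counter

def entropy(counter):
    """Shannon entropy (base 2) of the distribution held in a Counter."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())

def score_bigrams(token_lists):
    """Map each adjacent pair to (PMI, left entropy, right entropy)."""
    unigrams, bigrams = Counter(), Counter()
    left_ctx, right_ctx = {}, {}
    for tokens in token_lists:
        unigrams.update(tokens)
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            bigrams[pair] += 1
            left = tokens[i - 1] if i > 0 else "<s>"
            right = tokens[i + 2] if i + 2 < len(tokens) else "</s>"
            left_ctx.setdefault(pair, Counter())[left] += 1
            right_ctx.setdefault(pair, Counter())[right] += 1
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scores = {}
    for pair, count in bigrams.items():
        p_pair = count / n_bi
        p_a, p_b = unigrams[pair[0]] / n_uni, unigrams[pair[1]] / n_uni
        pmi = math.log2(p_pair / (p_a * p_b))
        scores[pair] = (pmi, entropy(left_ctx[pair]), entropy(right_ctx[pair]))
    return scores

def select_compounds(token_lists, min_pmi=1.0, min_entropy=1.0):
    """Keep pairs that are strongly bound (PMI) and occur in varied contexts."""
    return sorted(
        pair
        for pair, (pmi, left_h, right_h) in score_bigrams(token_lists).items()
        if pmi >= min_pmi and min(left_h, right_h) >= min_entropy
    )

def merge_core_words(core_words, compounds):
    """Build the new core word list: merged compounds plus unmerged words."""
    merged, i = [], 0
    while i < len(core_words):
        if i + 1 < len(core_words) and (core_words[i], core_words[i + 1]) in compounds:
            merged.append(core_words[i] + "_" + core_words[i + 1])
            i += 2
        else:
            merged.append(core_words[i])
            i += 1
    return merged

docs = [
    ["app", "login", "page", "crash"],
    ["open", "login", "page", "slow"],
    ["the", "login", "page", "blank"],
]
compounds = select_compounds(docs)
print(compounds, merge_core_words(["login", "page", "crash"], compounds))
```

In the toy data only the pair ("login", "page") recurs with different neighbours on both sides, so it is the only pair merged into a compound word.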
As shown in fig. 5, a third embodiment of the defect determining method for a mass-measurement product of the present application is provided based on the first embodiment and the second embodiment. In this embodiment, after the step of performing text clustering processing on the core word set to obtain the types of the plurality of defect description texts, the method further includes at least one of the following:
c1: determining a defect description text set corresponding to each type of defect description text, determining the proportion of each type of defect description text based on the number of defect description texts in each defect description text set and the total number of the plurality of defect description texts, and determining the serious defects of the mass-measurement product based on the proportion of each type of defect description text;
To further assist the manager in determining the defects of the mass-measurement product, the serious defects that most need attention (that is, the more common defects) may be determined from the types of the plurality of defect description texts. Specifically, the defect description texts of each type form a defect description text set, and the proportion of each type in the total number of defect description texts can be calculated from the number of defect description texts in each set and the total number of the plurality of defect description texts. If the proportion of a type is high, the corresponding defect is considered a serious defect that needs attention, and it can be pushed to the manager preferentially. Types with lower proportions can be pushed to the manager afterwards in order.
c2: acquiring the uploading time of each defect description text in the defect description text set, determining, based on the uploading time, the submission frequency of the defect description texts in each defect description text set within a preset period, and determining whether to send an early warning to the manager based on the submission frequency.
Further, to assist the manager in determining the defects of the mass-measurement product, the uploading time of each defect description text in the defect description text set (the time at which a tester uploaded the defect description text to the mass-measurement platform) can be obtained. If many defect description texts of the same type are uploaded within a period of time, that is, if the submission frequency of the defect description texts in a defect description text set within the preset period is high, an early warning needs to be sent to the manager. The manager can then treat the defect as a serious defect that most needs attention, and the testers can be informed to stop uploading defect description texts of that type, which reduces project cost.
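Steps c1 and c2 can be sketched in a few lines. The cut-off values `min_share` and `max_reports`, the 24-hour window, and all function names are assumptions made for illustration; the application only speaks of a proportion, a preset period, and a submission frequency.

```python
from collections import Counter
from datetime import datetime, timedelta

def type_proportions(labels):
    """c1: share of each defect type among all defect description texts."""
    counts = Counter(labels)
    total = len(labels)
    return {defect_type: count / total for defect_type, count in counts.items()}

def serious_defect_types(labels, min_share=0.4):
    """Types whose share reaches min_share, largest share first."""
    shares = type_proportions(labels)
    return sorted((t for t, s in shares.items() if s >= min_share), key=lambda t: -shares[t])

def types_needing_warning(reports, window_end, window=timedelta(hours=24), max_reports=2):
    """c2: types submitted more than max_reports times inside the window.

    reports is an iterable of (defect_type, upload_time) pairs.
    """
    window_start = window_end - window
    recent = Counter(t for t, uploaded in reports if window_start <= uploaded <= window_end)
    return sorted(t for t, n in recent.items() if n > max_reports)

labels = ["crash", "crash", "crash", "layout", "typo"]
reports = [
    ("crash", datetime(2023, 9, 28, 9, 0)),
    ("crash", datetime(2023, 9, 28, 10, 0)),
    ("crash", datetime(2023, 9, 28, 11, 0)),
    ("layout", datetime(2023, 9, 27, 8, 0)),
    ("typo", datetime(2023, 9, 28, 11, 30)),
]
print(serious_defect_types(labels))
print(types_needing_warning(reports, window_end=datetime(2023, 9, 28, 12, 0)))
```

The two checks are independent, matching the "at least one of" wording: a type can be common overall without a recent burst of submissions, and the reverse.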
To further reduce project cost, the defect description text sets may be sorted by the number of defect description texts they contain, and the defects corresponding to the sets may be displayed on a page of the mass-measurement platform in descending order for the testers to check; a defect that is uploaded again is invalid and is not rewarded. As shown in fig. 6, the order from largest to smallest is defect description text set 1, defect description text set 2, ..., defect description text set n.
Specifically, based on the uploading time of each defect description text in each type of defect description text set, the defect description text uploaded first in each set may be taken as the valid defect description text, and a reward is provided to the corresponding tester, thereby reducing the testing cost.
In this embodiment, by calculating the proportion of each type of defect description text in the total number of defect description texts, and by using the submission frequency of the defect description texts in each defect description text set within a preset period, the serious defects that most need attention can be found in time. This reminds the manager to pay attention to abnormal factors such as the deployment quality, version and network of the functional modules involved in those defect description texts, and allows the manager to adjust the mass-measurement strategy in time to avoid unnecessary project cost.
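The ranking and the first-upload rule described above might look like the sketch below. The record layout (dictionaries with `type`, `tester` and `uploaded` keys) and the function names are hypothetical.

```python
from datetime import datetime

def rank_defect_sets(reports):
    """Group reports by type and order the groups from largest to smallest."""
    groups = {}
    for report in reports:
        groups.setdefault(report["type"], []).append(report)
    # Ties in size are broken by type name so the order is deterministic.
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))

def first_valid_reports(reports):
    """For each type, keep only the earliest upload as the rewarded report."""
    first = {}
    for report in reports:
        current = first.get(report["type"])
        if current is None or report["uploaded"] < current["uploaded"]:
            first[report["type"]] = report
    return first

reports = [
    {"type": "crash", "tester": "B", "uploaded": datetime(2023, 9, 28, 10, 0)},
    {"type": "layout", "tester": "C", "uploaded": datetime(2023, 9, 28, 9, 30)},
    {"type": "crash", "tester": "A", "uploaded": datetime(2023, 9, 28, 9, 0)},
]
ranking = [defect_type for defect_type, _ in rank_defect_sets(reports)]
rewarded = {t: r["tester"] for t, r in first_valid_reports(reports).items()}
print(ranking, rewarded)
```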
In addition, an embodiment of the present application further provides a defect determining device for a mass-measured product, referring to fig. 7, where the defect determining device for a mass-measured product includes:
a text acquisition module, configured to acquire a plurality of defect description texts for the mass-measurement product;
a core word set determining module, configured to determine the core word set of each defect description text;
and a text clustering module, configured to perform text clustering processing on the core word set to obtain the types of the plurality of defect description texts, so that a manager can determine the defects of the mass-measurement product based on the types of the plurality of defect description texts.
In this embodiment, a plurality of defect description texts for the mass-measurement product are acquired, the core word set of each defect description text is determined, and text clustering processing is performed on the core word set to obtain the types of the plurality of defect description texts, that is, the defects of the mass-measurement product that those texts describe. The manager does not need to check the unordered defect description texts one by one, but can summarize the defects of the mass-measurement product based on the types obtained by clustering, which improves the efficiency of defect determination for the mass-measurement product.
It should be noted that each module in the above apparatus may be used to implement each step in the above method, and achieve a corresponding technical effect, which is not described herein again.
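The three-module structure of the device can be sketched as a small class whose modules are injected as callables. This is only an illustration of how the modules chain together: the class and function names are hypothetical, and the greedy Jaccard grouping is a deliberately simple stand-in for the text clustering processing, not the clustering algorithm of the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two word sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def greedy_cluster(core_word_sets: List[Set[str]], min_overlap: float = 0.3) -> List[int]:
    """Stand-in clustering: join the first cluster whose seed set overlaps enough."""
    seeds: List[Set[str]] = []
    labels: List[int] = []
    for words in core_word_sets:
        for index, seed in enumerate(seeds):
            if jaccard(words, seed) >= min_overlap:
                labels.append(index)
                break
        else:
            seeds.append(words)
            labels.append(len(seeds) - 1)
    return labels

@dataclass
class DefectDeterminationDevice:
    acquire_texts: Callable[[], List[str]]                # text acquisition module
    determine_core_words: Callable[[str], Set[str]]       # core word set determining module
    cluster: Callable[[List[Set[str]]], List[int]]        # text clustering module

    def run(self) -> Dict[int, List[str]]:
        """Return the defect description texts grouped by cluster label."""
        texts = self.acquire_texts()
        core_sets = [self.determine_core_words(text) for text in texts]
        labels = self.cluster(core_sets)
        grouped: Dict[int, List[str]] = {}
        for text, label in zip(texts, labels):
            grouped.setdefault(label, []).append(text)
        return grouped

SAMPLE_TEXTS = ["login page crash", "crash on login page", "font too small"]
STOP_WORDS = {"on", "too"}
device = DefectDeterminationDevice(
    acquire_texts=lambda: SAMPLE_TEXTS,
    determine_core_words=lambda text: set(text.split()) - STOP_WORDS,
    cluster=greedy_cluster,
)
result = device.run()
print(result)
```

Passing the modules in as callables keeps each one replaceable, for example by an LDA-based core word extractor or a different clustering method.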
Referring to fig. 8, fig. 8 is a schematic structural diagram of a device of a hardware running environment according to an embodiment of the present application.
As shown in fig. 8, the apparatus may include: a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. The memory 1005 may optionally also be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the structure shown in fig. 8 is not limiting of the apparatus and may include more or fewer components than shown, or certain components may be combined, or a different arrangement of components.
As shown in fig. 8, an operating system, a network communication module, a user interface module, and a defect determination program of a mass-measurement product may be included in a memory 1005 as one type of computer storage medium.
In the device shown in fig. 8, the network interface 1004 is mainly used for data communication with an external network; the user interface 1003 is mainly used for receiving input instructions from a user; and the apparatus calls, through the processor 1001, the defect determination program of the mass-measurement product stored in the memory 1005, and performs the following operations:
acquiring a plurality of defect description texts aiming at a mass-measured product;
determining a core word set of each defect description text;
and performing text clustering processing on the core word set to obtain the types of the defect description texts so that a manager can judge the defects of the mass-measured products based on the types of the defect description texts.
Further, the processor 1001 may call the defect determination program of the mass-measurement product stored in the memory 1005, and further perform the following operations:
performing word segmentation processing on each defect description text to obtain word segmentation processing results;
determining, based on the word segmentation processing result, the topic feature words and the high-frequency words of each defect description text;
and determining a core word set of each defect description text based on the topic feature words and the high-frequency words corresponding to each defect description text.
Further, the processor 1001 may call the defect determination program of the mass-measurement product stored in the memory 1005, and further perform the following operations:
determining the similarity between the core words in each core word set and the segmented words in the corresponding word segmentation processing result;
taking the segmented words whose similarity is greater than a preset similarity threshold as expansion words;
and adding the expansion words to the corresponding core word set.
Further, the processor 1001 may call the defect determination program of the mass-measurement product stored in the memory 1005, and further perform the following operations:
determining, based on the word segmentation processing result, the topic feature words of each defect description text through an LDA model;
calculating the occurrence frequency of each segmented word of each defect description text in the corresponding word segmentation processing result;
and taking the segmented words whose occurrence frequency is greater than a preset frequency threshold as the high-frequency words.
Further, the processor 1001 may call the defect determination program of the mass-measurement product stored in the memory 1005, and further perform the following operations:
cleaning the segmented words in the word segmentation processing result to obtain a cleaned word segmentation processing result;
and performing part-of-speech filtering processing on the cleaned word segmentation processing result to obtain a filtered word segmentation processing result, so that the topic feature words and the high-frequency words of each defect description text are determined based on the filtered word segmentation processing result, wherein the part-of-speech filtering processing comprises retaining nouns, verbs and adjectives.
Further, the processor 1001 may call the defect determination program of the mass-measurement product stored in the memory 1005, and further perform the following operations:
calculating the mutual information and the left and right information entropy of the core words in the core word set;
screening out compound words based on the mutual information and the left and right information entropy;
and determining a new core word set based on the compound words and the core words that were not merged into compound words.
Further, the processor 1001 may call the defect determination program of the mass-measurement product stored in the memory 1005, and further perform the following operations:
determining a defect description text set corresponding to each type of defect description text, determining the proportion of each type of defect description text based on the number of defect description texts in each defect description text set and the total number of the plurality of defect description texts, and determining the serious defects of the mass-measurement product based on the proportion of each type of defect description text;
and/or acquiring the uploading time of each defect description text in the defect description text set, determining, based on the uploading time, the submission frequency of the defect description texts in each defect description text set within a preset period, and determining whether to send an early warning to a manager based on the submission frequency.
In this embodiment, a plurality of defect description texts for the mass-measurement product are acquired, the core word set of each defect description text is determined, and text clustering processing is performed on the core word set to obtain the types of the plurality of defect description texts, that is, the defects of the mass-measurement product that those texts describe. The manager does not need to check the unordered defect description texts one by one, but can summarize the defects of the mass-measurement product based on the types obtained by clustering, which improves the efficiency of defect determination for the mass-measurement product.
In addition, an embodiment of the present application further proposes a computer-readable storage medium, on which a defect determination program of a mass-measured product is stored, the defect determination program of the mass-measured product realizing the following operations when executed by a processor:
acquiring a plurality of defect description texts aiming at a mass-measured product;
determining a core word set of each defect description text;
and performing text clustering processing on the core word set to obtain the types of the defect description texts so that a manager can judge the defects of the mass-measured products based on the types of the defect description texts.
In this embodiment, a plurality of defect description texts for the mass-measurement product are acquired, the core word set of each defect description text is determined, and text clustering processing is performed on the core word set to obtain the types of the plurality of defect description texts, that is, the defects of the mass-measurement product that those texts describe. The manager does not need to check the unordered defect description texts one by one, but can summarize the defects of the mass-measurement product based on the types obtained by clustering, which improves the efficiency of defect determination for the mass-measurement product.
It should be noted that, when the program stored on the computer-readable storage medium is executed by the processor, each step of the above method may also be implemented and the corresponding technical effects achieved, which is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are merely for description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above, including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the claims, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application, or direct or indirect application in other related technical fields are included in the scope of the claims of the present application.

Claims (10)

1. A defect determining method for a mass-measurement product, characterized by comprising the following steps:
acquiring a plurality of defect description texts aiming at a mass-measured product;
determining a core word set of each defect description text;
and performing text clustering processing on the core word set to obtain the types of the defect description texts so that a manager can judge the defects of the mass-measured products based on the types of the defect description texts.
2. The defect determining method for a mass-measurement product according to claim 1, wherein the step of determining a core word set of each defect description text comprises:
performing word segmentation processing on each defect description text to obtain word segmentation processing results;
determining, based on the word segmentation processing result, the topic feature words and the high-frequency words of each defect description text;
and determining a core word set of each defect description text based on the topic feature words and the high-frequency words corresponding to each defect description text.
3. The defect determining method for a mass-measurement product according to claim 2, wherein after the step of determining a core word set of each defect description text based on the topic feature words and the high-frequency words corresponding to each defect description text, the method further comprises:
determining the similarity between the core words in each core word set and the segmented words in the corresponding word segmentation processing result;
taking the segmented words whose similarity is greater than a preset similarity threshold as expansion words;
and adding the expansion words to the corresponding core word set.
4. The defect determining method for a mass-measurement product according to claim 2, wherein the step of determining the topic feature words and the high-frequency words of each defect description text based on the word segmentation processing result comprises:
determining, based on the word segmentation processing result, the topic feature words of each defect description text through an LDA model;
calculating the occurrence frequency of each segmented word of each defect description text in the corresponding word segmentation processing result;
and taking the segmented words whose occurrence frequency is greater than a preset frequency threshold as the high-frequency words.
5. The defect determining method for a mass-measurement product according to claim 2, wherein before the step of determining the topic feature words and the high-frequency words of each defect description text based on the word segmentation processing result, the method further comprises:
cleaning the segmented words in the word segmentation processing result to obtain a cleaned word segmentation processing result;
and performing part-of-speech filtering processing on the cleaned word segmentation processing result to obtain a filtered word segmentation processing result, so that the topic feature words and the high-frequency words of each defect description text are determined based on the filtered word segmentation processing result, wherein the part-of-speech filtering processing comprises retaining nouns, verbs and adjectives.
6. The defect determining method for a mass-measurement product according to claim 1, wherein before the step of performing text clustering processing on the core word set to obtain a text clustering result, the method further comprises:
calculating the mutual information and the left and right information entropy of the core words in the core word set;
screening out compound words based on the mutual information and the left and right information entropy;
and determining a new core word set based on the compound words and the core words that were not merged into compound words.
7. The defect determining method for a mass-measurement product according to claim 1, wherein after the step of performing text clustering processing on the core word set to obtain the types of the plurality of defect description texts, the method further comprises at least one of the following:
determining a defect description text set corresponding to each type of defect description text, determining the proportion of each type of defect description text based on the number of defect description texts in each defect description text set and the total number of the plurality of defect description texts, and determining the serious defects of the mass-measurement product based on the proportion of each type of defect description text;
acquiring the uploading time of each defect description text in the defect description text set, determining, based on the uploading time, the submission frequency of the defect description texts in each defect description text set within a preset period, and determining whether to send an early warning to a manager based on the submission frequency.
8. A defect determining device for a mass-measured product, wherein the defect determining device for the mass-measured product comprises:
a text acquisition module, configured to acquire a plurality of defect description texts for the mass-measurement product;
a core word set determining module, configured to determine the core word set of each defect description text;
and a text clustering module, configured to perform text clustering processing on the core word set to obtain the types of the plurality of defect description texts, so that a manager can determine the defects of the mass-measurement product based on the types of the plurality of defect description texts.
9. An apparatus, the apparatus comprising: memory, a processor and a defect determination program of a mass-measured product stored on the memory and executable on the processor, the defect determination program of a mass-measured product being configured to implement the steps of the defect determination method of a mass-measured product as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, wherein a defect determination program of a mass-measured product is stored on the computer-readable storage medium, which when executed by a processor, implements the steps of the defect determination method of a mass-measured product according to any one of claims 1 to 7.
CN202311290554.5A 2023-09-28 2023-09-28 Defect judging method, device and equipment for mass-measurement product and storage medium Pending CN117390185A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311290554.5A CN117390185A (en) 2023-09-28 2023-09-28 Defect judging method, device and equipment for mass-measurement product and storage medium


Publications (1)

Publication Number Publication Date
CN117390185A true CN117390185A (en) 2024-01-12

Family

ID=89462248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311290554.5A Pending CN117390185A (en) 2023-09-28 2023-09-28 Defect judging method, device and equipment for mass-measurement product and storage medium

Country Status (1)

Country Link
CN (1) CN117390185A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination