CN113869408A - Classification method and computer equipment - Google Patents
Classification method and computer equipment
- Publication number
- CN113869408A (application CN202111138645.8A)
- Authority
- CN
- China
- Prior art keywords
- software
- classification
- feature
- computer device
- computer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a classification method and a computer device, applied in the technical field of computer applications. The classification method comprises the following steps: a computer device first obtains a first feature vector corresponding to first software, where the first software is software on the computer device and the first feature vector is obtained based on at least one first feature representing an attribute of the first software; the computer device then obtains the classifications corresponding to n pieces of second software, where the second software is software whose classification has already been determined; the computer device also obtains second feature vectors corresponding to the n pieces of second software, where each second feature vector is obtained based on at least one second feature representing an attribute of the second software; and the computer device then determines a first classification through an artificial intelligence method based on the first feature vector and on the second feature vectors and classifications corresponding to the n pieces of second software, where the first classification is the classification of the first software. The method can classify software in any operating system or field, so it has a wide application range, covers more application scenarios, and improves the accuracy of software classification.
Description
Technical Field
The present application relates to the field of computer application technologies, and in particular, to a classification method and a computer device.
Background
With the rapid development of the internet era, software is being developed faster and faster, so more and more software appears in every aspect of people's work and life and brings users great convenience. At the same time, the amount of software on a computer device has grown considerably. Classified management and retrieval of that software is an important way to improve the management efficiency of the computer device, so the software on the computer device needs to be classified. Depending on the perspective and starting point, software can be classified in different ways, for example into operating systems, databases, office environments, integrated development environments, application environments, and the like. At present, software can be classified manually, automatically, or in other ways. Manual classification is inefficient and places high demands on the professional competence of the personnel involved, and software is now developed faster than it can be classified by hand, so there is a strong need to classify software automatically.
Current automatic classification methods target the software of one particular operating system, such as Linux, Android, or Windows system software, and determine the classification from control information in the software. Because this control information is unique to each operating system, different operating systems cannot share the same automatic classification method; for example, an automatic classification method for the Linux system cannot be extended to classify software in the Windows system. Current automatic classification methods can therefore be used only in a specific field, and have a narrow application range and few application scenarios.
Disclosure of Invention
The embodiments of the present application provide a classification method and a computer device. The computer device obtains a first feature vector corresponding to first software, together with the second feature vectors and classifications corresponding to n pieces of second software, and then determines a first classification of the first software through an artificial intelligence algorithm based on these. Software in any operating system or field can therefore be classified, so the method has a wide application range and suits a variety of application scenarios.
Based on this, the present application provides in a first aspect a classification method, including:
a computer device obtains a first feature vector corresponding to first software, where the first software is software on the computer device, the first feature vector is obtained based on at least one first feature, and the first feature is a word representing an attribute of the first software;
the computer device obtains classifications corresponding to n pieces of second software, where the second software is software whose classification has been determined, and n is greater than or equal to 1;
the computer device obtains second feature vectors corresponding to the n pieces of second software, where each second feature vector is obtained based on at least one second feature, and the second feature is a word representing an attribute of the second software;
the computer device determines a first classification through an artificial intelligence method based on the first feature vector and on the second feature vectors and classifications corresponding to the n pieces of second software, where the first classification is the classification of the first software.
In one possible implementation manner of the first aspect, the computer device obtains a first description text of the first software, where the first description text is a text describing an attribute of the first software;
the computer device obtains at least one first feature based on the first description text;
the computer device obtains the first feature vector based on the at least one first feature.
In a possible implementation manner of the first aspect, the computer device obtains the first description text from literature and/or network information describing the first software.
In a possible implementation manner of the first aspect, the computer device obtains the classifications corresponding to the n pieces of second software according to metadata and code dependency data of the n pieces of second software;
and/or,
the computer device obtains the classifications corresponding to the n pieces of second software according to a first preset entry, where the first preset entry includes the classifications corresponding to the n pieces of second software.
In a possible implementation manner of the first aspect, the computer device obtains the second feature vectors corresponding to the n pieces of second software according to n second description texts, where each second description text is a text describing attributes of the second software;
and/or,
the computer device obtains the second feature vectors corresponding to the n pieces of second software according to a second preset entry, where the second preset entry includes preset second feature vectors corresponding to the n pieces of second software.
In a possible implementation manner of the first aspect, the computer device obtains n second description texts corresponding respectively to the n pieces of second software;
the computer device obtains, based on the n second description texts, the second features corresponding respectively to the n pieces of second software, where each piece of second software has at least one second feature;
the computer device obtains the second feature vectors based on the second features corresponding respectively to the n pieces of second software.
In a possible implementation manner of the first aspect, the computer device obtains the n second description texts from literature and/or network information describing the n pieces of second software, respectively.
In one possible implementation of the first aspect, the artificial intelligence method comprises at least any one of:
a proximity algorithm or a clustering algorithm.
In a possible implementation of the first aspect, the attributes of the first software and the attributes of the second software respectively comprise at least one of the following:
a function of the software, a publisher of the software, an application scenario of the software, or iterative version information of the software, where the software is the first software or the second software.
A second aspect of the present application provides a computer device comprising:
a first obtaining unit, configured to obtain a first feature vector of first software, where the first software is software on the computer device, the first feature vector is obtained based on at least one first feature, and the first feature is a word representing an attribute of the first software;
a second obtaining unit, configured to obtain classifications corresponding to n pieces of second software, where the second software is software whose classification has been determined, and n is greater than or equal to 1;
a third obtaining unit, configured to obtain second feature vectors corresponding to the n pieces of second software, where each second feature vector is obtained based on at least one second feature, and the second feature is a word representing an attribute of the second software;
a determining unit, configured to determine a first classification through an artificial intelligence method based on the first feature vector and on the second feature vectors and classifications corresponding to the n pieces of second software, where the first classification is the classification of the first software.
The computer device of the second aspect of the embodiments of the present application executes the method described in the first aspect of the embodiments of the present application or any possible implementation manner of the first aspect.
A third aspect of the present application provides a computer device comprising: a memory, a transceiver, a processor, and a bus system;
wherein, the memory is used for storing programs;
a processor for executing a program in a memory to implement the method described in the first aspect or any one of the possible implementations of the first aspect;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
A fourth aspect of the present application provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
A fifth aspect of the application provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method described in the first aspect or any of the possible implementations of the first aspect.
According to the technical scheme, the embodiment of the application has the following advantages:
In the embodiments of the present application, a computer device first obtains a first feature vector of first software, where the first software is software on the computer device, the first feature vector is obtained based on at least one first feature, and the first feature is a word representing an attribute of the first software. The computer device also obtains the second feature vectors and classifications of n pieces of second software, where each second feature vector is obtained based on at least one second feature, and the second feature is a word representing an attribute of the second software. The computer device then determines the classification of the first software through an artificial intelligence algorithm based on the first feature vector and on the second feature vectors and classifications corresponding to the n pieces of second software. Because the classification of the second software is already determined, and the first and second feature vectors are obtained from words describing the attributes of the first and second software rather than from operating-system-specific information, the classification method of the embodiments of the present application can classify software in any operating system or field, has a wide application range, and suits a variety of application scenarios.
Drawings
FIG. 1 is a schematic flowchart of a classification method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an embodiment of obtaining a first feature vector;
FIG. 3 is a schematic diagram of a description document for software;
FIG. 4 is a schematic diagram of the information returned when searching for software;
FIG. 5 is a schematic diagram of a (partial) family tree branch of the software Debian;
FIG. 6 is a schematic diagram of the classification corresponding to the software Debian;
FIG. 7 is a schematic diagram of a software library entry according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application;
FIG. 9 is another schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide a classification method and a computer device. The computer device obtains a first feature vector corresponding to first software, together with the second feature vectors and classifications corresponding to n pieces of second software, and then determines the classification of the first software through an artificial intelligence algorithm based on these.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As software is developed faster and faster, more and more software appears and becomes part of users' life and work. Software is now developed faster than it can be classified manually, so classifying software automatically improves both the management efficiency of computer devices and the user experience. Current automatic classification methods target the software of one operating system and determine the classification from control information in the software. However, different operating systems cannot share the same classification method; for example, an automatic classification method for the Linux system cannot be extended to classify software in the Windows system. Current automatic classification methods can therefore be used only in a specific field, and have a narrow application range and few application scenarios.
To solve the foregoing problems, the embodiments of the present application provide a classification method and a computer device. The computer device determines the classification of first software based on an obtained first feature vector corresponding to the first software and on the second feature vectors and classifications corresponding to n pieces of second software. Because the first and second feature vectors are obtained from words representing the attributes of the first and second software, respectively, the method can be used in any operating system or field, has a wide application range, and can be applied to a variety of application scenarios.
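As a rough illustration of the overall idea, the following sketch classifies first software by a majority vote among its nearest labeled neighbors. It is only one possible reading of a "proximity algorithm": the choice of cosine similarity, the value of k, the function names, and all sample vectors and labels are assumptions made for this example and do not come from the application.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as {word: weight}."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify_first_software(first_vector, labeled_second, k=3):
    """Majority vote over the k second-software vectors most similar to first_vector.

    labeled_second is a list of (second_vector, classification) pairs, n >= 1.
    """
    if not labeled_second:
        raise ValueError("at least one piece of classified second software is required")
    ranked = sorted(
        labeled_second,
        key=lambda item: cosine_similarity(first_vector, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical second software with known classifications (illustrative values only).
labeled_second = [
    ({"database": 0.8, "sql": 0.6}, "database"),
    ({"database": 0.5, "server": 0.5}, "database"),
    ({"kernel": 0.9, "driver": 0.4}, "operating system"),
    ({"editor": 0.7, "text": 0.7}, "office"),
]
# Hypothetical first feature vector of the software to be classified.
first_vector = {"sql": 0.7, "database": 0.3, "server": 0.2}

predicted = classify_first_software(first_vector, labeled_second, k=3)
print(predicted)
```

Because the vectors are keyed by descriptive words rather than by operating-system-specific control information, nothing in this sketch depends on the platform the software runs on.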
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems. Referring to fig. 1, fig. 1 is a schematic flow chart of a classification method provided in an embodiment of the present application, which specifically includes:
101. The computer device obtains a first feature vector corresponding to the first software.
In this embodiment, the computer device obtains a first feature vector corresponding to first software, where the first software may be any software on the computer device, the first feature vector is obtained based on at least one first feature, and the first feature is a word representing an attribute of the first software.
In a possible implementation, the computer device first obtains a first description text of the first software, where the first description text is a text describing attributes of the first software; it then obtains at least one first feature based on the first description text, and finally obtains the first feature vector based on the at least one first feature. For ease of understanding, refer to FIG. 2, which is a schematic diagram of an embodiment of obtaining a first feature vector and specifically includes:
201. The computer device obtains a first description text.
The computer device obtains a first description text, which is a text describing the attributes of the first software. In one possible implementation, the computer device obtains the first description text from literature or network information describing the first software. Optionally, the computer device may derive the first description text from that literature or network information through at least one of preset keyword or core word extraction, abstract generation, and the like. In practice the first description text may also be obtained in other ways, which is not limited here.
In the embodiment of the application, the computer device obtains the first description text from the literature or the network information of the first software in a plurality of ways, so that the application scenes are increased, and the application range is widened.
Optionally, the literature or network information may include at least one of: a basic description document or information about the first software; a description document or information about the first software on an authoritative website; or a document or information returned by a search engine when searching for the first software. It can be understood that, in practice, the literature or network information may also include other sources, which is not limited here. For example, the basic description document or information may be the first software provider's account of the usage scenarios of the first software together with specific descriptive information. The description on an authoritative website may be, for example, an official document describing the first software, or a description of the first software on an authoritative site such as its official website, Wikipedia, or another encyclopedia site; see FIG. 3, which is a schematic diagram of a description document for software. The document or information returned by a search engine may be, for example, the results returned when the first software is searched for with any search engine, such as Google, Baidu, Sogou, or Haosou; see FIG. 4, which is a schematic diagram of the information returned when searching for software.
In the embodiments of the present application, the literature or network information can come from several sources. Obtaining the first description text from several sources raises the likelihood of obtaining accurate description information, which improves the accuracy of the classification determined for the first software.
Optionally, the attribute of the first software may include at least one aspect of a function corresponding to the first software, a publisher of the first software, an application scenario of the first software, or iterative version information of the first software, and it is understood that in an actual situation, the attribute of the first software may further include other aspects, and is not limited herein.
In the embodiment of the application, the attribute of the first software described in the first description text can comprise a plurality of aspects, and the classification accuracy of the first software is improved.
202. The computer device obtains at least one first feature based on the first description text.
The computer device obtains at least one first feature based on the first description text. In a possible implementation, the computer device extracts the at least one first feature from the first description text according to preset keywords or key sentences. It can be understood that, in practice, the first feature may also be obtained according to other requirements, which is not limited here.
In the embodiment of the application, the computer device obtains at least one first feature from the first description text according to the preset keyword or the preset key sentence, so that the application scenes are increased, and the flexibility of the scheme is improved.
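A minimal sketch of keyword-based feature extraction is shown below. The preset keyword set, the tokenization rule, and the sample description text are all hypothetical; the application does not specify which keywords are preset or how the text is tokenized.

```python
import re

# Hypothetical preset keywords; the application does not list any.
PRESET_KEYWORDS = {"database", "server", "kernel", "editor", "browser", "sql"}


def extract_first_features(description_text, preset_keywords=PRESET_KEYWORDS):
    """Return the preset keywords found in the text, in order of first appearance."""
    tokens = re.findall(r"[a-z0-9]+", description_text.lower())
    features = []
    for token in tokens:
        if token in preset_keywords and token not in features:
            features.append(token)
    return features


# Illustrative description text, not taken from the application.
description = "PostgreSQL is an open source relational database server that speaks SQL."
first_features = extract_first_features(description)
print(first_features)
```

Matching whole tokens keeps the sketch simple; a practical system would likely also handle multi-word phrases and non-English text, which this example does not attempt.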
203. The computer device obtains a first feature vector based on the at least one first feature.
The computer device obtains the first feature vector based on the at least one first feature. In one possible implementation, the computer device obtains the first feature vector by vectorizing the at least one first feature. For ease of understanding, the first feature vector obtained by vectorizing the at least one first feature of software Sx may be written as Sx (f1: x1, f2: x2, f3: x3, …, fn: xn), where Sx denotes the first software, fi is a word (one of the at least one first feature), and xi is a numerical value corresponding to the importance of the word fi in the description information of the first software Sx. The value may be computed with a term frequency-inverse document frequency (TF-IDF) algorithm, a bag-of-words (BOW) model, a Chinese language model (CLM), or Word2vec. It can be understood that the methods listed here are merely examples; in practice the same purpose can be achieved in other ways, which is not limited here.
In the embodiment of the application, the at least one first feature can be vectorized in multiple ways to obtain the first feature vector, so that the application range of the scheme and the flexibility of the scheme are improved.
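To make the word-to-weight form of the feature vector concrete, the sketch below computes TF-IDF weights in plain Python and stores each vector as a mapping from word to weight. The smoothed IDF formula used here is one common variant chosen for the example; the application does not fix a formula, and the feature lists are invented for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(feature_lists):
    """Map each software name to a sparse vector {word: weight}.

    feature_lists: {software name: list of feature words}.
    tf  = count of the word / number of words for that software
    idf = ln((1 + N) / (1 + df)) + 1   (a smoothed variant, assumed for this sketch)
    """
    n_docs = len(feature_lists)
    doc_freq = Counter()
    for words in feature_lists.values():
        doc_freq.update(set(words))  # count each word once per software

    vectors = {}
    for name, words in feature_lists.items():
        counts = Counter(words)
        total = len(words)
        vectors[name] = {
            word: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        }
    return vectors


# Hypothetical feature words for two pieces of software.
feature_lists = {
    "Sx": ["database", "server", "sql", "sql"],
    "Sy": ["editor", "text", "server"],
}
vectors = tf_idf_vectors(feature_lists)
print(vectors["Sx"])
```

In this example the word "sql" receives a higher weight in Sx than "server", because it occurs more often in Sx and does not occur in Sy.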
It should be noted that the example in fig. 2 is only used for understanding the present solution, and in practical applications, the computer device may also obtain the first feature vector according to other manners, so the example in fig. 2 should not be construed as a limitation of the present solution.
102. The computer device obtains the classifications corresponding to the n pieces of second software.
The computer device obtains the classifications corresponding to n pieces of second software, where the second software is software whose classification has been determined, and n is greater than or equal to 1.
In some embodiments of the present application, the computer device obtains the classifications corresponding to the n pieces of second software according to the metadata and code dependency data of the n pieces of second software, and/or according to a first preset entry, where the first preset entry includes the classifications corresponding to the n pieces of second software. The two modes are described separately below:
Mode 1: the computer device obtains the classifications corresponding to the n pieces of second software according to the metadata and code dependency data of the n pieces of second software.
Based on the metadata and code dependency data of the n pieces of second software, the computer device can trace back to existing software whose classification is known, and then determine the classification of the second software from the classification of that known software. For ease of understanding, consider the following example. Larger-scale software often forms a family tree rooted in certain open source software, for example family trees rooted in Linux, PostgreSQL, Hadoop, and similar software. Referring to FIG. 5, which is a schematic diagram of a (partial) family tree branch of the software Debian: Debian is open source software, and other software depends on and references Debian through metadata or code dependency data. By analyzing the metadata and code dependency data of the software to be classified, the computer device can therefore trace back to existing software of known classification related to Debian, and then determine the classification of the software to be classified from the classification of that known software. Debian itself can likewise be traced through metadata or code dependency data to existing software of known classification, which determines the classification of Debian. Referring to FIG. 6, which is a schematic diagram of the classification corresponding to Debian, the classification of Debian under the Linux system is known.
In this implementation, the computer device obtains the classification of the second software from the metadata and code dependency data. This replaces traditional manual classification, which improves working efficiency, saves cost, and also improves classification accuracy.
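The tracing described in Mode 1 can be pictured as a walk over dependency edges until software of known classification is reached. The sketch below uses a breadth-first search; the graph representation, the function name, and the sample edges and labels are assumptions for illustration, and the application does not say how the metadata or code dependency data is parsed.

```python
from collections import deque


def trace_classification(software, depends_on, known_classification):
    """Follow dependency edges breadth-first until classified software is found.

    depends_on: {software name: list of software names it depends on or derives from}.
    known_classification: {software name: classification}.
    Returns the classification of the nearest classified ancestor, or None.
    """
    queue = deque([software])
    seen = {software}
    while queue:
        current = queue.popleft()
        if current in known_classification:
            return known_classification[current]
        for parent in depends_on.get(current, []):
            if parent not in seen:  # guard against dependency cycles
                seen.add(parent)
                queue.append(parent)
    return None


# Hypothetical dependency edges and one known classification (illustrative only).
depends_on = {
    "linuxmint": ["ubuntu"],
    "ubuntu": ["debian"],
    "debian": ["linux"],
}
known_classification = {"linux": "operating system"}

label = trace_classification("linuxmint", depends_on, known_classification)
print(label)
```

Breadth-first order returns the closest classified ancestor first, which seems the most natural choice when a piece of software has several dependency paths; other traversal or voting strategies would also be consistent with the description.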
Mode 2: the computer device obtains the classifications corresponding to the n pieces of second software according to the first preset entry.
The computer device obtains the classifications corresponding to the n pieces of second software according to a first preset entry, where the first preset entry includes the classifications corresponding to the n pieces of second software.
In a possible implementation, the classifications corresponding to the n pieces of second software included in the first preset table entry are classifications recognized by the public or by the relevant professional industry. Optionally, these classifications may instead be confirmed by a technician with professional knowledge. It should be noted that the classifications included in the first preset table entry may also combine both forms.
In an embodiment of the present application, the classifications of the n pieces of second software included in the first preset table entry may be classifications recognized by the public or the professional industry, or may be confirmed by a technician with professional knowledge. These options provide multiple ways to implement the scheme, and relying on professionally recognized or professionally confirmed classifications improves classification accuracy.
In practice, the computer device may obtain the classifications corresponding to the n pieces of second software using both of the above modes, or using only one of them, which is not limited herein.
It should be noted that the execution order of step 102 and step 101 is not limited.
103. The computer device acquires second feature vectors corresponding to the n pieces of second software.
The computer equipment acquires second feature vectors corresponding to the n pieces of second software, wherein the second feature vectors are obtained based on at least one second feature, and the second feature is a word representing the attribute of the second software.
In some embodiments of the present application, the computer device obtains the second feature vectors corresponding to the n pieces of second software according to n second description texts, where a second description text is a text describing attributes of the second software, and/or obtains the second feature vectors corresponding to the n pieces of second software according to a second preset table entry, where the second preset table entry includes preset second feature vectors corresponding to the n pieces of second software. First, as an illustration, the second feature vector of second software Sy may be represented in a form similar to Sy(f1: y1, f2: y2, f3: y3, …, fn: yn), where Sy is the second software, fi is a word (one of the aforementioned at least one second feature), and yi is a numerical value indicating the importance of the word fi in the description information of the second software Sy. The value is obtained in a manner similar to that in step 101 above, which is not repeated here. Two modes are described below:
Mode 1: the computer device acquires the second feature vectors corresponding to the n pieces of second software according to the n second description texts.
The computer device acquires the second feature vectors corresponding to the n pieces of second software according to n second description texts, where a second description text is a text describing the attributes of the second software.
Optionally, the attribute of the second software may include at least one aspect of a function corresponding to the second software, a publisher of the second software, an application scenario of the second software, or iterative version information of the second software, and it is understood that in an actual situation, the attribute of the second software may also include other aspects, and is not limited herein.
In the embodiment of the application, the attributes of the second software described in the second description text can include multiple aspects, so that the classification accuracy of the second software is improved.
In a possible mode, the computer device first obtains n second description texts corresponding to the n pieces of second software respectively, then obtains, based on the n second description texts, the second features corresponding to each of the n pieces of second software, where each piece of second software has at least one second feature, and then obtains the second feature vectors based on those second features. The specific way in which the computer device acquires the second feature vectors is similar to the way it acquires the first feature vector shown in fig. 2 in step 101 above, and is therefore not repeated here.
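The text-to-features-to-vector pipeline above can be sketched as follows. Because the weighting scheme of step 101 is not reproduced in this passage, the sketch substitutes relative term frequency as the importance value yi; that choice, the stop-word list, and the function names are assumptions made only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "for", "to", "is", "in"}

def extract_features(description):
    """Split a description text into lowercase words and drop stop words."""
    words = re.findall(r"[a-z]+", description.lower())
    return [w for w in words if w not in STOP_WORDS]

def feature_vector(description):
    """Map each feature word fi to a weight yi (here: relative term frequency)."""
    counts = Counter(extract_features(description))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}

vec = feature_vector("A relational database server for storing relational data")
print(vec)
```

Storing the vector as a word-to-weight dictionary mirrors the Sy(f1: y1, …) notation and keeps only the words that actually occur.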
In the embodiment of the application, the computer device obtains the second feature vectors corresponding to the n pieces of second software according to the n second description texts, which improves the accuracy of the obtained second feature vectors and therefore the accuracy of the classification obtained for the first software.
Mode 2: the computer device acquires the second feature vectors corresponding to the n pieces of second software according to a second preset table entry.
The computer device acquires the second feature vectors corresponding to the n pieces of second software according to the second preset table entry, where the second preset table entry includes preset second feature vectors corresponding to the n pieces of second software. Optionally, the preset second feature vectors included in the second preset table entry may be specified manually, or may be generated and stored after the computer device or another device obtains the second feature vectors corresponding to the n pieces of second software, which is not limited herein.
In a possible implementation, the second preset entry may be in the same entry as the first preset entry in step 102, or may exist independently, and is not limited herein. Fig. 7 is a schematic diagram of a software library entry provided in an embodiment of the present application, where the software library entry includes n classifications corresponding to second software and n second feature vectors corresponding to the second software, and a computer device may directly obtain the n second feature vectors and the classifications corresponding to the second software from the software library entry, so as to reduce occupation of network resources.
It should be noted that the example in fig. 7 is only used for understanding the present solution, and in practical applications, the n classes corresponding to the second software and the second feature vectors can also be embodied in other forms, so the example in fig. 7 should not be construed as a limitation to the present solution.
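One way to picture a software library entry that holds both the classification and the second feature vector is a small record type. Fig. 7 is not reproduced here, so the field names and sample values below are assumptions, not the layout of the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class LibraryEntry:
    """Hypothetical software library entry for one piece of second software."""
    name: str
    classification: str
    feature_vector: Dict[str, float] = field(default_factory=dict)

def load_labeled_samples(library: List[LibraryEntry]) -> List[Tuple[Dict[str, float], str]]:
    """Read (second feature vector, classification) pairs straight from the library."""
    return [(entry.feature_vector, entry.classification) for entry in library]

# Illustrative entries only.
library = [
    LibraryEntry("known-db", "database", {"database": 0.6, "sql": 0.4}),
    LibraryEntry("known-browser", "browser", {"browser": 0.7, "web": 0.3}),
]
samples = load_labeled_samples(library)
print(samples)
```

Keeping both fields in one entry lets the device read everything it needs in a single lookup, which matches the stated goal of reducing network resource usage.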
In the embodiment of the application, the computer device obtains the second feature vectors corresponding to the n second software according to the second preset table entry, and can directly obtain the second feature vectors, so that the working efficiency is improved.
In an embodiment of the present application, the computer device may obtain the second feature vectors corresponding to the n pieces of second software according to one of the above-mentioned manners, or may obtain the second feature vectors corresponding to the n pieces of second software from two manners at the same time, which is not limited herein.
It should be noted that, the execution sequence between step 103 and step 101 or step 102 is not limited.
104. The computer device determines a first classification by an artificial intelligence method based on the first feature vector and on the second feature vectors and classifications corresponding to the n pieces of second software.
The computer device determines the first classification through an artificial intelligence method based on the first feature vector and on the second feature vectors and classifications corresponding to the n pieces of second software, where the first classification is the classification of the first software.
In some embodiments of the present application, the artificial intelligence method may be a nearest-neighbor (proximity) algorithm, a clustering algorithm, or another algorithm that achieves the same purpose. In practice, a specific algorithm may be selected according to actual requirements, which is not limited herein. For ease of understanding, the following takes as an example the case in which the computer device determines the first classification by using a K-nearest neighbor (KNN) algorithm based on the first feature vector and on the second feature vectors and classifications corresponding to the n pieces of second software.
For ease of understanding, the KNN algorithm is first briefly introduced. KNN is a classification method in the field of artificial intelligence that can classify content such as texts and images containing different features. Its core idea is that if most of the k nearest neighbors of a sample in the feature space belong to a certain class, the sample also belongs to that class and shares the characteristics of the samples in that class. Here k is the core parameter of the algorithm, and different values of k may lead to different classifications, so the value of k can be adjusted for different application scenarios to improve classification accuracy.
In some embodiments of the application, to select the k value for a given application scenario, the n pieces of second software may first be divided into a part A and a part B, with part A used as training samples and part B used as test samples. The part B samples are classified repeatedly while the k value is varied, and the k value for which the predicted classifications of the part B samples best match the classifications of those samples already acquired by the computer device is selected as the target k value.
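The hold-out selection of k can be sketched as below. This is an illustration under assumptions: the vectors are dense tuples, Euclidean distance is used, and the names `knn_predict` and `select_k` and the sample points are introduced here rather than taken from the application.

```python
import math
from collections import Counter

def knn_predict(query, samples, k):
    """Return the majority class among the k samples nearest to query."""
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def select_k(part_a, part_b, candidate_ks):
    """Pick the k whose predictions on part B best match the known classes."""
    best_k, best_accuracy = None, -1.0
    for k in candidate_ks:
        correct = sum(knn_predict(vec, part_a, k) == label for vec, label in part_b)
        accuracy = correct / len(part_b)
        if accuracy > best_accuracy:  # ties keep the earlier (smaller) candidate
            best_k, best_accuracy = k, accuracy
    return best_k, best_accuracy

# Hypothetical labeled samples: part A for training, part B for testing.
part_a = [((0.0, 0.0), "db"), ((0.1, 0.0), "db"), ((0.0, 0.1), "db"),
          ((1.0, 1.0), "web"), ((1.1, 1.0), "web")]
part_b = [((0.05, 0.05), "db"), ((1.05, 1.0), "web")]

print(select_k(part_a, part_b, [1, 3, 5]))
```

With these sample points, k = 5 covers every part A sample, so the larger class wins every vote and the "web" test sample is misclassified, which is why a smaller k is preferred here.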
After the target k value is determined, the computer device may measure how close the first feature vector is to each of the second feature vectors corresponding to the n pieces of second software, using any one of the Euclidean distance, the Manhattan distance, the cosine value, and the like. For the Euclidean and Manhattan distances, a smaller value means the two vectors are closer. For the cosine value, the closer the value is to 1, the more similar the first feature vector is to the second feature vector, that is, the more likely the corresponding first software and second software belong to the same classification. The pieces of second software are then sorted from nearest to farthest, and the k nearest pieces of second software are selected, where k is the target k value. The classification that accounts for the largest share among the classifications of those k pieces of second software is determined and taken as the first classification, namely the classification of the first software. It should be understood that the above determination of the first classification by the KNN algorithm is merely an example and should not be construed as a limitation of the present solution.
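The ranking and voting step can be sketched with the cosine value as the closeness measure. This is a minimal example under assumptions: feature vectors are word-to-weight dictionaries, 1 minus the cosine value serves as the distance used for sorting, and all names and sample data are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse word-to-weight vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def classify(first_vector, labeled, k):
    """Sort second software from nearest to farthest and take the majority class of the top k."""
    ranked = sorted(labeled, key=lambda item: 1.0 - cosine_similarity(first_vector, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical second feature vectors with known classifications.
labeled = [
    ({"database": 0.6, "sql": 0.4}, "database"),
    ({"database": 0.5, "storage": 0.5}, "database"),
    ({"browser": 0.7, "web": 0.3}, "browser"),
]
first_vector = {"database": 0.5, "query": 0.5}

print(classify(first_vector, labeled, k=2))
```

Sorting by 1 minus the cosine value puts the most similar vectors first, so the same "k nearest" selection works for similarity measures as well as for true distances.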
Optionally, instead of the KNN algorithm, the computer device may determine the first classification based on the first feature vector and on the second feature vectors and classifications corresponding to the n pieces of second software by using a k-means clustering algorithm, another clustering algorithm, or any other method or algorithm capable of achieving the same purpose.
In the embodiment of the application, the computer device determines the first classification by an artificial intelligence method based on the first feature vector, the second feature vectors corresponding to the n pieces of second software and the classification, so that the classification accuracy is improved.
It should be noted that, the computer device described in this embodiment of the present application may be a cloud-side device (e.g., a cloud server, a cluster, etc.), or may also be an end-side device (e.g., a mobile phone, a personal computer, etc.), and as long as the device can perform each step in the embodiment corresponding to fig. 1 of the present application, the device may be referred to as a computer device, and a specific representation form of the computer device is not limited herein.
In the embodiment of the application, the computer device obtains a first feature vector of first software, also obtains second feature vectors corresponding to n second software and classifications, and then determines a first classification through an artificial intelligence method based on the first feature vector and the second feature vectors corresponding to n second software and the classifications.
In order to implement the functions in the methods provided by the embodiments of the present application, the computer device may include a hardware structure and/or a software module, and the functions are implemented in the form of a hardware structure, a software module, or a hardware structure and a software module. Whether any of the above-described functions is implemented as a hardware structure, a software module, or a hardware structure plus a software module depends upon the particular application and design constraints imposed on the technical solution.
As shown in fig. 8, an embodiment of the present application further provides a computer device 800, specifically please refer to fig. 8, fig. 8 is a schematic structural diagram of the computer device provided in the embodiment of the present application, where the computer device 800 may be a cloud-side device (e.g., a cloud server, a cluster, etc.), an end-side device (e.g., a mobile phone, a personal computer, etc.), or a device capable of being used in cooperation with a terminal device and a network device. In a possible implementation, the computer device 800 may include a module or a unit corresponding to one or more of the methods/operations/steps/actions performed by the computer device in the above method embodiments, and the unit may be a hardware circuit, a software circuit, or a combination of a hardware circuit and a software circuit. In one possible implementation, the computer device 800 includes: a first acquisition unit 801, a second acquisition unit 802, a third acquisition unit 803, and a determination unit 804. The first obtaining unit 801 may be configured to perform a step of obtaining a first feature vector of first software in the above method embodiment, the second obtaining unit 802 may be configured to perform a step of obtaining n classes corresponding to second software in the above method embodiment, the third obtaining unit 803 may be configured to perform a step of obtaining n second feature vectors corresponding to second software in the above method embodiment, and the determining unit 804 may be configured to perform a step of determining a first class by an artificial intelligence method based on the first feature vector and the n second feature vectors corresponding to second software and the classes in the above method embodiment.
In this embodiment of the application, the first obtaining unit 801 is configured to obtain a first feature vector of first software, the second obtaining unit 802 is configured to obtain the classifications corresponding to n pieces of second software, and the third obtaining unit 803 is configured to obtain the second feature vectors corresponding to the n pieces of second software. The determining unit 804 then determines a first classification, that is, the classification of the first software, by an artificial intelligence method based on the first feature vector and on the second feature vectors and classifications corresponding to the n pieces of second software. This solution can determine the classification of the first software in any operating system or field, so it has a wide application range, covers more application scenarios, and improves classification accuracy.
In other possible designs, the above-mentioned first obtaining unit 801, second obtaining unit 802, third obtaining unit 803 and determining unit 804 may perform the methods/operations/steps/actions in various possible implementations of the above-mentioned method embodiments in a one-to-one correspondence.
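The division into four units can be pictured as a class that delegates to four injected callables, one per unit. This is only a structural sketch: the class name, method names, and the stub callables are assumptions, and the stubs do not implement any real acquisition or classification logic.

```python
class ComputerDevice:
    """Hypothetical software-module view of units 801 to 804."""

    def __init__(self, first_acquire, second_acquire, third_acquire, determine):
        self.first_acquire = first_acquire    # role of first obtaining unit 801
        self.second_acquire = second_acquire  # role of second obtaining unit 802
        self.third_acquire = third_acquire    # role of third obtaining unit 803
        self.determine = determine            # role of determining unit 804

    def classify(self, first_software, second_software):
        first_vector = self.first_acquire(first_software)
        classes = self.second_acquire(second_software)
        second_vectors = self.third_acquire(second_software)
        return self.determine(first_vector, second_vectors, classes)

# Placeholder callables, for illustration only.
device = ComputerDevice(
    first_acquire=lambda software: {"database": 1.0},
    second_acquire=lambda software_list: ["database" for _ in software_list],
    third_acquire=lambda software_list: [{"database": 1.0} for _ in software_list],
    determine=lambda first_vec, second_vecs, classes: classes[0],
)
result = device.classify("new-tool", ["known-tool"])
print(result)
```

Because each unit is passed in, any of the modes described in the method embodiment can be swapped in without changing the overall flow.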
In a possible design, the first obtaining unit 801 may be configured to obtain a first description text of the first software, where the first description text is a text describing an attribute of the first software;
may be configured to obtain at least one first feature based on the first description text; and
may be configured to obtain a first feature vector based on the at least one first feature.
In a possible design, the first obtaining unit 801 may be configured to obtain the first description text from a document and/or network information describing the first software.
In a possible design, the second obtaining unit 802 may be configured to obtain the classifications corresponding to the n pieces of second software according to the metadata and the code dependency data of the n pieces of second software;
and/or,
may be configured to obtain the classifications corresponding to the n pieces of second software according to a first preset table entry, where the first preset table entry includes the classifications corresponding to the n pieces of second software.
In a possible design, the third obtaining unit 803 may be configured to obtain the second feature vectors corresponding to the n pieces of second software according to n second description texts, where a second description text is a text describing attributes of the second software;
and/or,
may be configured to obtain the second feature vectors corresponding to the n pieces of second software according to a second preset table entry, where the second preset table entry includes preset second feature vectors corresponding to the n pieces of second software.
In a possible design, the third obtaining unit 803 may be configured to obtain n second description texts corresponding to the n pieces of second software respectively;
may be configured to obtain, based on the n second description texts, the second features corresponding to each of the n pieces of second software, where each piece of second software has at least one second feature; and
may be configured to obtain the second feature vectors based on the second features respectively corresponding to the n pieces of second software.
In a possible design, the third obtaining unit 803 may be configured to obtain n second description texts from the literature and/or the network information respectively describing the n second software.
In one possible design, the artificial intelligence method includes at least any one of:
a proximity algorithm or a clustering algorithm.
In one possible design, the properties of the first software and the properties of the second software each include at least one of the following:
a function corresponding to the software, a publisher of the software, an application scenario of the software, or iterative version information of the software, where the software is the first software or the second software.
For the beneficial effects of the computer devices in the various designs described above, refer to the beneficial effects of the corresponding implementations in the method embodiment of fig. 1, which are not repeated here.
It should be noted that, the contents of information interaction, execution process, and the like between modules/units in the computer device described in the embodiment corresponding to fig. 8 are based on the same concept as the method embodiment corresponding to fig. 1 in the present application, and specific contents may refer to the description in the foregoing method embodiment in the present application, and are not described herein again.
In addition, functional modules or units in the embodiments of the present application may be integrated into one processor, may exist alone physically, or may be integrated into one module or unit by two or more modules or units. The integrated modules or units may be implemented in the form of hardware, or may be implemented in the form of software functional modules.
Referring to fig. 9, fig. 9 is a schematic view of another structure of a computer device according to an embodiment of the present application, and as shown in fig. 9, the computer device 900 includes a processor 910, a memory 920 coupled to the processor 910, and a transceiver 930. In some implementations, they may be coupled together by a bus. The computer device 900 may be a server or a terminal device. The processor 910 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of the CPU and the NP. The processor may also be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The memory 920 has stored therein computer readable instructions for performing any of the methods of the possible embodiments described above. The processor 910, after executing the computer readable instructions, may perform corresponding operations as instructed by the computer readable instructions. In addition, after the processor 910 executes the computer readable instructions in the memory 920, all operations that the server can perform, such as the operations performed by the computer device in the embodiment corresponding to fig. 1, may be performed according to the instructions of the computer readable instructions. Transceiver 930 includes a port for outputting data and, in some cases, a port for inputting data.
The processor 910 can invoke the transceiver 930 by executing code to obtain a set of lines to be matched and a set of target orders.
Also provided in the embodiments of the present application is a computer-readable storage medium, which stores a computer program, and when the computer program runs on a computer, the computer program causes the computer to execute the steps performed by the server in the method described in the foregoing embodiment shown in fig. 1.
Also provided in an embodiment of the present application is a computer program product including a program, which when run on a computer causes the computer to perform the steps performed by the server in the method as described in the embodiment of fig. 1.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, at least two units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may also be distributed on at least two network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (20)
1. A method of classification, comprising:
a computer device obtains a first feature vector corresponding to first software, wherein the first software is software on the computer device, the first feature vector is obtained based on at least one first feature, and the first feature is a word representing an attribute of the first software;
the computer device obtains classifications corresponding to n pieces of second software, wherein the second software is software whose classification has been determined, and n is greater than or equal to 1;
the computer device obtains second feature vectors corresponding to the n pieces of second software, wherein the second feature vectors are obtained based on at least one second feature, and the second feature is a word representing an attribute of the second software;
the computer device determines a first classification through an artificial intelligence method based on the first feature vector and on the second feature vectors and classifications corresponding to the n pieces of second software, wherein the first classification is the classification of the first software.
2. The classification method according to claim 1, wherein the computer device obtaining a first feature vector of a first software comprises:
the computer equipment acquires a first description text of the first software, wherein the first description text is a text describing the attribute of the first software;
the computer equipment acquires at least one first feature based on the first description text;
the computer device obtains the first feature vector based on at least one of the first features.
3. The classification method according to claim 2, wherein the computer device obtaining the description text of the first software comprises:
the computer device obtains the first description text from the literature and/or network information describing the first software.
4. The classification method according to claim 1 or 2, wherein the computer device obtaining n classes corresponding to the second software comprises:
the computer device obtains the classifications corresponding to the n pieces of second software according to the metadata and the code dependency data of the n pieces of second software;
and/or,
the computer device obtains the classifications corresponding to the n pieces of second software according to a first preset table entry, wherein the first preset table entry comprises the classifications corresponding to the n pieces of second software.
5. The classification method according to claim 1 or 2, wherein the computer device obtaining n second feature vectors corresponding to the second software comprises:
the computer equipment acquires n second feature vectors corresponding to the second software according to n second description texts, wherein the second description texts are texts describing attributes of the second software;
and/or,
the computer device obtains n second feature vectors corresponding to the second software according to a second preset table entry, where the second preset table entry includes n preset second feature vectors corresponding to the second software.
6. The classification method according to claim 5, wherein the computer device obtaining n second feature vectors corresponding to the second software according to n second description texts comprises:
the computer equipment acquires n second description texts corresponding to the n second software respectively;
the computer equipment acquires n second characteristics corresponding to the second software respectively based on n second description texts, wherein the number of the second characteristics is greater than or equal to 1;
the computer device obtains the second feature vector based on the second features respectively corresponding to the n pieces of second software.
7. The classification method according to claim 6, wherein the computer device obtaining n second description texts corresponding to n second software respectively comprises:
the computer equipment acquires n second description texts from literature and/or network information which respectively describe n second software.
8. A classification method according to any one of claims 1-3, 6 or 7, characterised in that the artificial intelligence method comprises at least one of:
a proximity algorithm or a clustering algorithm.
9. The classification method according to any one of claims 1 to 3, wherein the attributes of the first software and the attributes of the second software respectively comprise at least one of:
the software processing method comprises a function corresponding to the software, a publisher of the software, an application scenario of the software or iteration version information of the software, wherein the software is the first software or the second software.
10. A computer device, characterized in that the computer device comprises:
a first obtaining unit, configured to obtain a first feature vector of first software, where the first software is software on the computer device, the first feature vector is obtained based on at least one first feature, and the first feature is a word representing an attribute of the first software;
a second obtaining unit, configured to obtain classifications corresponding to n pieces of second software, where the second software is software whose classification has been determined, and n is greater than or equal to 1;
a third obtaining unit, configured to obtain n second feature vectors corresponding to the second software, where the second feature vectors are obtained based on at least one second feature, and the second feature is a word representing an attribute of the second software;
a determining unit, configured to determine a first classification by an artificial intelligence method based on the first feature vector, the second feature vectors corresponding to the n pieces of second software, and the classification, where the first classification is a classification of the first software.
11. The computer device according to claim 10, wherein the first obtaining unit is specifically configured to obtain a first description text of the first software, where the first description text is a text describing an attribute of the first software;
the first obtaining unit is specifically configured to obtain at least one first feature based on the first description text;
the first obtaining unit is specifically configured to obtain the first feature vector based on at least one of the first features.
12. The computer device according to claim 11, wherein the first obtaining unit is specifically configured to obtain the first description text from literature and/or network information describing the first software.
13. The computer device according to claim 10 or 11, wherein the second obtaining unit is specifically configured to obtain, according to the metadata and the code dependent data of the n pieces of second software, corresponding classifications of the n pieces of second software;
and/or,
the second obtaining unit is specifically configured to obtain n classifications corresponding to the second software according to a first preset entry, where the first preset entry includes n classifications corresponding to the second software.
14. The computer device according to claim 10 or 11, wherein the third obtaining unit is specifically configured to obtain n second feature vectors corresponding to the second software according to n second description texts, where the second description texts are texts describing attributes of the second software;
and/or,
the third obtaining unit is specifically configured to obtain n second feature vectors corresponding to the second software according to a second preset entry, where the second preset entry includes n preset second feature vectors corresponding to the second software.
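The two sources named in claim 14 (second description texts and a second preset entry) might be combined as in the sketch below. The lookup order (preset entry first, description text as fallback), the toy `keyword_counts` vectorizer, and the software names `libfoo` and `libbar` are all assumptions made for illustration; the claim itself only says either source or both may be used.

```python
def keyword_counts(text, vocabulary=("json", "http")):
    """Toy stand-in for deriving a feature vector from a description text."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


def get_second_feature_vector(name, preset_entry, description_texts):
    """Prefer a preset vector; otherwise derive one from the description text."""
    if name in preset_entry:
        return preset_entry[name]
    return keyword_counts(description_texts[name])


# Hypothetical data: "libfoo" has a preset vector, "libbar" only a description.
second_preset_entry = {"libfoo": [2, 0]}
second_description_texts = {"libbar": "An HTTP client with HTTP retry support"}

vectors = {
    name: get_second_feature_vector(name, second_preset_entry, second_description_texts)
    for name in ("libfoo", "libbar")
}
```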
15. The computer device according to claim 14, wherein the third obtaining unit is specifically configured to obtain n second description texts corresponding to n second software, respectively;
the third obtaining unit is specifically configured to obtain, based on the n second description texts, the second features respectively corresponding to the n second software, where the number of the second features is greater than or equal to 1;
the third obtaining unit is specifically configured to obtain the second feature vectors based on the second features respectively corresponding to the n pieces of second software.
16. The computer device according to claim 15, wherein the third obtaining unit is specifically configured to obtain n second description texts from literature and/or network information that respectively describe the n second software.
17. The computer device according to any of claims 10-12, 15 or 16, wherein the artificial intelligence method comprises at least one of:
a proximity algorithm or a clustering algorithm.
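As one possible reading of the proximity algorithm, the sketch below assigns the first software the majority classification among its k most similar pieces of second software. Cosine similarity, the default `k=3`, the function names, and the example vectors and labels are assumptions; the claims do not fix a distance measure or a value of k.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify_by_nearest_neighbors(first_vector, second_vectors, classifications, k=3):
    """Majority vote over the k second-software vectors most similar to first_vector."""
    if not second_vectors or len(second_vectors) != len(classifications):
        raise ValueError("need one classification per second feature vector")
    ranked = sorted(
        zip(second_vectors, classifications),
        key=lambda pair: cosine_similarity(first_vector, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled second software (n = 4) and an unlabeled first software.
second_vectors = [[2, 1, 0, 1], [1, 1, 0, 0], [0, 0, 3, 0], [0, 0, 2, 1]]
classifications = ["data-format", "data-format", "networking", "networking"]
first_vector = [1, 2, 0, 1]

first_classification = classify_by_nearest_neighbors(
    first_vector, second_vectors, classifications
)
```

A clustering algorithm would instead group all vectors first and then assign the first software the classification that dominates its cluster.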
18. The computer device of any of claims 10-12, wherein the attributes of the first software and the attributes of the second software each include at least one of:
a function corresponding to the software, a publisher of the software, an application scenario of the software, or iterative version information of the software, wherein the software includes the first software and the second software.
19. A computer device, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute a program in the memory to implement the method of any one of claims 1 to 9;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
20. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111138645.8A CN113869408A (en) | 2021-09-27 | 2021-09-27 | Classification method and computer equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111138645.8A CN113869408A (en) | 2021-09-27 | 2021-09-27 | Classification method and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113869408A true CN113869408A (en) | 2021-12-31 |
Family
ID=78991629
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111138645.8A Pending CN113869408A (en) | 2021-09-27 | 2021-09-27 | Classification method and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113869408A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115065600A (en) * | 2022-06-13 | 2022-09-16 | 远景智能国际私人投资有限公司 | Equipment grouping method, device, equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001331514A (en) * | 2000-05-19 | 2001-11-30 | Ricoh Co Ltd | Device and method for document classification |
CN105956083A (en) * | 2016-04-29 | 2016-09-21 | 广州优视网络科技有限公司 | Application software classification system, application software classification method and server |
CN109886020A (en) * | 2019-01-24 | 2019-06-14 | 燕山大学 | Software vulnerability automatic classification method based on deep neural network |
CN111797239A (en) * | 2020-09-08 | 2020-10-20 | 中山大学深圳研究院 | Application program classification method and device and terminal equipment |
CN112861974A (en) * | 2021-02-08 | 2021-05-28 | 和美(深圳)信息技术股份有限公司 | Text classification method and device, electronic equipment and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115065600A (en) * | 2022-06-13 | 2022-09-16 | 远景智能国际私人投资有限公司 | Equipment grouping method, device, equipment and storage medium |
CN115065600B (en) * | 2022-06-13 | 2024-01-05 | 远景智能国际私人投资有限公司 | Equipment grouping method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112035599B (en) | Query method and device based on vertical search, computer equipment and storage medium | |
CN110909160A (en) | Regular expression generation method, server and computer readable storage medium | |
CN112988784B (en) | Data query method, query statement generation method and device | |
CN110750297B (en) | Python code reference information generation method based on program analysis and text analysis | |
JP2014112283A (en) | Information processing device, information processing method, and program | |
KR20120047622A (en) | System and method for managing digital contents | |
WO2022262632A1 (en) | Webpage search method and apparatus, and storage medium | |
CN113609847A (en) | Information extraction method and device, electronic equipment and storage medium | |
CN110147223B (en) | Method, device and equipment for generating component library | |
CN114610955A (en) | Intelligent retrieval method and device, electronic equipment and storage medium | |
CN114492669A (en) | Keyword recommendation model training method, recommendation method and device, equipment and medium | |
CN113869408A (en) | Classification method and computer equipment | |
CN113139383A (en) | Document sorting method, system, electronic equipment and storage medium | |
CN112926297A (en) | Method, apparatus, device and storage medium for processing information | |
US20160170983A1 (en) | Information management apparatus and information management method | |
CN113886535B (en) | Knowledge graph-based question and answer method and device, storage medium and electronic equipment | |
CN113449063B (en) | Method and device for constructing document structure information retrieval library | |
CN112989011B (en) | Data query method, data query device and electronic equipment | |
CN116822491A (en) | Log analysis method and device, equipment and storage medium | |
CN111753199B (en) | User portrait construction method and device, electronic device and medium | |
CN111291208B (en) | Front-end page element naming method and device and electronic equipment | |
KR102062139B1 (en) | Method and Apparatus for Processing Data Based on Intelligent Data Structure | |
CN112182218A (en) | Text data classification method and device | |
CN113505889B (en) | Processing method and device of mapping knowledge base, computer equipment and storage medium | |
JP2015203960A (en) | partial information extraction system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||