CN105630975B - Information processing method and electronic equipment - Google Patents

Publication number
CN105630975B
Authority
CN
China
Prior art keywords
description information
software
preset
search
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510992322.3A
Other languages
Chinese (zh)
Other versions
CN105630975A (en
Inventor
葛付江
赵凯
卢小东
卓雷
史晓斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/14Details of searching files based on file metadata
    • G06F16/148File search processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an information processing method comprising: searching a search engine, based on the name of a piece of software, to obtain a description information set about the software; extracting keywords from each piece of description information in the set according to a preset extraction rule; and classifying the software based on the keywords to obtain the category of the software. Because the keywords are extracted from multiple pieces of description information retrieved by searching on the software name, the information used is related to the software name but not limited to it. The classification therefore rests on more information than the name alone, and classifying the software based on the extracted keywords is more accurate.

Description

Information processing method and electronic equipment
Technical Field
The present invention relates to the field of electronic devices, and in particular, to an information processing method and an electronic device.
Background
With the development of electronic technology, various kinds of software have been widely developed and applied to various fields.
Software classification generally classifies software into industries (construction, government, finance, education, medical, legal, real estate, logistics, intermediaries, etc.), mail, documents, operating system tools, computer security, downloads, learning, chat, video, games, pictures, music, financing, etc. according to predefined categories.
In the prior art, software is generally classified based only on its name. However, software names are short, and many pieces of software are named with common words or figurative terms (for example, Chrome is a browser), so the classification accuracy is low.
Disclosure of Invention
In view of this, the present invention provides an information processing method that solves the prior-art problem of low accuracy caused by classifying software based only on its name.
In order to achieve the purpose, the invention provides the following technical scheme:
an information processing method comprising:
searching a description information set related to the software in a search engine based on the name of the software;
extracting keywords from the description information set according to a preset extraction rule;
and classifying the software based on the keywords to obtain the category of the software.
In the above method, preferably, after searching a search engine based on the name of the software to obtain the description information set about the software, and before extracting keywords from the description information set according to the preset extraction rule, the method further includes:
sorting at least two pieces of description information in the description information set according to a preset sorting rule.
In the above method, preferably, if there are at least two search engines, searching a search engine based on the name of the software to obtain the description information set about the software, and sorting at least two pieces of description information in the description information set according to the preset sorting rule, include:
respectively searching in at least two search engines based on the names of the software to be classified to obtain search results corresponding to the search engines, wherein the search results corresponding to each search engine comprise at least one piece of description information about the software;
collecting the search results of each search engine to obtain a description information set related to the software, wherein the description information set comprises at least two pieces of description information;
analyzing the similarity value between each piece of description information and the software name;
and sorting at least two pieces of description information in the description information set according to the similarity values of the description information and a preset sorting mode.
In the above method, preferably, classifying the software based on the keywords to obtain the category of the software includes:
analyzing the keywords to obtain the classification category corresponding to each keyword;
and establishing an association relationship between the category and the description information, thereby classifying the software and obtaining the category of the software.
Preferably, the establishing of the association relationship between the category and the description information includes:
based on the sequencing order of the at least two pieces of description information, assigning a weight value to the keywords corresponding to the description information;
analyzing to obtain an association relation value between the category and the description information based on the keyword weight value corresponding to the description information;
and respectively establishing the association relationship between the software corresponding to the at least two pieces of description information and the category according to the association relationship value.
In the above method, preferably, after searching a search engine based on the name of the software to obtain the description information set about the software, and before extracting keywords from the description information set according to the preset extraction rule, the method further includes:
filtering at least two pieces of description information in the description information set to obtain the description information that meets a preset filtering condition.
In the above method, preferably, the filtering at least two pieces of description information in the description information set includes:
acquiring initial keywords contained in the description information;
judging whether the number of initial keywords is larger than a preset first threshold to obtain a first judgment result;
and when the first judgment result indicates that the number of initial keywords is larger than the first threshold, determining that the description information meets the preset filtering condition.
In the above method, preferably, the filtering at least two pieces of description information in the description information set includes:
acquiring initial keywords contained in the description information;
analyzing to obtain the classification number corresponding to the description information based on the corresponding relation between the keywords and the classification categories and the initial keywords;
judging whether the classification number is smaller than a preset second threshold to obtain a second judgment result;
and when the second judgment result indicates that the classification number is smaller than the preset second threshold, determining that the description information meets the preset filtering condition.
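The two filtering conditions above can be illustrated with a minimal Python sketch. The keyword table, the whitespace tokenization, the function names, and the threshold values are all assumptions made for illustration; the patent only states that a keyword-to-category correspondence and the two thresholds are preset.

```python
from typing import Dict, Set

# Hypothetical keyword-to-category table (assumed contents).
KEYWORD_TO_CATEGORY: Dict[str, str] = {
    "game": "games",
    "war": "military",
    "browser": "internet",
    "download": "downloads",
}


def initial_keywords(text: str) -> Set[str]:
    """Return the preset keywords that occur in the text (assumed extraction rule)."""
    words = set(text.lower().split())
    return {kw for kw in KEYWORD_TO_CATEGORY if kw in words}


def passes_count_filter(text: str, first_threshold: int) -> bool:
    """First condition: the number of initial keywords exceeds the first threshold."""
    return len(initial_keywords(text)) > first_threshold


def passes_category_filter(text: str, second_threshold: int) -> bool:
    """Second condition: the keywords map to fewer categories than the second threshold."""
    categories = {KEYWORD_TO_CATEGORY[kw] for kw in initial_keywords(text)}
    return len(categories) < second_threshold
```

Under these assumptions, the first filter drops descriptions that carry too few keywords to be informative, and the second drops descriptions whose keywords spread over too many categories to be discriminative.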
An electronic device, comprising:
a memory; and
a processor configured to: search a search engine based on the name of software to obtain a description information set about the software; extract keywords from the description information set according to a preset extraction rule; and classify the software based on the keywords to obtain the category of the software.
In the above electronic device, preferably, after the processor searches for the description information set about the software in the search engine based on the name of the software, the processor is further configured to:
and sequencing at least two pieces of description information in the description information set according to a preset sequencing rule.
In the above electronic device, preferably, when there are at least two search engines,
the processor searching a search engine based on the name of the software to obtain the description information set about the software, and sorting at least two pieces of description information in the description information set according to the preset sorting rule, specifically include:
respectively searching in at least two search engines based on the names of the software to be classified to obtain search results corresponding to the search engines, wherein the search results corresponding to each search engine comprise at least one piece of description information about the software;
collecting the search results of each search engine to obtain a description information set related to the software, wherein the description information set comprises at least two pieces of description information;
analyzing the similarity value between each piece of description information and the software name;
and sorting at least two pieces of description information in the description information set according to the similarity values of the description information and a preset sorting mode.
An electronic device, comprising:
the acquisition module is used for searching in a search engine to obtain a description information set related to the software based on the name of the software;
the extraction module is used for extracting keywords from the description information set according to a preset extraction rule;
and the classification module is used for classifying the software based on the keywords to obtain the category of the software.
As can be seen from the above technical solution, compared with the prior art, the invention provides an information processing method in which a description information set about the software is obtained by searching a search engine based on the name of the software, keywords are extracted from each piece of description information in the set according to a preset extraction rule, and the software is classified based on the keywords to obtain its category. Because the keywords are extracted from multiple pieces of description information retrieved by searching on the software name, the information used is related to the software name but not limited to it. The classification therefore rests on more information than the name alone, and classifying the software based on the extracted keywords is more accurate.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of embodiment 1 of an information processing method according to the present invention;
Fig. 2 is a flowchart of embodiment 2 of an information processing method according to the present invention;
Fig. 3 is a flowchart of embodiment 3 of an information processing method according to the present invention;
Fig. 4 is a flowchart of embodiment 4 of an information processing method according to the present invention;
Fig. 5 is a flowchart of embodiment 5 of an information processing method according to the present invention;
Fig. 6 is a flowchart of embodiment 6 of an information processing method according to the present invention;
Fig. 7 is a flowchart of embodiment 7 of an information processing method according to the present invention;
Fig. 8 is a flowchart of embodiment 8 of an information processing method according to the present invention;
Fig. 9 is a schematic structural diagram of embodiment 1 of an electronic device according to the present invention;
Fig. 10 is a schematic structural diagram of embodiment 2 of an electronic device according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of an embodiment 1 of an information processing method according to the present invention is shown, where the method is applied to an electronic device, where the electronic device may be an electronic device in the form of a desktop, a notebook, a tablet computer, a mobile phone, a smart television, a smart watch, a wearable device, or the like.
Wherein, the method comprises the following steps:
step S101: searching a description information set related to the software in a search engine based on the name of the software;
The name of the software to be classified is known. Searching for that name in a search engine returns a large amount of description information about the software, and this description information forms the description information set about the software.
The number of the search engines can be one or more.
In specific implementation, in order to improve the classification precision, a plurality of search engines can be adopted for searching, and in order to reduce the data volume searched in the classification process, one search engine can be adopted for searching.
Step S102: extracting keywords from the description information set according to a preset extraction rule;
the electronic device is also preset with an extraction rule, and keywords of each description information in the description information set can be extracted based on the extraction rule.
It should be noted that the description information includes an information name and an information abstract, and the keyword may be a keyword extracted from the information name and/or the information abstract of the description information.
Step S103: and classifying the software based on the keywords to obtain the category of the software.
It should be noted that, according to the existing classification rule, different keywords correspond to different classifications.
Then, the software can be classified based on the keyword, and since the software may have a plurality of keywords, it can be determined that the software belongs to a plurality of categories when the software is classified.
For example, the category to which a piece of software belongs may be two or more categories, such as games, military, and so on.
In summary, in the information processing method provided in this embodiment, a description information set about the software is obtained by searching a search engine based on the name of the software; keywords are extracted from each piece of description information in the set according to a preset extraction rule; and the software is classified based on the keywords to obtain its category. Because the keywords are extracted from multiple pieces of description information retrieved by searching on the software name, the information used is related to the software name but not limited to it. The classification therefore rests on more information than the name alone, and classifying the software based on the extracted keywords is more accurate.
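Steps S101 to S103 can be sketched in Python as follows. This is only an illustration under stated assumptions: the search engine is replaced by a stub function (`fake_search`), the extraction rule is assumed to be a lookup of words from the information name and abstract in a preset table, and all names and table contents are hypothetical rather than taken from the patent.

```python
from typing import Callable, Dict, List, Set

Description = Dict[str, str]  # assumed layout: keys "title" and "abstract"

# Hypothetical preset keyword-to-category correspondence.
KEYWORD_TO_CATEGORY: Dict[str, str] = {
    "game": "games",
    "war": "military",
    "browser": "internet",
}


def extract_keywords(descriptions: List[Description]) -> Set[str]:
    """S102 (assumed rule): keep words of the title/abstract found in the preset table."""
    found: Set[str] = set()
    for d in descriptions:
        words = f"{d['title']} {d['abstract']}".lower().split()
        found.update(w for w in words if w in KEYWORD_TO_CATEGORY)
    return found


def classify_software(name: str, search: Callable[[str], List[Description]]) -> Set[str]:
    """S101 to S103: search by name, extract keywords, map keywords to categories."""
    descriptions = search(name)                  # S101
    keywords = extract_keywords(descriptions)    # S102
    return {KEYWORD_TO_CATEGORY[k] for k in keywords}  # S103


def fake_search(name: str) -> List[Description]:
    """Stand-in for a real search engine; returns canned results."""
    return [
        {"title": f"{name} official site", "abstract": "A war strategy game for PC"},
        {"title": f"{name} review", "abstract": "Best game of the year"},
    ]
```

With the stub above, `classify_software("Example", fake_search)` yields two categories, which matches the observation that one piece of software may belong to several categories.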
Referring to fig. 2, a flowchart of an embodiment 2 of an information processing method according to the present invention is shown, where the method includes the following steps:
step S201: searching a description information set related to the software in a search engine based on the name of the software;
step S201 is the same as step S101 in embodiment 1, and details are not described in this embodiment.
Step S202: sequencing at least two pieces of description information in the description information set according to a preset sequencing rule;
the electronic device is preset with a sorting rule, and sorting of each description information in the description information set can be realized based on the sorting rule.
It should be noted that, in the present application, the similarity between the description information and the software name is used for sorting, and a specific process of sorting based on the similarity will be explained in the following embodiments, which is not described in detail in this embodiment.
Step S203: extracting keywords from the description information set according to a preset extraction rule;
step S204: and classifying the software based on the keywords to obtain the category of the software.
Steps S203 to S204 are the same as steps S102 to S103 in embodiment 1, and are not described in detail in this embodiment.
In summary, the information processing method provided in this embodiment further includes: and sequencing at least two pieces of description information in the description information set according to a preset sequencing rule. By adopting the method, keywords are extracted from the description information set based on the result of sequencing the plurality of description information in the description information set.
Wherein, the number of the search engines can be multiple.
Referring to fig. 3, a flowchart of an embodiment 3 of an information processing method according to the present invention is shown, where the method includes the following steps:
step S301: respectively searching in at least two search engines based on the software name to be classified to obtain search results corresponding to the search engines;
step S302: collecting the search results of each search engine to obtain a description information set related to the software;
and the search result corresponding to each search engine comprises at least one piece of description information about the software.
It should be noted that, if the search modes (algorithms adopted for search) corresponding to different search engines are different, the description information obtained by the search is also not completely the same.
Specifically, the name of the software can be searched in each of the different search engines to obtain the search results of each engine, where the search result of each search engine comprises at least one piece of description information related to the software.
The search results obtained from each search engine are then collected to obtain the full description information set related to the software, where the description information set comprises at least two pieces of description information.
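Collecting the results of several search engines (steps S301 and S302) might look like the sketch below. The two engine functions are stubs that return canned results, and the removal of duplicate entries is an added assumption; the patent only says the results are collected into one set.

```python
from typing import Callable, Dict, Iterable, List

Description = Dict[str, str]  # assumed layout: keys "title" and "abstract"
SearchEngine = Callable[[str], List[Description]]


def collect_descriptions(name: str, engines: Iterable[SearchEngine]) -> List[Description]:
    """Query every engine with the software name and merge the results."""
    collected: List[Description] = []
    seen = set()
    for engine in engines:
        for d in engine(name):
            key = (d["title"], d["abstract"])
            if key not in seen:  # de-duplication is an assumption, not from the patent
                seen.add(key)
                collected.append(d)
    return collected


def engine_a(name: str) -> List[Description]:
    """Stub search engine A (hypothetical)."""
    return [
        {"title": f"{name} home", "abstract": "Official site"},
        {"title": f"{name} wiki", "abstract": "Encyclopedia entry"},
    ]


def engine_b(name: str) -> List[Description]:
    """Stub search engine B (hypothetical); overlaps with A on one entry."""
    return [
        {"title": f"{name} wiki", "abstract": "Encyclopedia entry"},
        {"title": f"{name} review", "abstract": "User review"},
    ]
```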
Step S303: analyzing the similarity value of the description information and the software name;
the description information comprises an information name and an information abstract.
Specifically, based on the information name and the information abstract in the description information, the similarity with the software name is analyzed.
In specific implementation, the similarity value between the software name and the information name and information abstract of the description information can be calculated using a Euclidean distance algorithm.
Specifically, consider two n-dimensional vectors $X = (x_1, x_2, \ldots, x_i, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_i, \ldots, y_n)$, where $x_i$ and $y_i$ are real-valued, i.e. $x_i \in \mathbb{R}$, $y_i \in \mathbb{R}$, and $\mathbb{R}$ is the real number field.
The Euclidean distance d between X and Y is calculated as follows:
$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
A larger $d(X, Y)$ indicates a greater distance between vectors X and Y, and a lower similarity.
As a specific example, the Euclidean distance between two documents $D_1$ and $D_2$ is calculated as follows:
Let the set of all distinct words appearing in $D_1$ and $D_2$ be $W = (w_1, w_2, \ldots, w_i, \ldots, w_n)$, where $w_i$ is a word. Then $D_1$ and $D_2$ can be represented by feature vectors $F_1$ and $F_2$ that describe the distribution of the words appearing in them:
$$F_1 = (f(w_1, D_1), f(w_2, D_1), \ldots, f(w_i, D_1), \ldots, f(w_n, D_1))$$
$$F_2 = (f(w_1, D_2), f(w_2, D_2), \ldots, f(w_i, D_2), \ldots, f(w_n, D_2))$$
where $f(w_i, D_j)$ denotes the feature value of word $w_i$ in document $D_j$. Two common ways of calculating $f(w_i, D_j)$ are:
1. Word frequency: $f(w_i, D_j)$ is the frequency with which $w_i$ occurs in document $D_j$, denoted $f(w_i, D_j) = tf(w_i, D_j)$;
2. TF-IDF (Term Frequency-Inverse Document Frequency):
$$f(w_i, D_j) = tf(w_i, D_j) \times idf(w_i)$$
where
$$idf(w_i) = \log \frac{|D|}{|\{j : w_i \in D_j\}|}$$
Here $D$ denotes the set of all documents, $|D|$ the number of documents in that set, and $|\{j : w_i \in D_j\}|$ the number of documents in $D$ in which the word $w_i$ appears.
With $f(w_i, D_j)$ taken as either of the above, the Euclidean distance between $D_1$ and $D_2$ is:
$$d(D_1, D_2) = \sqrt{\sum_{i=1}^{n} \bigl(f(w_i, D_1) - f(w_i, D_2)\bigr)^2}$$
it should be noted that, since the description information set includes a plurality of description information, similarity values between the plurality of description information and the software name need to be calculated sequentially.
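The document-distance calculation described above can be sketched in Python as follows. Whitespace tokenization, the natural logarithm in the IDF term, raw counts as the word frequency, and the function names are assumptions for illustration; the patent does not fix these details.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def term_frequencies(text: str) -> Counter:
    """Word frequency tf(w, D): raw count of each word in the document."""
    return Counter(text.lower().split())


def inverse_document_frequencies(corpus: List[str]) -> Dict[str, float]:
    """idf(w) = log(|D| / number of documents containing w), natural log assumed."""
    doc_counts: Counter = Counter()
    for doc in corpus:
        doc_counts.update(set(doc.lower().split()))
    return {w: math.log(len(corpus) / c) for w, c in doc_counts.items()}


def euclidean_distance(d1: str, d2: str,
                       idf: Optional[Dict[str, float]] = None) -> float:
    """Euclidean distance between the feature vectors of two documents.

    Uses plain word frequency when idf is None, otherwise TF-IDF.
    """
    tf1, tf2 = term_frequencies(d1), term_frequencies(d2)
    total = 0.0
    for w in set(tf1) | set(tf2):  # the joint vocabulary W
        weight = 1.0 if idf is None else idf.get(w, 0.0)
        total += (tf1[w] * weight - tf2[w] * weight) ** 2
    return math.sqrt(total)
```

A smaller distance corresponds to a higher similarity, so identical texts have distance zero. In the setting of step S303, `d1` would be the software name and `d2` the information name plus abstract of one piece of description information.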
Step S304: sorting at least two pieces of description information in the description information set according to the similarity value of the description information and a preset sorting mode;
and sorting each description information in the description information set based on the similarity value between the description information obtained by the calculation in the step and the software name.
Specifically, the description information in the description information set may be sorted according to a preset sorting manner.
For example, the description information may be sorted in a sorting manner from a large similarity value to a small similarity value, and then in the subsequent steps, keywords may be extracted based on the sequence from front to back; or the description information may be sorted in a sorting manner from small to large similarity values, and in the subsequent steps, the keywords may be extracted based on the order from back to front.
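The sorting in step S304 can be illustrated with the short sketch below. Converting a Euclidean distance into a similarity value with `1 / (1 + d)` is an assumption chosen only because it decreases as the distance grows; the patent does not state how the similarity value is derived from the distance.

```python
from typing import List, Tuple


def distance_to_similarity(distance: float) -> float:
    """Assumed mapping: distance 0 gives similarity 1, larger distances approach 0."""
    return 1.0 / (1.0 + distance)


def sort_by_similarity(scored: List[Tuple[str, float]],
                       descending: bool = True) -> List[str]:
    """Order description identifiers by similarity value.

    descending=True is the "large to small" mode, False the "small to large" mode.
    """
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=descending)
    return [description for description, _ in ordered]
```

Either mode carries the same information; it only determines whether later steps read the list from front to back or from back to front.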
Step S305: extracting keywords from the description information set according to a preset extraction rule;
step S306: and classifying the software based on the keywords to obtain the category of the software.
Steps S305 to S306 are the same as steps S203 to S204 in embodiment 2, and are not described in detail in this embodiment.
In summary, the information processing method provided in this embodiment includes: respectively searching in at least two search engines based on the names of the software to be classified to obtain search results corresponding to the search engines, wherein the search results corresponding to each search engine comprise at least one piece of description information about the software; collecting the search results of each search engine to obtain a description information set related to the software, wherein the description information set comprises at least two pieces of description information; analyzing the similarity value of the description information and the software name; and sequencing at least two pieces of description information in the description information set according to the similarity value of the description information and a preset sequencing mode. By adopting the method, the similarity value of each description information and the software name is obtained through analysis based on a description information set consisting of a plurality of description information obtained through searching of each search engine, and the similarity value is sequenced to prepare for extracting the key words subsequently.
Referring to fig. 4, a flowchart of an embodiment 4 of an information processing method according to the present invention is shown, where the method includes the following steps:
step S401: respectively searching in at least two search engines based on the software name to be classified to obtain search results corresponding to the search engines;
step S402: collecting the search results of each search engine to obtain a description information set related to the software, wherein the description information set comprises at least two pieces of description information;
step S403: analyzing the similarity value of the description information and the software name;
step S404: sorting at least two pieces of description information in the description information set according to the similarity value of the description information and a preset sorting mode;
step S405: extracting keywords from the description information set according to a preset extraction rule;
Steps S401 to S405 are the same as steps S301 to S305 in embodiment 3, and are not described in detail in this embodiment.
Step S406: analyzing to obtain a classification category corresponding to the keyword based on the keyword;
first, correspondence between keywords and classification categories is preset.
Then, according to the corresponding relationship, the classification category corresponding to the keyword can be analyzed and found.
For example, the classification category corresponding to the keyword "game" is game; the classification category corresponding to the keyword "war" is military.
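The preset correspondence of step S406 can be modelled as a simple lookup table. The two entries below come from the example in the text; the function name and the decision to ignore keywords that are missing from the table are assumptions.

```python
from typing import Dict, Iterable, Set

# Preset correspondence between keywords and classification categories.
KEYWORD_TO_CATEGORY: Dict[str, str] = {
    "game": "game",
    "war": "military",
}


def categories_for(keywords: Iterable[str]) -> Set[str]:
    """Look up the category of every keyword; unknown keywords are skipped (assumption)."""
    return {KEYWORD_TO_CATEGORY[k] for k in keywords if k in KEYWORD_TO_CATEGORY}
```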
Step S407: establishing an association relationship between the category and the description information to classify the software and obtain the category of the software.
Since the keyword is extracted based on the description information, the classification category determined based on the keyword also has an association relationship with the description information.
Specifically, the association relationship between the category and the description information is established, so that the description information corresponds to the corresponding category, and the description information is related to the software, that is, the correspondence relationship between the software and the category is established, so that the software classification is realized, and the category corresponding to the description information is the category to which the software belongs.
In summary, in the information processing method provided in this embodiment, classifying the software based on the keywords to obtain the category to which the software belongs includes: analyzing the keywords to obtain the classification category corresponding to each keyword; and establishing an association relationship between the category and the description information to classify the software and obtain the category of the software. With this method, the category corresponding to the description information is determined based on the keywords, so the category of the software is determined and the classification of the software is completed.
Referring to fig. 5, a flowchart of an embodiment 5 of an information processing method according to the present invention is shown, where the method includes the following steps:
step S501: respectively searching in at least two search engines based on the software name to be classified to obtain search results corresponding to the search engines;
step S502: collecting the search results of each search engine to obtain a description information set related to the software, wherein the description information set comprises at least two pieces of description information;
step S503: analyzing the similarity value of the description information and the software name;
step S504: sorting at least two pieces of description information in the description information set according to the similarity value of the description information and a preset sorting mode;
step S505: extracting keywords from the description information set according to a preset extraction rule;
step S506: analyzing to obtain a classification category corresponding to the keyword based on the keyword;
steps S501 to 506 are the same as steps S401 to 406 in embodiment 4, and are not described in detail in this embodiment.
Step S507: based on the sequencing order of the at least two pieces of description information, assigning a weight value to the keywords corresponding to the description information;
and giving a weight value to the keywords acquired from the description information based on the sequencing order of the description information.
Because the order of the description information is based on the similarity value between the description information and the software name, the greater the similarity value is, the greater the association relationship between the corresponding keyword and the software name is.
Specifically, when the description information is ranked by similarity value from large to small, the keywords obtained from the higher-ranked description information are given larger weight values; when it is ranked by similarity value from small to large, the keywords obtained from the lower-ranked description information are given larger weight values.
For example, sorted by similarity value from large to small, the description information is, in sequence, description information 1, description information 2 and description information 3; the keywords obtained from the description information 1 include a, b, and c, the keywords obtained from the description information 2 include a, c, and e, and the keywords obtained from the description information 3 include a, s, and d. Then the keywords corresponding to the description information 1 can be assigned a weight value of 0.5, the keywords corresponding to the description information 2 can be assigned a weight value of 0.3, and the keywords corresponding to the description information 3 can be assigned a weight value of 0.2.
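The weight assignment of step S507 can be sketched as follows, assuming the description information is already sorted by similarity value from large to small and that one weight per rank is supplied. The function name `assign_keyword_weights` is hypothetical; the values 0.5, 0.3 and 0.2 for the first, second and third ranks follow the example.

```python
def assign_keyword_weights(ranked_keywords, rank_weights):
    """Give every keyword of the i-th ranked description information
    the weight value configured for rank i."""
    if len(ranked_keywords) != len(rank_weights):
        raise ValueError("one weight value is required per piece of description information")
    return [
        {keyword: weight for keyword in keywords}
        for keywords, weight in zip(ranked_keywords, rank_weights)
    ]


# Keywords of description information 1, 2 and 3, in descending similarity order.
ranked_keywords = [["a", "b", "c"], ["a", "c", "e"], ["a", "s", "d"]]
keyword_weights = assign_keyword_weights(ranked_keywords, [0.5, 0.3, 0.2])
```

Each element of `keyword_weights` maps the keywords of one piece of description information to the weight of its rank, so the keyword a carries 0.5, 0.3 and 0.2 in the three pieces respectively.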
Step S508: analyzing to obtain an association relation value between the category and the description information based on the keyword weight value corresponding to the description information;
and calculating the association relation value between the description information and the category corresponding to the keyword based on the weight value of the keyword corresponding to each description information.
It should be noted that, if the keyword is extracted from the description information, the value of the association relationship between the category and the description information is proportional to or the same as the weight value of the keyword of the description information.
Wherein the association relation value characterizes the degree of association between the description information and the category.
Specifically, when the association relation value is larger, the degree of association between the description information and the category is higher; when the association relation value is smaller, the degree of association between the description information and the category is lower.
Step S509: respectively establishing the association relationship between the software corresponding to the at least two pieces of description information and the category according to the association relationship value;
firstly, the association relation value characterizes the association degree between the description information and the category; because the description information is about the software, the association relation between the software and the category can be calculated based on the association relation values.
For example, sorted by similarity value from large to small, the description information is, in sequence, description information 1, description information 2 and description information 3; the keywords obtained from the description information 1 include a, b, and c, the keywords obtained from the description information 2 include a, c, and e, and the keywords obtained from the description information 3 include a, s, and d. Then the keywords corresponding to the description information 1 can be assigned a weight value of 0.5, the keywords corresponding to the description information 2 can be assigned a weight value of 0.3, and the keywords corresponding to the description information 3 can be assigned a weight value of 0.2. The category corresponding to the keyword a is A, and the category corresponding to the keyword b is B.
For the category A, the association relationship between the software and the category can be expressed as: 0.5 + 0.3 + 0.2 = 1;
for the category B, the association relationship between the software and the category can be expressed as: 0.5 + 0 + 0 = 0.5.
And calculating the association relationship between other categories and the software in turn.
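The calculation in steps S508 and S509 can be sketched as a sum, per category, of the weight values of the keywords that map to that category. This is only an illustration under the assumption that the association relation value equals the keyword weight value (the source also allows a proportional value); the function name `category_association` and the dictionary layout are hypothetical.

```python
from collections import defaultdict


def category_association(keyword_weights, keyword_to_category):
    """Sum, for each category, the weight values of the keywords that
    correspond to it across all pieces of description information."""
    totals = defaultdict(float)
    for weights in keyword_weights:
        for keyword, weight in weights.items():
            category = keyword_to_category.get(keyword)
            if category is not None:  # keywords without a known category are skipped
                totals[category] += weight
    return dict(totals)


# Weighted keywords of description information 1, 2 and 3 from the example.
keyword_weights = [
    {"a": 0.5, "b": 0.5, "c": 0.5},
    {"a": 0.3, "c": 0.3, "e": 0.3},
    {"a": 0.2, "s": 0.2, "d": 0.2},
]
# Only the two keyword-to-category correspondences stated in the example.
association = category_association(keyword_weights, {"a": "A", "b": "B"})
```

With these inputs the value for category A is 0.5 + 0.3 + 0.2 and the value for category B is 0.5, up to floating-point rounding.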
Step S510: and classifying the software based on the incidence relation to obtain the class of the software.
If the at least two pieces of description information have an association relationship with the category, it can be determined that the software also has an association relationship with the category, that is, the software belongs to the category.
Specifically, a certain category may correspond to a plurality of pieces of description information, and it may be determined that the software belongs to the category based on an association relationship between the description information and the category.
It should be noted that, because the description information has an association value with the category, the association degree between each piece of description information related to the software and a certain category can be obtained through analysis based on the association value, and the association degree characterizes the association degree between the software and the category.
Further, the categories associated with the software may be determined based on the degree of association between the software and each category, and the categories to which the software belongs may be ranked based on the degree of association.
For example, if the association between the software and the category A is 1, and the association between the software and the category B is 0.5, the categories can be sorted according to the value of the association, for example with the category A ranked ahead of the category B.
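Ranking the categories of step S510 by association value can be sketched as a simple sort. The descending order is an assumption, since the source only states that sorting by the value is possible, and `rank_categories` is a hypothetical name.

```python
def rank_categories(association_values):
    """Return (category, value) pairs sorted from the largest
    association value to the smallest."""
    return sorted(association_values.items(), key=lambda item: item[1], reverse=True)


ranking = rank_categories({"B": 0.5, "A": 1.0})
# ranking == [("A", 1.0), ("B", 0.5)]
```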
In summary, in the information processing method provided in this embodiment, the establishing of the association relationship between the category and the description information includes: based on the sequencing order of the at least two pieces of description information, assigning a weight value to the keywords corresponding to the description information; analyzing to obtain an association relation value between the category and the description information based on the keyword weight value corresponding to the description information; and respectively establishing the association relationship between the software corresponding to the at least two pieces of description information and the category according to the association relationship value. By adopting the method, the association relationship between the description information and the category is quantized, the association relationship value between the description information and the category is obtained through calculation, and the relationship between the corresponding category and the software is further determined based on the association relationship value, so that the software classification process is realized.
Referring to fig. 6, a flowchart of an embodiment 6 of an information processing method according to the present invention is shown, where the method includes the following steps:
step S601: searching a description information set related to the software in a search engine based on the name of the software;
step S601 is the same as step S101 in embodiment 1, and details are not described in this embodiment.
Step S602: filtering at least two pieces of description information in the description information set to obtain description information meeting preset filtering conditions;
when the search engine searches for the name of the software, massive description information can be obtained, and the massive description information needs to be filtered in order to reduce the data processing amount in the classification process.
It should be noted that the specific process of the filtering will be described in detail in the following examples, which are not described in detail in this example.
Step S603: extracting keywords from the description information set according to a preset extraction rule;
it should be noted that the description information in step S603 refers to the description information that satisfies the preset filtering condition.
Step S604: and classifying the software based on the keywords to obtain the category of the software.
Steps S603 to 604 are the same as steps S102 to 103 in embodiment 1, and are not described in detail in this embodiment.
In summary, the information processing method provided in this embodiment further includes: and filtering at least two pieces of description information in the description information set to obtain the description information meeting preset filtering conditions. By adopting the method, massive description information obtained by searching is filtered, so that the data processing amount in the classification process is reduced, and the data processing load of the electronic equipment is reduced.
Referring to fig. 7, a flowchart of an embodiment 7 of an information processing method according to the present invention is shown, where the method includes the following steps:
step S701: searching a description information set related to the software in a search engine based on the name of the software;
step S701 is the same as step S601 in embodiment 6, and details are not described in this embodiment.
Step S702: acquiring initial keywords contained in the description information;
the description information may include a document name and a document abstract.
Generally, the initial keyword is set by an editor when the description information is generated. That is, the initial keyword contained in the description information is determined by directly reading the description information.
Specifically, the initial keyword included in the description information is obtained from the document name and the document abstract, and is not obtained from the document name only.
Step S703: judging whether the number of the initial keywords is larger than a preset first threshold value to obtain a first judgment result;
the number of initial keywords represents the amount of information contained in the description information: the more initial keywords there are, the more information the description information contains, and the fewer there are, the less information it contains.
In order to ensure that the description information used in the subsequent processing has enough information content, it needs to be determined that the number of initial keywords contained in the description information is larger than the preset first threshold; when the first judgment result indicates that the number of the initial keywords is larger than the first threshold, the description information is judged to meet the preset filtering condition.
Specifically, the first preset threshold may be one third of the number of keywords in the software name.
For example, the software name has 6 keywords a, b, c, d, e, and f, so the first threshold is 2; the description information 4 has initial keywords a, b, and c, the description information 5 has initial keywords a, b, and d, and the description information 6 has only the initial keyword e. The number of initial keywords in the description information 6 is less than 1/3 of the number of software name keywords, so the description information 6 does not satisfy the preset filtering condition.
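The first filtering condition of step S703 can be sketched as below, assuming the first threshold is one third of the number of keywords in the software name, as in the example. The function name `meets_first_condition` is hypothetical, and `Fraction` is used only to keep the threshold exact.

```python
from fractions import Fraction


def meets_first_condition(initial_keywords, name_keywords, ratio=Fraction(1, 3)):
    """True when the number of initial keywords is larger than the first
    threshold (here: `ratio` times the number of software name keywords)."""
    first_threshold = ratio * len(name_keywords)
    return len(initial_keywords) > first_threshold


name_keywords = ["a", "b", "c", "d", "e", "f"]  # threshold = 6 / 3 = 2
kept_4 = meets_first_condition(["a", "b", "c"], name_keywords)  # True: 3 > 2
kept_5 = meets_first_condition(["a", "b", "d"], name_keywords)  # True: 3 > 2
kept_6 = meets_first_condition(["e"], name_keywords)            # False: 1 is not > 2
```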
Step S704: extracting keywords from the description information set according to a preset extraction rule based on the fact that the description information meets a preset filtering condition;
step S705: and classifying the software based on the keywords to obtain the category of the software.
Steps S704-705 are the same as steps S603-604 in embodiment 6, and are not described in detail in this embodiment.
In summary, in the information processing method provided in this embodiment, the filtering at least two pieces of description information in the description information set includes: acquiring initial keywords contained in the description information; judging whether the initial keyword is larger than a preset first threshold value or not to obtain a first judgment result; and representing that the number of the initial keywords is larger than a first threshold value based on the first judgment result, and judging that the description information meets a preset filtering condition. By adopting the method, based on the fact that the initial keyword contained in the description information is larger than the preset first threshold value, the description information is judged to meet the preset filtering condition, and therefore the description information based on the subsequent processing process is guaranteed to have enough information quantity.
Referring to fig. 8, a flowchart of an embodiment 8 of an information processing method according to the present invention is shown, where the method includes the following steps:
step S801: searching a description information set related to the software in a search engine based on the name of the software;
step S801 is the same as step S601 in embodiment 6, and details are not described in this embodiment.
Step S802: acquiring initial keywords contained in the description information;
the description information may include a document name and a document abstract.
Generally, the initial keyword is set by an editor when the description information is generated. That is, the initial keyword contained in the description information is determined by directly reading the description information.
Specifically, the initial keyword included in the description information is obtained from the document name and the document abstract, and is not obtained from the document name only.
Step S803: analyzing to obtain the classification number corresponding to the description information based on the corresponding relation between the keywords and the classification categories and the initial keywords;
and determining the classification category corresponding to the initial keyword based on the initial keyword because the keyword and the classification category have a corresponding relation.
Because one description information contains a plurality of initial keywords, a plurality of classification categories corresponding to the description information can be determined.
Step S804: judging whether the classification number is smaller than a preset second threshold value or not to obtain a second judgment result;
it should be noted that one piece of description information generally corresponds to several classification categories. If too many categories correspond to it, the initial keywords were set improperly or the description information contains interference factors, and the description information can be ignored in order to reduce the difficulty of subsequent processing.
Specifically, when the second judgment result indicates that the classification number is smaller than the preset second threshold value, it is judged that the description information meets the preset filtering condition.
Specifically, the preset second threshold is generally 6; that is, description information corresponding to no more than 5 categories is considered to be normally classified, while description information corresponding to more categories is considered to have too many categories.
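The second filtering condition of steps S803 and S804 can be sketched as counting the distinct categories reached by the initial keywords and comparing that count with the second threshold of 6. The function name `meets_second_condition` and the sample keyword-to-category mapping are hypothetical.

```python
def meets_second_condition(initial_keywords, keyword_to_category, second_threshold=6):
    """True when the number of distinct classification categories that the
    initial keywords correspond to is smaller than the second threshold."""
    categories = {
        keyword_to_category[keyword]
        for keyword in initial_keywords
        if keyword in keyword_to_category
    }
    return len(categories) < second_threshold


# Hypothetical correspondence between keywords and classification categories.
mapping = {"k1": "A", "k2": "B", "k3": "C", "k4": "D", "k5": "E", "k6": "F"}
normal = meets_second_condition(["k1", "k2", "k3", "k4", "k5"], mapping)          # 5 categories
too_many = meets_second_condition(["k1", "k2", "k3", "k4", "k5", "k6"], mapping)  # 6 categories
```

Description information with 5 categories is kept, while description information with 6 categories is filtered out.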
Step S805: extracting keywords from the description information set according to a preset extraction rule based on the fact that the description information meets a preset filtering condition;
step S806: and classifying the software based on the keywords to obtain the category of the software.
Steps S805 to 806 are the same as steps S603 to 604 in embodiment 6, and are not described in detail in this embodiment.
In summary, in the information processing method provided in this embodiment, the filtering at least two pieces of description information in the description information set includes: acquiring initial keywords contained in the description information; analyzing to obtain the classification number corresponding to the description information based on the corresponding relation between the keywords and the classification categories and the initial keywords; judging whether the classification number is smaller than a preset second threshold value or not to obtain a second judgment result; and representing that the classification number is smaller than a preset second threshold value based on the second judgment result, and judging that the description information meets a preset filtering condition. By adopting the method, the description information is judged to meet the preset filtering condition based on the fact that the corresponding classification of the initial keywords contained in the description information is smaller than the preset second threshold value, so that the description information based on the subsequent processing process is not influenced by interference factors, and the information processing difficulty is reduced.
The above embodiments provided by the present invention describe an information processing method in detail, and the method of the present invention can be implemented by various types of devices, so the present invention also provides an electronic device using the information processing method, and the following specific embodiments are given for detailed description.
Referring to fig. 9, a schematic structural diagram of an embodiment 1 of an electronic device provided in the present invention is shown, where the electronic device may be an electronic device in the form of a desktop, a notebook, a tablet computer, a mobile phone, a smart television, a smart watch, a wearable device, and the like.
Wherein, this electronic equipment can include the following part: a memory 901 and a processor 902;
the processor 902 is configured to search a search engine for a description information set about software based on a name of the software; extracting keywords from the description information set according to a preset extraction rule; and classifying the software based on the keywords to obtain the category of the software.
The memory 901 is used to store various contents, such as preset extraction rules, classification results, and the like.
Preferably, the processor, after searching the set of description information about the software in the search engine based on the name of the software, is further configured to:
and sequencing at least two pieces of description information in the description information set according to a preset sequencing rule.
Preferably, when the number of the search engines is at least two,
the processor searches a search engine for a description information set related to the software based on the name of the software; sequencing at least two pieces of description information in the description information set according to a preset sequencing rule, specifically comprising:
respectively searching in at least two search engines based on the names of the software to be classified to obtain search results corresponding to the search engines, wherein the search results corresponding to each search engine comprise at least one piece of description information about the software;
collecting the search results of each search engine to obtain a description information set related to the software, wherein the description information set comprises at least two pieces of description information;
analyzing the similarity value of the description information and the software name;
and sequencing at least two pieces of description information in the description information set according to the similarity value of the description information and a preset sequencing mode.
Preferably, the processor classifies the software based on the keyword to obtain the category of the software, and includes:
analyzing to obtain a classification category corresponding to the keyword based on the keyword;
and establishing an incidence relation between the category and the description information to realize the classification of the software so as to obtain the category of the software.
Preferably, the processor establishes an association relationship between the category and the description information, including:
based on the sequencing order of the at least two pieces of description information, assigning a weight value to the keywords corresponding to the description information;
analyzing to obtain an association relation value between the category and the description information based on the keyword weight value corresponding to the description information;
and respectively establishing the association relationship between the software corresponding to the at least two pieces of description information and the category according to the association relationship value.
Preferably, after the description information set about the software is obtained by searching in a search engine based on the name of the software, and before the keywords are extracted from the description information set according to a preset extraction rule, the processor is further configured to:
and filtering at least two pieces of description information in the description information set to obtain the description information meeting preset filtering conditions.
Preferably, the processor filters at least two pieces of description information in the description information set, and includes:
acquiring initial keywords contained in the description information;
judging whether the initial keyword is larger than a preset first threshold value or not to obtain a first judgment result;
and representing that the number of the initial keywords is larger than a first threshold value based on the first judgment result, and judging that the description information meets a preset filtering condition.
Preferably, the processor filters at least two pieces of description information in the description information set, and includes:
acquiring initial keywords contained in the description information;
analyzing to obtain the classification number corresponding to the description information based on the corresponding relation between the keywords and the classification categories and the initial keywords;
judging whether the classification number is smaller than a preset second threshold value or not to obtain a second judgment result;
and representing that the classification number is smaller than a preset second threshold value based on the second judgment result, and judging that the description information meets a preset filtering condition.
In summary, the electronic device provided in this embodiment searches a search engine based on the name of software to obtain a description information set related to the software, and extracts keywords from each piece of description information in the description information set according to a preset extraction rule; the software is then classified based on the keywords to obtain the category of the software. With this electronic device, because the keywords are extracted from a plurality of pieces of description information found by searching on the software name, the information contained in the description information is related to, but not limited to, the software name. The classification therefore relies on more information than the software name alone, so classifying the software based on the extracted keywords is more accurate.
Referring to fig. 10, a schematic structural diagram of an embodiment 2 of an electronic device provided in the present invention is shown, where the electronic device may be an electronic device in the form of a desktop, a notebook, a tablet computer, a mobile phone, a smart television, a smart watch, a wearable device, and the like.
Wherein, this electronic equipment can include the following part: an acquisition module 1001, an extraction module 1002 and a classification module 1003;
the obtaining module 1001 is configured to search a search engine to obtain a description information set about software based on a name of the software;
the extracting module 1002 is configured to extract a keyword from the description information set according to a preset extracting rule;
the classification module 1003 is configured to classify the software based on the keyword to obtain a category to which the software belongs.
Preferably, the electronic device further comprises: a sorting module, used for sorting at least two pieces of description information in the description information set according to a preset sorting rule.
Preferably, if there are at least two search engines, the obtaining module searches in the search engine to obtain a description information set about the software based on the name of the software; the sorting module sorts at least two pieces of description information in the description information set according to a preset sorting rule, and specifically includes:
respectively searching in at least two search engines based on the names of the software to be classified to obtain search results corresponding to the search engines, wherein the search results corresponding to each search engine comprise at least one piece of description information about the software;
collecting the search results of each search engine to obtain a description information set related to the software, wherein the description information set comprises at least two pieces of description information;
analyzing the similarity value of the description information and the software name;
and sequencing at least two pieces of description information in the description information set according to the similarity value of the description information and a preset sequencing mode.
Preferably, the classifying the software based on the keyword to obtain the category of the software includes:
analyzing to obtain a classification category corresponding to the keyword based on the keyword;
and establishing an incidence relation between the category and the description information to realize the classification of the software so as to obtain the category of the software.
Preferably, the establishing of the association relationship between the category and the description information includes:
based on the sequencing order of the at least two pieces of description information, assigning a weight value to the keywords corresponding to the description information;
analyzing to obtain an association relation value between the category and the description information based on the keyword weight value corresponding to the description information;
and respectively establishing the association relationship between the software corresponding to the at least two pieces of description information and the category according to the association relationship value.
Preferably, the electronic device further comprises:
a filtering module, used for filtering at least two pieces of description information in the description information set to obtain the description information meeting the preset filtering condition.
Preferably, the filtering module is specifically configured to:
acquiring initial keywords contained in the description information;
judging whether the initial keyword is larger than a preset first threshold value or not to obtain a first judgment result;
and representing that the number of the initial keywords is larger than a first threshold value based on the first judgment result, and judging that the description information meets a preset filtering condition.
Preferably, the filtering module is specifically configured to:
acquiring initial keywords contained in the description information;
analyzing to obtain the classification number corresponding to the description information based on the corresponding relation between the keywords and the classification categories and the initial keywords;
judging whether the classification number is smaller than a preset second threshold value or not to obtain a second judgment result;
and representing that the classification number is smaller than a preset second threshold value based on the second judgment result, and judging that the description information meets a preset filtering condition.
In summary, the electronic device provided in this embodiment searches a search engine based on the name of software to obtain a description information set related to the software, and extracts keywords from each piece of description information in the description information set according to a preset extraction rule; the software is then classified based on the keywords to obtain the category of the software. With this electronic device, because the keywords are extracted from a plurality of pieces of description information found by searching on the software name, the information contained in the description information is related to, but not limited to, the software name. The classification therefore relies on more information than the software name alone, so classifying the software based on the extracted keywords is more accurate.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the device provided by the embodiment, the description is relatively simple because the device corresponds to the method provided by the embodiment, and the relevant points can be referred to the method part for description.
The previous description of the provided embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features provided herein.

Claims (8)

1. An information processing method characterized by comprising:
searching a description information set related to the software in a search engine based on the name of the software, wherein the description information comprises an information name and an information abstract;
sequencing at least two pieces of description information in the description information set according to a preset sequencing rule;
extracting keywords from the description information set according to a preset extraction rule;
classifying the software based on the keywords to obtain the category of the software;
when at least two search engines are provided, the description information set related to the software is searched in the search engines based on the names of the software; sequencing at least two pieces of description information in the description information set according to a preset sequencing rule, wherein the sequencing comprises the following steps:
respectively searching in at least two search engines based on the names of the software to be classified to obtain search results corresponding to the search engines, wherein the search results corresponding to each search engine comprise at least one piece of description information about the software, the search modes corresponding to different search engines are different, the search modes are algorithms adopted for searching, and the description information obtained by searching different engines is not completely the same;
collecting the search results of each search engine to obtain a description information set related to the software, wherein the description information set comprises at least two pieces of description information;
analyzing the similarity value of the description information and the software name;
and sequencing at least two pieces of description information in the description information set according to the similarity value of the description information and a preset sequencing mode.
2. The method of claim 1, wherein classifying the software based on the keywords to obtain the category to which the software belongs comprises:
analyzing to obtain a classification category corresponding to the keyword based on the keyword;
and establishing an incidence relation between the category and the description information to realize the classification of the software so as to obtain the category of the software.
3. The method of claim 2, wherein establishing the association relationship between the category and the description information comprises:
assigning, based on the sorted order of the at least two pieces of description information, a weight value to the keywords corresponding to each piece of description information;
determining an association relationship value between the category and the description information based on the keyword weight values corresponding to the description information;
and establishing, according to the association relationship values, the association relationship between the category and the software corresponding to the at least two pieces of description information.
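The rank-based weighting of claims 2 and 3 can be sketched as follows. The keyword table, the whitespace-based extraction rule, and the `1 / (rank + 1)` weight are all hypothetical choices made for illustration; the claims only require that keywords from earlier-ranked description information receive a weight derived from the order.

```python
from collections import defaultdict

# Hypothetical keyword-to-category table; the claims only require that such a
# correspondence between keywords and classification categories exists.
KEYWORD_CATEGORIES = {
    "photo": "photography",
    "camera": "photography",
    "filter": "photography",
    "chat": "social",
    "share": "social",
}


def extract_keywords(text):
    """Toy extraction rule: keep the words that appear in the keyword table."""
    return [word for word in text.lower().split() if word in KEYWORD_CATEGORIES]


def category_scores(ranked_descriptions):
    """Accumulate an association value per category from rank-weighted keywords."""
    scores = defaultdict(float)
    for rank, text in enumerate(ranked_descriptions):
        weight = 1.0 / (rank + 1)  # assumed scheme: earlier descriptions count more
        for keyword in extract_keywords(text):
            scores[KEYWORD_CATEGORIES[keyword]] += weight
    return dict(scores)


def classify(ranked_descriptions):
    """Return the category with the highest association value, or None if no keyword matched."""
    scores = category_scores(ranked_descriptions)
    return max(scores, key=scores.get) if scores else None
```

Picking the single highest-scoring category is one possible reading; a threshold on the association value would allow software to be linked to several categories.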
4. The method according to claim 1, wherein after searching in a search engine for the description information set about the software based on the name of the software, and before extracting keywords from the description information set according to the preset extraction rule, the method further comprises:
filtering the at least two pieces of description information in the description information set to obtain the description information that meets a preset filtering condition.
5. The method of claim 4, wherein filtering the at least two pieces of description information in the description information set comprises:
acquiring the initial keywords contained in the description information;
judging whether the number of initial keywords is larger than a preset first threshold to obtain a first judgment result;
and when the first judgment result indicates that the number of initial keywords is larger than the preset first threshold, determining that the description information meets the preset filtering condition.
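A minimal sketch of the keyword-count filter in claim 5, assuming description information is plain text and initial keywords are the words found in a known vocabulary. The function names and the vocabulary are hypothetical.

```python
def count_initial_keywords(text, vocabulary):
    """Count the words of a description that belong to the keyword vocabulary."""
    return sum(1 for word in text.lower().split() if word in vocabulary)


def filter_by_keyword_count(descriptions, vocabulary, first_threshold):
    """Keep descriptions whose keyword count is strictly larger than the first threshold."""
    return [
        text for text in descriptions
        if count_initial_keywords(text, vocabulary) > first_threshold
    ]
```

With a threshold of 1, a description must contain at least two vocabulary words to survive, which discards results that mention the topic only in passing.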
6. The method of claim 4, wherein filtering the at least two pieces of description information in the description information set comprises:
acquiring the initial keywords contained in the description information;
determining, based on the initial keywords and the correspondence between keywords and classification categories, the number of classification categories corresponding to the description information;
judging whether the number of classification categories is smaller than a preset second threshold to obtain a second judgment result;
and when the second judgment result indicates that the number of classification categories is smaller than the preset second threshold, determining that the description information meets the preset filtering condition.
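The category-count filter in claim 6 can be sketched in the same style. The keyword-to-category mapping passed in is hypothetical, and description information is again assumed to be plain text.

```python
def category_count(text, keyword_categories):
    """Number of distinct categories reached by the keywords found in a description."""
    return len({
        keyword_categories[word]
        for word in text.lower().split()
        if word in keyword_categories
    })


def filter_by_category_count(descriptions, keyword_categories, second_threshold):
    """Keep descriptions that map to fewer categories than the second threshold."""
    return [
        text for text in descriptions
        if category_count(text, keyword_categories) < second_threshold
    ]
```

Read literally, a description that matches no category at all also passes this test, so in practice it would likely be combined with the keyword-count condition of claim 5.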
7. An electronic device, comprising:
a memory;
and a processor configured to: search a search engine for a description information set related to software based on the name of the software; sort at least two pieces of description information in the description information set according to a preset sorting rule; extract keywords from the description information set according to a preset extraction rule; and classify the software based on the keywords to obtain the category of the software, wherein the description information comprises an information name and an information abstract;
wherein, when there are at least two search engines, the processor searching the search engines for the description information set related to the software based on the name of the software, and sorting at least two pieces of description information in the description information set according to the preset sorting rule, specifically comprises:
searching each of the at least two search engines based on the name of the software to be classified to obtain search results corresponding to each search engine, wherein the search results corresponding to each search engine comprise at least one piece of description information about the software, different search engines use different search modes, a search mode being the algorithm adopted for searching, and the description information returned by different search engines is not completely the same;
aggregating the search results of the search engines to obtain the description information set related to the software, wherein the description information set comprises at least two pieces of description information;
determining a similarity value between each piece of description information and the name of the software;
and sorting the at least two pieces of description information in the description information set according to the similarity values and a preset sorting mode.
8. An electronic device, comprising:
an acquisition module configured to search a search engine for a description information set related to software based on the name of the software, wherein the description information comprises an information name and an information abstract;
a sorting module configured to sort at least two pieces of description information in the description information set according to a preset sorting rule;
an extraction module configured to extract keywords from the description information set according to a preset extraction rule;
and a classification module configured to classify the software based on the keywords to obtain the category of the software;
wherein, when there are at least two search engines, the acquisition module searching the search engines for the description information set related to the software based on the name of the software, and the sorting module sorting at least two pieces of description information in the description information set according to the preset sorting rule, specifically comprises:
searching each of the at least two search engines based on the name of the software to be classified to obtain search results corresponding to each search engine, wherein the search results corresponding to each search engine comprise at least one piece of description information about the software, different search engines use different search modes, a search mode being the algorithm adopted for searching, and the description information returned by different search engines is not completely the same;
aggregating the search results of the search engines to obtain the description information set related to the software, wherein the description information set comprises at least two pieces of description information;
determining a similarity value between each piece of description information and the name of the software;
and sorting the at least two pieces of description information in the description information set according to the similarity values and a preset sorting mode.
CN201510992322.3A 2015-12-24 2015-12-24 Information processing method and electronic equipment Active CN105630975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510992322.3A CN105630975B (en) 2015-12-24 2015-12-24 Information processing method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510992322.3A CN105630975B (en) 2015-12-24 2015-12-24 Information processing method and electronic equipment

Publications (2)

Publication Number Publication Date
CN105630975A (en) 2016-06-01
CN105630975B (en) 2020-10-27

Family

ID=56045908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510992322.3A Active CN105630975B (en) 2015-12-24 2015-12-24 Information processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN105630975B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090090A (en) * 2016-11-23 2018-05-29 北京国双科技有限公司 Programme orientation method and apparatus
CN108255522A (en) * 2016-12-27 2018-07-06 北京金山云网络技术有限公司 A kind of application program sorting technique and device
CN107861753B (en) * 2017-06-26 2020-12-11 平安普惠企业管理有限公司 APP generation index, retrieval method and system and readable storage medium
CN110941714A (en) * 2018-09-21 2020-03-31 武汉安天信息技术有限责任公司 Classification rule base construction method, application classification method and device
CN109753427B (en) * 2018-12-04 2023-05-23 国网山东省电力公司无棣县供电公司 Analysis system for power generation and supply test unit
CN111538874B (en) * 2020-04-22 2022-08-19 深圳传音控股股份有限公司 Quick search method, terminal and readable storage medium
CN112579476B (en) * 2021-02-23 2021-05-18 北京北大软件工程股份有限公司 Method and device for aligning vulnerability and software and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102081601A (en) * 2009-11-27 2011-06-01 北京金山软件有限公司 Field word identification method and device
CN102135992A (en) * 2011-03-15 2011-07-27 宇龙计算机通信科技(深圳)有限公司 Terminal application program classifying method and terminal
CN103577252A (en) * 2012-07-26 2014-02-12 腾讯科技(深圳)有限公司 Software sorting method and device
CN103984685A (en) * 2013-02-07 2014-08-13 百度国际科技(深圳)有限公司 Method, device and equipment for classifying items to be classified

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1324044A (en) * 2000-05-11 2001-11-28 贵州东方世纪科技有限责任公司 Search service supplying method and server in Internet
CN101340463B (en) * 2008-08-22 2012-04-25 深圳市迅雷网络技术有限公司 Method and apparatus for determining network resource type

Also Published As

Publication number Publication date
CN105630975A (en) 2016-06-01

Similar Documents

Publication Publication Date Title
CN105630975B (en) Information processing method and electronic equipment
CN106649818B (en) Application search intention identification method and device, application search method and server
WO2017045443A1 (en) Image retrieval method and system
CN111814770B (en) Content keyword extraction method of news video, terminal device and medium
CN111797239B (en) Application program classification method and device and terminal equipment
WO2020140373A1 (en) Intention recognition method, recognition device and computer-readable storage medium
CN109740152B (en) Text category determination method and device, storage medium and computer equipment
WO2017097231A1 (en) Topic processing method and device
US10482146B2 (en) Systems and methods for automatic customization of content filtering
US20160188633A1 (en) A method and apparatus for tracking microblog messages for relevancy to an entity identifiable by an associated text and an image
CN107515877A (en) The generation method and device of sensitive theme word set
US9256649B2 (en) Method and system of filtering and recommending documents
WO2014120835A1 (en) System and method for automatically classifying documents
CN107679070B (en) Intelligent reading recommendation method and device and electronic equipment
CN111310011A (en) Information pushing method and device, electronic equipment and storage medium
CN111125528A (en) Information recommendation method and device
CN108959329A (en) A kind of file classification method, device, medium and equipment
CN108021667A (en) A kind of file classification method and device
US10572525B2 (en) Determining an optimized summarizer architecture for a selected task
CN106815253B (en) Mining method based on mixed data type data
CN113656660A (en) Cross-modal data matching method, device, equipment and medium
JP6420268B2 (en) Image evaluation learning device, image evaluation device, image search device, image evaluation learning method, image evaluation method, image search method, and program
CN117493645B (en) Big data-based electronic archive recommendation system
CN108563713B (en) Keyword rule generation method and device and electronic equipment
CN113704623A (en) Data recommendation method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant