WO2017028566A1

WO2017028566A1 - Method and apparatus for collecting cloud environment resource focus point, and server

Info

Publication number: WO2017028566A1
Application number: PCT/CN2016/082253
Authority: WO
Inventors: 周莉
Original assignee: 中兴通讯股份有限公司
Priority date: 2015-08-19
Filing date: 2016-05-16
Publication date: 2017-02-23
Also published as: CN106470130A

Abstract

A method and an apparatus for collecting a cloud environment resource focus point, and a server. The method for collecting a cloud environment resource focus point comprises: summarizing words satisfying a first preset condition; calculating importance eigenvectors of the words satisfying the first preset condition; and obtaining a cloud environment resource focus point according to the importance eigenvectors, the importance eigenvector being used to indicate a weight of a corresponding word satisfying the first condition in a uniform resource locator (URL) webpage.

Description

Method, device and server for collecting cloud environment resource focus

Technical field

The present invention relates to, but is not limited to, the field of cloud computing resource technology.

Background technique

As is well known, the "cloud" in the related art is a resource pool composed of virtual computing resources that can maintain and manage themselves, such as computing servers, storage servers, bandwidth resources, software, and applications. "Cloud computing" centralizes all computing resources into a dynamically created, highly virtualized resource pool. How to obtain the operator's focus points regarding cloud environment resources (including physical resources and virtual resources), so that cloud resources can be provided to operators efficiently, is therefore a matter of concern.

In the related art, a related algorithm for obtaining a focus of a cloud environment resource generally operates in a stand-alone mode.

Summary of the invention

The following is an overview of the topics detailed in this document. This Summary is not intended to limit the scope of the claims.

This document provides a method, a device and a server for collecting cloud environment resource focus points, which can reduce the time required to acquire cloud environment resource focus points.

An embodiment of the present invention provides a method for collecting a focus of a cloud environment resource, including:

Collecting a vocabulary that satisfies the first preset condition;

Calculating an importance degree feature vector of the vocabulary that satisfies the first preset condition;

Obtaining a focus point of the cloud environment resource according to the importance degree feature vector;

Each importance degree feature vector represents the weight, in a uniform resource locator (URL) webpage, of the corresponding word that satisfies the first preset condition.

Optionally, the step of obtaining a cloud environment resource focus point according to the importance degree feature vector includes:

Performing data transformation on the importance degree feature vector to obtain a corresponding frequency;

Arranging the frequencies in order;

Sequentially obtaining, after the arrangement, the frequencies that satisfy the second preset condition;

Obtaining the corresponding cloud environment resource focus points according to the obtained frequencies.

Optionally, the step of collecting the vocabulary that satisfies the first preset condition comprises:

Summarizing the vocabulary that satisfies the third preset condition and its corresponding frequency of occurrence in the resource-related URL webpage;

Sorting the words satisfying the third preset condition according to the frequency;

Sequentially obtaining the sorted vocabulary that satisfies the third preset condition, until the frequency corresponding to the obtained vocabulary, relative to the frequency corresponding to all the vocabulary satisfying the third preset condition, reaches a preset threshold;

Saving the obtained vocabulary that satisfies the third preset condition as the vocabulary that satisfies the first preset condition.

Optionally, before the collecting the vocabulary that meets the first preset condition, the collecting method further includes:

Extract the resource-related URL from the sample log file;

Crawling the webpage content of the resource-related URL, and using the crawled webpage content as the text to be classified;

The text to be classified is segmented to obtain the vocabulary that satisfies the third preset condition.

Optionally, the step of performing word segmentation on the text to be classified to obtain the vocabulary that satisfies the third preset condition comprises:

Performing word segmentation on the text to be classified to obtain a resource-related vocabulary;

Converting the resource-related vocabulary into a digital vector;

Processing the digital vector to obtain a parameter feature vector;

Obtaining an order parameter according to the parameter feature vector;

Obtaining the vocabulary that satisfies the third preset condition according to the order parameter.

Optionally, the step of obtaining the vocabulary that meets the third preset condition according to the order parameter includes:

The vocabulary satisfying the third preset condition is obtained by using the order parameter and the ranking algorithm.

Optionally, before the extracting the resource-related Uniform Resource Locator URL from the sample log file, the collecting method further includes:

Collect initial log files periodically;

The sample log file is obtained according to log data of the initial log file.

Optionally, the step of obtaining the sample log file according to the log data of the initial log file includes:

After receiving the information request sent by the network client according to the webpage open instruction, obtaining, according to the information request, information required to open the corresponding webpage from the initial log file;

The information required to open the corresponding web page is saved as the sample log file.

An embodiment of the present invention further provides a device for collecting a focus of a cloud environment resource, including:

a first processing module, configured to collectively obtain a vocabulary that satisfies a first preset condition;

a calculation module, configured to calculate an importance degree feature vector of the vocabulary that satisfies the first preset condition;

a second processing module, configured to obtain a cloud environment resource focus point according to the importance degree feature vector;

Optionally, the second processing module includes:

a transform submodule, configured to perform data transformation on the importance degree feature vector to obtain a corresponding frequency;

a first sorting submodule, configured to sequentially arrange the frequencies;

The first obtaining submodule is configured to sequentially acquire the frequency that satisfies the second preset condition after the arrangement;

The first processing submodule is configured to obtain a corresponding cloud environment resource concern point according to the obtained frequency.

Optionally, the first processing module includes:

a summary sub-module, configured to summarize the vocabulary satisfying the third preset condition and the corresponding frequency of occurrence in the resource-related URL webpage;

a second sorting submodule, configured to sort the words satisfying the third preset condition according to the frequency;

a second acquiring sub-module, configured to sequentially obtain the sorted vocabulary that satisfies the third preset condition, until the frequency corresponding to the obtained vocabulary and the frequency corresponding to all the vocabulary satisfying the third preset condition reach a preset threshold;

wherein the frequency corresponding to the obtained vocabulary and the frequency corresponding to all the vocabulary satisfying the third preset condition reaching the preset threshold includes: the ratio of the frequency corresponding to the obtained vocabulary to the sum of the frequencies corresponding to all the vocabulary satisfying the third preset condition reaches the preset threshold;

The first saving submodule is configured to save the acquired vocabulary that satisfies the third preset condition as the vocabulary that satisfies the first preset condition.

Optionally, the collecting device further includes:

An extraction module, configured to extract a resource-related URL from a sample log file before the first processing module performs a related operation;

a crawling module, configured to crawl the webpage content of the resource-related URL, and use the crawled webpage content as the text to be classified;

And a third processing module, configured to perform word segmentation on the to-be-classified text, to obtain the vocabulary that meets the third preset condition.

Optionally, the third processing module includes:

a second processing sub-module, configured to perform word segmentation on the to-be-classified text to obtain a resource-related vocabulary;

a transformation sub-module configured to convert the resource-related vocabulary into a digital vector;

a third processing submodule configured to process the digital vector to obtain a parameter feature vector;

a fourth processing submodule, configured to obtain a sequence parameter according to the parameter feature vector;

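a fifth processing submodule, configured to obtain the vocabulary that satisfies the third preset condition according to the order parameter.

Optionally, the fifth processing submodule is configured to: obtain the vocabulary that satisfies the third preset condition by using the order parameter and the ranking algorithm.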

Optionally, the collecting device further includes:

The collecting module is configured to periodically collect the initial log file before the extracting module performs related operations;

The fourth processing module is configured to obtain the sample log file according to the log data of the initial log file.

Optionally, the fourth processing module includes:

a third obtaining submodule, configured to: when receiving the information request sent by the network client according to the webpage open instruction, obtain, according to the information request, information required to open the corresponding webpage from the initial log file;

The second saving submodule is configured to save the information required to open the corresponding webpage as the sample log file.

The embodiment of the invention further provides a server, comprising: the foregoing collecting device of a cloud environment resource focus point.

The beneficial effects of the above technical solutions of the embodiments of the present invention are as follows:

In the above solution, the method for collecting cloud environment resource focus points collects the vocabulary that satisfies the first preset condition, calculates the importance degree feature vector of that vocabulary, and then obtains the cloud environment resource focus points. Cloud environment resource focus points can thus be calculated, analyzed, mined and extracted reliably and efficiently, which solves the problem in the related art that acquiring cloud environment resource focus points with traditional algorithms is time-consuming.

Other aspects will be apparent upon reading and understanding the drawings and detailed description.

Brief description of the drawings

FIG. 1 is a schematic flowchart of a method for collecting a cloud environment resource focus point according to Embodiment 1 of the present invention;

FIG. 2 is a schematic structural diagram of a device for collecting a cloud environment resource focus point according to Embodiment 2 of the present invention.

Preferred embodiment of the invention

The technical problems, technical solutions, and advantages of the embodiments of the present invention will become more apparent from the following detailed description.

The traditional algorithms in the related art for obtaining cloud environment resource focus points usually run in stand-alone mode. They are easily constrained by computer hardware performance such as processor speed and storage capacity, are time-consuming, and scale poorly; as the volume of user logs increases, the complexity of the algorithm grows polynomially and its performance becomes worse and worse.

To address the problem in the related art that acquiring cloud environment resource focus points with traditional algorithms is time-consuming, this document provides several solutions, including:

Embodiment 1

Referring to FIG. 1, a method for collecting a cloud environment resource focus point in Embodiment 1 of the present invention includes:

Step 11: Collecting a vocabulary that satisfies the first preset condition;

Step 12: Calculate the importance degree feature vector of the vocabulary that satisfies the first preset condition;

Step 13: Obtain a cloud environment resource attention point according to the importance degree feature vector;

Each feature value of the importance degree feature vector is used to represent a weight of a corresponding vocabulary that satisfies the first preset condition in a uniform resource locator URL webpage, and may be a TF-IDF feature vector.

The first preset condition is essentially a condition for obtaining the number of words.

The method for collecting cloud environment resource focus points provided by Embodiment 1 of the present invention collects the vocabulary that satisfies the first preset condition, calculates the importance degree feature vector of the corresponding vocabulary, and then obtains the cloud environment resource focus points. It thereby calculates, analyzes, mines and extracts cloud environment resource focus points reliably and efficiently, and solves the problem in the related art that acquiring cloud environment resource focus points with traditional algorithms is time-consuming.

Optionally, the step of obtaining a cloud environment resource focus point according to the importance degree feature vector includes: performing data transformation on the importance degree feature vector to obtain a corresponding frequency; arranging the frequencies in order (optionally in descending order); obtaining, after the arrangement, the frequencies that satisfy the second preset condition; and obtaining the corresponding cloud environment resource focus points according to the obtained frequencies.

Optionally, obtaining the frequencies that satisfy the second preset condition after the arrangement includes:

Sequentially obtaining, after the arrangement, the frequencies that satisfy the second preset condition.

Optionally, sequentially obtaining the frequencies that satisfy the second preset condition after the arrangement includes:

Sequentially obtaining, in descending order, the frequencies that satisfy the second preset condition after the arrangement.

The second preset condition is essentially the limit condition of the number of cloud environment resource concerns that the administrator needs to obtain.

The step of collecting the vocabulary that satisfies the first preset condition includes: summarizing the vocabulary that satisfies the third preset condition and the frequency corresponding to each word in the resource-related URL webpage; sorting the vocabulary that satisfies the third preset condition according to the frequency; sequentially obtaining the sorted vocabulary satisfying the third preset condition, until the frequency corresponding to the obtained vocabulary and the frequency corresponding to all the vocabulary satisfying the third preset condition reach a preset threshold (optionally 9/10); and saving the obtained vocabulary satisfying the third preset condition as the vocabulary satisfying the first preset condition.

Here, the frequency corresponding to the obtained vocabulary and the frequency corresponding to all the vocabulary satisfying the third preset condition reaching the preset threshold means that the ratio of the frequency corresponding to the obtained vocabulary to the sum of the frequencies corresponding to all the vocabulary satisfying the third preset condition reaches the preset threshold.

Optionally, in an example, the frequency corresponding to the obtained vocabulary refers to the frequency of the word currently being obtained, and the frequency corresponding to all the vocabulary satisfying the third preset condition refers to the highest frequency among the frequencies corresponding to all the vocabulary satisfying the third preset condition.

Optionally, sequentially obtaining the sorted vocabulary satisfying the third preset condition includes:

Sequentially obtaining, in descending order, the sorted vocabulary satisfying the third preset condition.
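
The threshold-based selection described above can be illustrated with a short Python sketch. It assumes the reading in which the summed frequency of the words taken so far is compared with the summed frequency of all candidate words; the function name `select_high_frequency_words`, the tie-breaking rule and the sample counts are hypothetical and are not taken from the source.

```python
from typing import Dict, List


def select_high_frequency_words(freq: Dict[str, int], threshold: float = 0.9) -> List[str]:
    """Take words in descending frequency until their share of the total reaches the threshold."""
    total = sum(freq.values())
    if total == 0:
        return []
    selected: List[str] = []
    covered = 0
    # Descending frequency; ties are broken alphabetically (an assumption, for determinism).
    for word, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        selected.append(word)
        covered += count
        if covered / total >= threshold:
            break
    return selected


# Hypothetical frequencies of words that already satisfy the third preset condition.
counts = {"cpu": 50, "memory": 30, "disk": 12, "bandwidth": 5, "license": 3}
high_freq = select_high_frequency_words(counts)
print(high_freq)
```

With the optional threshold of 9/10, the loop stops as soon as the selected words account for at least 90% of all occurrences, so rare words at the tail are dropped.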

Optionally, before collecting the vocabulary that satisfies the first preset condition, the collecting method further includes: extracting a resource-related URL from the sample log file; crawling the webpage content of the resource-related URL (through a web crawler) and taking the crawled webpage content as the text to be classified; and performing word segmentation on the text to be classified to obtain the vocabulary that satisfies the third preset condition.

The third preset condition is essentially a lower limit condition for the number of times the vocabulary appears in the content of the web page.

The step of performing word segmentation on the text to be classified to obtain the vocabulary that satisfies the third preset condition includes: performing word segmentation on the text to be classified to obtain a resource-related vocabulary; converting the resource-related vocabulary into a digital vector; processing the digital vector to obtain a parameter feature vector; obtaining an order parameter according to the parameter feature vector; and obtaining the vocabulary satisfying the third preset condition according to the order parameter.

Optionally, processing the digital vector to obtain the parameter feature vector comprises: aligning the digital vectors, and then performing zero-mean processing and normalization to obtain the parameter feature vector.
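
As a rough illustration of the alignment, zero-mean and normalization steps, the following Python sketch processes two words. Several details are assumptions made only for this example: words are turned into digital vectors through their Unicode code points, alignment is done by zero-padding to a common length, and normalization means scaling to unit Euclidean length. The function names are hypothetical.

```python
import math
from typing import List


def keyword_to_vector(keyword: str) -> List[int]:
    # Assumption: each character is represented by its Unicode code point.
    return [ord(ch) for ch in keyword]


def align(vectors: List[List[int]], pad_value: int = 0) -> List[List[int]]:
    # Assumption: "alignment" pads shorter vectors so that all have the same length.
    width = max((len(v) for v in vectors), default=0)
    return [list(v) + [pad_value] * (width - len(v)) for v in vectors]


def zero_mean_and_normalize(vector: List[int]) -> List[float]:
    if not vector:
        return []
    mean = sum(vector) / len(vector)
    centered = [x - mean for x in vector]          # zero-mean processing
    norm = math.sqrt(sum(x * x for x in centered))
    if norm == 0:
        return centered                            # constant vector: nothing to scale
    return [x / norm for x in centered]            # unit Euclidean length


keywords = ["cpu", "disk"]
aligned = align([keyword_to_vector(k) for k in keywords])
features = [zero_mean_and_normalize(v) for v in aligned]
```

Each resulting parameter feature vector has the same length, sums to approximately zero and has unit length, which puts the vectors on a comparable scale before any further recognition step.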

Optionally, the step of obtaining the vocabulary that satisfies the third preset condition according to the order parameter comprises: using the order parameter and the ranking algorithm to obtain the vocabulary that satisfies the third preset condition.

Optionally, before extracting the resource-related uniform resource locator (URL) from the sample log file, the collecting method further includes: periodically collecting an initial log file (a cloud environment log file to be analyzed); and obtaining the sample log file (a file composed of the information required to open a webpage) according to the log data of the initial log file.

Optionally, periodically collecting the initial log file includes: installing a system timer on each node from which logs need to be collected, starting the system timer, and setting a system timer task to periodically collect the initial log file.

Optionally, the step of obtaining the sample log file according to the log data of the initial log file includes: upon receiving an information request sent by the network client according to a webpage open instruction, obtaining, according to the information request, the information required to open the corresponding webpage from the initial log file; and saving the information required to open the corresponding webpage as the sample log file.

The vocabulary satisfying the first preset condition corresponds to the high-frequency best keywords; the frequencies satisfying the second preset condition are the frequencies that meet the threshold set by the user; the vocabulary satisfying the third preset condition corresponds to the best keywords; and the resource-related vocabulary corresponds to the vocabulary of the webpage content of the resource-related URL.

Embodiment 2

Referring to FIG. 2, the apparatus for collecting cloud environment resource focus points in Embodiment 2 of the present invention includes:

The first processing module 21 is configured to collectively obtain a vocabulary that satisfies the first preset condition;

The calculating module 22 is configured to calculate an importance degree feature vector of the vocabulary that satisfies the first preset condition;

The second processing module 23 is configured to obtain a cloud environment resource focus point according to the importance degree feature vector;

Each feature value of the importance degree feature vector is used to represent the weight, in a uniform resource locator (URL) webpage, of the corresponding vocabulary that satisfies the first preset condition; the importance degree feature vector is optionally a TF-IDF feature vector.

The device for collecting cloud environment resource focus points provided by Embodiment 2 of the present invention collects the vocabulary that satisfies the first preset condition, calculates the importance degree feature vector of the corresponding vocabulary, and then obtains the cloud environment resource focus points. It can reliably and efficiently calculate, analyze, mine and extract cloud environment resource focus points, reducing the time required in the related art to obtain cloud environment resource focus points with traditional algorithms.

Optionally, the second processing module includes: a transform submodule, configured to perform data transformation on the importance degree feature vector to obtain a corresponding frequency; a first sorting submodule, configured to arrange the frequencies in order (optionally in descending order); a first obtaining submodule, configured to sequentially obtain, after the arrangement, the frequencies that satisfy the second preset condition; and a first processing submodule, configured to obtain the corresponding cloud environment resource focus points according to the obtained frequencies.

The first processing module includes: a summary sub-module, configured to summarize the vocabulary satisfying the third preset condition and the frequency corresponding to each word in the resource-related URL webpage; a second sorting sub-module, configured to sort the vocabulary that satisfies the third preset condition according to the frequency; a second acquiring sub-module, configured to sequentially obtain the sorted vocabulary that satisfies the third preset condition, until the frequency corresponding to the obtained vocabulary and the frequency corresponding to all the vocabulary satisfying the third preset condition reach a preset threshold (optionally 9/10); and a first saving sub-module, configured to save the obtained vocabulary that satisfies the third preset condition as the vocabulary satisfying the first preset condition.

Optionally, the collecting device further includes: an extracting module, configured to extract a resource-related URL from the sample log file before the first processing module performs the related operation; a crawling module, configured to crawl the webpage content of the resource-related URL through a web crawler and take the crawled webpage content as the text to be classified; and a third processing module, configured to perform word segmentation on the text to be classified to obtain the vocabulary that satisfies the third preset condition.

The third processing module includes: a second processing sub-module, configured to perform word segmentation on the text to be classified to obtain a resource-related vocabulary; a transformation sub-module, configured to convert the resource-related vocabulary into a digital vector; a third processing sub-module, configured to process the digital vector to obtain a parameter feature vector (aligning the digital vectors, then performing zero-mean processing and normalization to obtain the parameter feature vector); a fourth processing sub-module, configured to obtain an order parameter according to the parameter feature vector; and a fifth processing sub-module, configured to obtain the vocabulary that satisfies the third preset condition according to the order parameter.

Optionally, the fifth processing submodule is configured to: obtain the vocabulary that meets the third preset condition by using the order parameter and the ranking algorithm.

Optionally, the collecting device further includes: a collecting module, configured to periodically collect the initial log file (the cloud environment log file to be analyzed) before the extracting module performs the related operation (by installing a system timer on each node from which logs need to be collected, starting the system timer, and setting a system timer task); and a fourth processing module, configured to obtain the sample log file (a file composed of the information required to open a webpage) according to the log data of the initial log file.

Optionally, the fourth processing module includes: a third obtaining submodule, configured to, upon receiving an information request sent by the network client according to a webpage open instruction, obtain, according to the information request, the information required to open the corresponding webpage from the initial log file; and a second saving submodule, configured to save the information required to open the corresponding webpage as the sample log file.

A server in the related art can be improved to implement the functions of the device for collecting cloud environment resource focus points provided by Embodiment 2 of the present invention.

Embodiment 3

The method for collecting the focus of the cloud environment resource provided by the third embodiment of the present invention includes:

First, the front-end server installs the system timer CRON on each node in the cloud environment from which logs need to be collected, adds CRON to the startup script, and starts the CRON service; it then edits the /etc/crontab file to set the tasks that the system performs periodically. In Embodiment 3 of the present invention, the task is set so that each node periodically collects log files. It should be noted that root authority is required to edit this file.

Then, the periodically collected log files (initial log files) are saved in a unified format, and the log data is pre-processed. The pre-processing includes: when a webpage open instruction is received (that is, when the user opens a webpage of the cloud platform) and the network client sends a request assembled in a certain format (such as a string), obtaining from the log file (initial log file) the information required to open the corresponding webpage, where the required information includes any one or more of the following:

Start time of operation, end time of operation, client IP, user information, and access address.

The front-end server saves the above information (the obtained required information) as a unified-format log file (sample log file), transmits it over the network to the HDFS (Hadoop Distributed File System) of the cloud platform, and stores it in the HDFS in LZO format. LZO, short for Lempel-Ziv-Oberhumer, is a data compression algorithm that focuses on decompression speed.
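
A unified-format sample log record of this kind might be modelled as in the Python sketch below. The source does not specify the record layout, so the tab-separated format, the field order and the names `AccessRecord`, `to_line` and `parse_line` are all assumptions chosen for illustration; compression and transfer to HDFS are not shown.

```python
from dataclasses import astuple, dataclass
from typing import Optional

FIELD_SEPARATOR = "\t"  # assumption: the source does not name a delimiter


@dataclass(frozen=True)
class AccessRecord:
    start_time: str   # start time of the operation
    end_time: str     # end time of the operation
    client_ip: str
    user: str         # user information
    access_url: str   # access address


def to_line(record: AccessRecord) -> str:
    """Serialize one record as a single line of the sample log file."""
    return FIELD_SEPARATOR.join(astuple(record))


def parse_line(line: str) -> Optional[AccessRecord]:
    """Parse one line back into a record; malformed lines yield None."""
    parts = line.rstrip("\n").split(FIELD_SEPARATOR)
    if len(parts) != 5:
        return None
    return AccessRecord(*parts)


# Hypothetical sample record.
record = AccessRecord(
    "2015-08-19T10:00:00", "2015-08-19T10:00:02",
    "192.0.2.10", "alice", "http://cloud.example/compute/instances",
)
line = to_line(record)
```

Keeping every record on one line with a fixed field order makes the file easy to split across machines later, since each line can be parsed independently.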

From the log file (sample log file) stored in the HDFS, the back-end server extracts the resource-related URLs from the accessed URLs, crawls the webpage content corresponding to each resource URL through a web crawler, and retains the webpage content as the text to be classified. It then uses word segmentation technology to segment the content of the resource URL webpages and obtain keywords (resource-related vocabulary); queries the international code library to convert the keywords into digital vectors; aligns the digital vectors and then performs zero-mean processing and normalization to obtain the parameter feature vectors; and applies synergetic (cooperative) neural network pattern recognition to the parameter feature vectors to obtain the order parameters, which are used to obtain the best keywords in the database.
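
The first of these steps, filtering the accessed URLs down to resource-related ones, could look like the following Python sketch. The source does not say how a URL is judged to be resource-related; matching path segments against a small marker list is purely an assumption for illustration, and `RESOURCE_MARKERS`, `is_resource_related` and `extract_resource_urls` are hypothetical names.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

# Assumption: a URL counts as resource-related if one of its path segments is a marker.
RESOURCE_MARKERS: Tuple[str, ...] = ("compute", "storage", "network")


def is_resource_related(url: str, markers: Tuple[str, ...] = RESOURCE_MARKERS) -> bool:
    segments = urlparse(url).path.lower().split("/")
    return any(marker in segments for marker in markers)


def extract_resource_urls(accessed_urls: Iterable[str]) -> List[str]:
    """Keep resource-related URLs, dropping duplicates but preserving first-seen order."""
    seen = set()
    result: List[str] = []
    for url in accessed_urls:
        if is_resource_related(url) and url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Hypothetical access addresses taken from sample log records.
accessed = [
    "http://cloud.example/compute/instances",
    "http://cloud.example/help/about",
    "http://cloud.example/storage/volumes",
    "http://cloud.example/compute/instances",
]
resource_urls = extract_resource_urls(accessed)
```

Deduplicating before crawling avoids fetching the same page once per log entry; the per-URL access counts remain available in the log itself if they are needed later.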

The back-end server summarizes the best keywords and inputs them into MapReduce (a distributed computing framework) to obtain the frequency of occurrence of each best keyword. The best keywords are arranged in descending order of frequency, and the corresponding words (high-frequency best keywords) are then selected one by one in that order until the ratio of the word frequency of the selected words to the total word frequency reaches 9/10.

Here, the ratio of the word frequency of the selected words to the total word frequency reaching 9/10 means that the ratio of the sum of the frequencies corresponding to all the obtained vocabulary satisfying the third preset condition to the sum of the frequencies corresponding to all the vocabulary satisfying the third preset condition reaches 9/10.

Then the TF-IDF (term frequency-inverse document frequency) feature vector corresponding to the selected words is calculated (for each sample log file, the importance degree of all the high-frequency best keywords for each sample webpage is calculated, generating the TF-IDF feature vector of the high-frequency best keywords):

$$\mathrm{FeatureVector} = \{f_1, f_2, f_3, f_4, \ldots, f_n\} \qquad (1)$$

In equation (1), the TF-IDF feature value of a high-frequency best keyword is calculated as:

$$f_n = \mathrm{tf\text{-}idf}(t_n, d, D) = \mathrm{tf}(t_n, d) \times \mathrm{idf}(t_n, D) \qquad (2)$$

In equation (2), the tf value is calculated as:

$$\mathrm{tf}(t_n, d) = \mathrm{NumberOfTimes}(t_n) \qquad (3)$$

In equation (2), the idf value is calculated as:

$$\mathrm{idf}(t_n, D) = \log \frac{|D|}{|\{d \in D : t_n \in d\}|} \qquad (4)$$

In formulas (1) to (4), D is the collection of all URL webpages, d is a specific URL webpage, t_n is the n-th high-frequency word, that is, a feature, and N is the total number of selected best keywords; FeatureVector is the feature vector, and NumberOfTimes(t_n) is the number of times t_n appears in d.

Finally, the TF-IDF feature vectors are transformed by MapReduce (the distributed parallel computing model in the Hadoop framework) to obtain the frequency sum of each group of feature values (the multiple TF-IDF feature values of one high-frequency best keyword). The sums are arranged in descending order and, according to the number set by the administrator, the corresponding number of cloud environment resource focus points is obtained sequentially in that order.
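
One way to read this final step is sketched below in plain Python, without Hadoop: the TF-IDF values of each keyword are summed across webpages, the sums are sorted in descending order, and the first few keywords are returned as focus points. The data layout (a dictionary of per-page TF-IDF values), the function name `rank_focus_points` and the sample numbers are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def rank_focus_points(
    tfidf_by_page: Dict[str, Dict[str, float]], limit: int
) -> List[Tuple[str, float]]:
    """Sum each keyword's TF-IDF values over all pages and return the top `limit` keywords."""
    totals: Dict[str, float] = {}
    for page_values in tfidf_by_page.values():
        for keyword, value in page_values.items():
            totals[keyword] = totals.get(keyword, 0.0) + value
    # Descending by summed value; ties broken alphabetically (an assumption).
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


# Hypothetical TF-IDF values for two webpages.
pages = {
    "url_a": {"cpu": 0.5, "disk": 0.25},
    "url_b": {"cpu": 0.25, "memory": 0.5},
}
focus_points = rank_focus_points(pages, limit=2)
print(focus_points)
```

Here `limit` plays the role of the number of focus points set by the administrator.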

It should be noted that MapReduce is a key technology of cloud computing: a software architecture and programming model proposed by Google for parallel computation over large-scale data. MapReduce decomposes all operations of the system into a mapping function, Map, and a reduction function, Reduce. The Map function splits large-scale data into multiple small data sets and distributes them to multiple machines for parallel processing; the Reduce function aggregates the results of the Map operations on each machine. The cooperation between Map and Reduce achieves distributed parallel computing.
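
The division of work between Map and Reduce can be imitated in a single Python process, as in the sketch below, which counts keyword occurrences. It is only a model of the programming pattern and does not use the Hadoop API; the function names and the whitespace-based tokenization are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (keyword, 1) pair for every token of one input split.
    for word in document.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key, as the framework does between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: aggregate all values that share the same key.
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


frequencies = word_count(["cpu disk cpu", "disk memory"])
print(frequencies)
```

In a real cluster each call to `map_phase` would run on the machine holding that split, and each key group would be reduced on whichever machine receives it; the single-process version only shows the data flow.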

TF represents the number of times the keyword appears in a URL webpage. IDF is a measure of the general importance of the keyword; it can be obtained by dividing the total number of sample files by the number of sample files containing the keyword and taking the logarithm of the quotient. Multiplying TF by IDF gives the importance of a word for a URL webpage.
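
The TF and IDF definitions above translate directly into a few lines of Python. The sketch assumes that each webpage has already been segmented into a list of tokens and uses the natural logarithm, since the source does not state a base; the function names and the tiny corpus are hypothetical.

```python
import math
from typing import List


def tf(term: str, doc_tokens: List[str]) -> int:
    # Number of times the keyword appears in one URL webpage.
    return doc_tokens.count(term)


def idf(term: str, corpus: List[List[str]]) -> float:
    # log(total number of sample files / number of sample files containing the keyword)
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: a keyword absent from every file gets weight 0
    return math.log(len(corpus) / containing)


def tfidf_vector(terms: List[str], doc_tokens: List[str], corpus: List[List[str]]) -> List[float]:
    # One feature value per high-frequency keyword for the given webpage.
    return [tf(t, doc_tokens) * idf(t, corpus) for t in terms]


# Hypothetical corpus of two segmented webpages.
corpus = [["cpu", "cpu", "disk"], ["disk", "memory"]]
vector = tfidf_vector(["cpu", "disk"], corpus[0], corpus)
```

In this example "disk" appears in every page, so its IDF is zero and it contributes nothing to the importance score, while "cpu" appears twice in the first page and in no other page, giving a value of 2 × ln 2.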

The front-end server and the back-end server mentioned in the third embodiment of the present invention may be integrated into one server, or may exist in two servers, which is not limited herein.

The method for collecting cloud environment resource focus points provided by the third embodiment of the present invention filters a large number of URLs, extracts the URLs related to cloud resources, and performs TF-IDF feature extraction on the URL web page content by using MapReduce. This not only solves the time, storage and computation bottlenecks of finding cloud resource focus points in massive log analysis, but also accurately finds the focus points of cloud environment resources and improves the utilization of cloud environment resources.

In summary, the solution provided by the embodiments of the present invention can accurately and efficiently calculate, analyze, and mine a large number of user logs, thereby efficiently extracting, in real time, the cloud environment resource focus points that users care about most in the logs. The algorithm has a short running time and is scalable. It solves the problem that a traditional algorithm running in stand-alone mode is easily limited by computer hardware performance such as processor speed and storage capacity, so that as the user logs grow, the complexity of the algorithm grows polynomially and its performance becomes worse and worse.

Many of the functional components described in this specification are referred to as modules/sub-modules to more particularly emphasize the independence of their implementation.

In an embodiment of the invention, the modules/sub-modules may be implemented in software for execution by various types of processors. For example, an identified executable code module can comprise one or more physical or logical blocks of computer instructions, which can be constructed, for example, as an object, procedure, or function. Nonetheless, the executable code of an identified module need not be physically located together, but may include different instructions stored in different locations that, when logically combined, constitute the module and achieve the stated purpose of the module.

In practice, the executable code module can be a single instruction or a plurality of instructions, and can even be distributed across multiple different code segments, distributed among different programs, and distributed across multiple memory devices. As such, operational data may be identified within the modules and may be implemented in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed at different locations (including on different storage devices), and may at least partially exist as an electronic signal on a system or network.

Where a module can be implemented in software, considering the level of hardware technology in the related art, those skilled in the art can, cost aside, also build a corresponding hardware circuit to implement the corresponding function. The hardware circuit includes conventional Very Large Scale Integration (VLSI) circuits or gate arrays, and existing semiconductors such as logic chips and transistors, or other discrete components. The modules can also be implemented with programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, and the like.

The above are optional implementations of the embodiments of the present invention. It should be noted that those skilled in the art can make improvements and refinements without departing from the principles of the embodiments of the present invention, and such improvements and refinements should also be regarded as falling within the scope of protection of the embodiments of the present invention.

One of ordinary skill in the art will appreciate that all or a portion of the steps of the above-described embodiments can be implemented using a computer program flow. The computer program can be stored in a computer readable storage medium and executed on a corresponding hardware platform (e.g., a system, equipment, apparatus, or device); when executed, it includes one or a combination of the steps of the method embodiments.

Alternatively, all or part of the steps of the above embodiments may also be implemented by using integrated circuits. These steps may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module.

The devices/function modules/functional units in the above embodiments may be implemented by a general-purpose computing device, which may be centralized on a single computing device or distributed over a network of multiple computing devices.

When the device/function module/functional unit in the above embodiment is implemented in the form of a software function module and sold or used as a stand-alone product, it can be stored in a computer readable storage medium. The above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.

Industrial applicability

Through the solution of the embodiments of the present invention, the vocabulary that satisfies the first preset condition is summarized, the importance degree feature vector of that vocabulary is calculated, and the cloud environment resource focus points are obtained. Cloud environment resource focus points can thus be calculated, analyzed, mined, and extracted reliably and efficiently, which alleviates the problem that obtaining cloud environment resource focus points through an algorithm takes a long time.

Claims

1. A method for collecting cloud environment resource focus points, comprising:

summarizing vocabulary that satisfies a first preset condition;

calculating an importance degree feature vector of the vocabulary that satisfies the first preset condition; and

obtaining a cloud environment resource focus point according to the importance degree feature vector;

wherein each feature value of the importance degree feature vector is used to represent the weight, in a uniform resource locator (URL) web page, of a corresponding vocabulary item satisfying the first preset condition.

2. The collection method according to claim 1, wherein the step of obtaining a cloud environment resource focus point according to the importance degree feature vector comprises:

performing data transformation on the importance degree feature vector to obtain corresponding frequencies;

arranging the frequencies in order;

obtaining the frequencies that satisfy a second preset condition after the arrangement; and

obtaining the corresponding cloud environment resource focus points according to the obtained frequencies.

3. The collection method according to claim 1, wherein the step of summarizing the vocabulary that satisfies the first preset condition comprises:

summarizing vocabulary that satisfies a third preset condition and its corresponding frequency of occurrence in resource-related URL web pages;

sorting the vocabulary satisfying the third preset condition in descending order of frequency;

sequentially obtaining the sorted vocabulary satisfying the third preset condition until the ratio of the sum of the frequencies corresponding to the obtained vocabulary to the sum of the frequencies corresponding to all the vocabulary satisfying the third preset condition reaches a preset threshold; and

saving the obtained vocabulary that satisfies the third preset condition as the vocabulary that satisfies the first preset condition.
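The selection by cumulative frequency ratio recited above can be illustrated with a short Python sketch. It is a non-limiting example, not the claimed implementation; the function name `select_keywords`, the sample counts and the threshold value of 0.75 are assumptions made only for illustration.

```python
from typing import Dict, List


def select_keywords(frequencies: Dict[str, int], threshold: float) -> List[str]:
    """Take words in descending order of frequency until their share of the
    total frequency reaches the threshold."""
    total = sum(frequencies.values())
    # Sort from the largest frequency to the smallest.
    ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    selected: List[str] = []
    running = 0
    for word, count in ordered:
        # Stop once the words taken so far cover the required ratio.
        if total == 0 or running / total >= threshold:
            break
        selected.append(word)
        running += count
    return selected


# Illustrative frequencies only.
counts = {"cpu": 50, "memory": 30, "disk": 15, "network": 5}
best = select_keywords(counts, threshold=0.75)
```

With these sample counts, "cpu" alone covers 50% of all occurrences, and adding "memory" brings the coverage to 80%, which passes the assumed threshold.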

4. The collection method according to claim 3, wherein, before the summarizing of the vocabulary that satisfies the first preset condition, the method further comprises:

extracting a resource-related URL from a sample log file;

crawling the web page content of the resource-related URL, and using the crawled web page content as text to be classified; and

performing word segmentation on the text to be classified to obtain the vocabulary that satisfies the third preset condition.
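The URL extraction and word segmentation steps recited above can be illustrated by the following Python sketch, offered as a non-limiting example. It assumes, purely for illustration, that each log line contains plain-text URLs, that a URL is "resource-related" when it contains one of a few hypothetical keywords, and that a simple regular expression can stand in for a real word segmenter. The crawling step is omitted because it requires network access.

```python
import re
from typing import Iterable, List, Sequence

# Assumed URL shape and keyword list; neither is specified by the source.
URL_PATTERN = re.compile(r"https?://\S+")
RESOURCE_KEYWORDS = ("cpu", "memory", "storage")


def extract_resource_urls(log_lines: Iterable[str],
                          keywords: Sequence[str] = RESOURCE_KEYWORDS) -> List[str]:
    """Return the distinct URLs in the log that mention a resource keyword."""
    urls: List[str] = []
    for line in log_lines:
        for url in URL_PATTERN.findall(line):
            if any(k in url.lower() for k in keywords) and url not in urls:
                urls.append(url)
    return urls


def segment(text: str) -> List[str]:
    """Stand-in word segmentation: lowercase alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Illustrative log lines; the host name and paths are made up.
log = [
    "10.0.0.1 GET http://cloud.example.com/monitor/cpu?host=a 200",
    "10.0.0.2 GET http://cloud.example.com/news/index.html 200",
    "10.0.0.3 GET http://cloud.example.com/monitor/storage 200",
    "10.0.0.1 GET http://cloud.example.com/monitor/cpu?host=a 200",
]
resource_urls = extract_resource_urls(log)
words = segment("CPU usage: 80% of 4 cores")
```

For Chinese web page content, the `segment` stand-in would have to be replaced by a dedicated word segmentation tool.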

5. The collection method according to claim 4, wherein the performing word segmentation on the text to be classified to obtain the vocabulary that satisfies the third preset condition comprises:

performing word segmentation on the text to be classified to obtain resource-related vocabulary;

converting the resource-related vocabulary into a digital vector;

processing the digital vector to obtain a parameter feature vector;

obtaining an order parameter according to the parameter feature vector; and

obtaining the vocabulary that satisfies the third preset condition according to the order parameter.

6. The collection method according to claim 5, wherein the obtaining the vocabulary that satisfies the third preset condition according to the order parameter comprises:

obtaining the vocabulary satisfying the third preset condition by using the order parameter and a ranking algorithm.

7. The collection method according to claim 4, wherein, before the extracting of the resource-related URL from the sample log file, the method further comprises:

periodically collecting an initial log file; and

obtaining the sample log file according to log data of the initial log file.

8. The collection method according to claim 7, wherein the step of obtaining the sample log file according to the log data of the initial log file comprises:

after receiving an information request sent by a network client according to a web page open instruction, obtaining, according to the information request, the information required to open the corresponding web page from the initial log file; and

saving the information required to open the corresponding web page as the sample log file.

9. A collection device for cloud environment resource focus points, comprising:

a first processing module, configured to summarize vocabulary that satisfies a first preset condition;

a calculation module, configured to calculate an importance degree feature vector of the vocabulary that satisfies the first preset condition; and

a second processing module, configured to obtain a cloud environment resource focus point according to the importance degree feature vector;

wherein each feature value of the importance degree feature vector is used to represent the weight, in a uniform resource locator (URL) web page, of a corresponding vocabulary item satisfying the first preset condition.

10. The collection device according to claim 9, wherein the second processing module comprises:

a transform submodule, configured to perform data transformation on the importance degree feature vector to obtain corresponding frequencies;

a first sorting submodule, configured to arrange the frequencies in order;

a first obtaining submodule, configured to obtain the frequencies that satisfy a second preset condition after the arrangement; and

a first processing submodule, configured to obtain the corresponding cloud environment resource focus points according to the obtained frequencies.

11. The collection device according to claim 9, wherein the first processing module comprises:

a summary submodule, configured to summarize vocabulary that satisfies a third preset condition and its corresponding frequency of occurrence in resource-related URL web pages;

a second sorting submodule, configured to sort the vocabulary satisfying the third preset condition in descending order of frequency;

a second obtaining submodule, configured to sequentially obtain the sorted vocabulary satisfying the third preset condition until the ratio of the sum of the frequencies corresponding to the obtained vocabulary to the sum of the frequencies corresponding to all the vocabulary satisfying the third preset condition reaches a preset threshold; and

a first saving submodule, configured to save the obtained vocabulary that satisfies the third preset condition as the vocabulary that satisfies the first preset condition.

12. The collection device according to claim 11, further comprising:

an extraction module, configured to extract a resource-related URL from a sample log file before the first processing module performs its operation;

a crawling module, configured to crawl the web page content of the resource-related URL, and use the crawled web page content as text to be classified; and

a third processing module, configured to perform word segmentation on the text to be classified to obtain the vocabulary that satisfies the third preset condition.

13. The collection device according to claim 12, wherein the third processing module comprises:

a second processing submodule, configured to perform word segmentation on the text to be classified to obtain resource-related vocabulary;

a conversion submodule, configured to convert the resource-related vocabulary into a digital vector;

a third processing submodule, configured to process the digital vector to obtain a parameter feature vector;

a fourth processing submodule, configured to obtain an order parameter according to the parameter feature vector; and

a fifth processing submodule, configured to obtain the vocabulary that satisfies the third preset condition according to the order parameter.

14. The collection device according to claim 13, wherein the fifth processing submodule is configured to obtain the vocabulary satisfying the third preset condition by using the order parameter and a ranking algorithm.

15. The collection device according to claim 12, further comprising:

a collecting module, configured to periodically collect an initial log file before the extraction module performs its operation; and

a fourth processing module, configured to obtain the sample log file according to log data of the initial log file.

16. The collection device according to claim 15, wherein the fourth processing module comprises:

a third obtaining submodule, configured to, after receiving an information request sent by a network client according to a web page open instruction, obtain, according to the information request, the information required to open the corresponding web page from the initial log file; and

a second saving submodule, configured to save the information required to open the corresponding web page as the sample log file.

17. A server, comprising the collection device for cloud environment resource focus points according to any one of claims 9 to 16.