CN107103095A - Method for computing data based on high performance network framework
- Publication number
- CN107103095A (application number CN201710356926.8A)
- Authority
- CN
- China
- Prior art keywords
- mapreduce
- nearest neighbor
- file
- shared
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
Abstract
The invention provides a data computing method based on a high-performance network architecture. The method includes: classifying small text files using MapReduce; and merging multiple small files into a large file based on the classification. The invention organizes small files from heterogeneous sources into a unified standard form on an improved distributed processing framework, which facilitates efficient storage, analysis and retrieval.
Description
Technical field
The present invention relates to data computing, and in particular to a data computing method based on a high-performance network architecture.
Background art
Cloud computing offers distributed computation, very large scale, virtualization, high reliability, high elasticity and scalability, and on-demand service, providing more efficient analysis and better computing power for big data processing. Big data processing involves hundreds of millions of small files, such as web pages and emails, which require distributed storage systems and directory systems for storage support. With the growing demand for processing large numbers of small text files, many heterogeneous data sources exist across different information systems; the data lacks a unified standardization method; and in some fields, large numbers of small text files are difficult to analyze effectively and to store and retrieve efficiently.
Content of the invention
To solve the above problems of the prior art, the present invention proposes a data computing method based on a high-performance network architecture, including:
classifying small text files using MapReduce; and
merging multiple small files into a large file based on the classification.
Preferably, classifying the small text files further comprises:
describing the k-nearest-neighbor (kNN) classification process with MapReduce, while adding to kNN a feature-vector comparison that reconstructs, in order, two feature vectors with identical feature words.
Preferably, before the step of describing the kNN classification process with MapReduce, the method further comprises:
improving the MapReduce model based on XML and multi-values; performing data processing through the content, coordinate and operation information carried in XML tags; and realizing the data-processing operations through XML tags and multi-value processing during the Map phase.
Preferably, describing the kNN classification process with MapReduce further comprises: first performing a preliminary classification by document format; classifying the sorted text documents with the improved kNN classification method based on MapReduce and feature-vector reduction; then merging the small text files of the same class to generate a large file; writing the small text files into the large file in chronological order; and then writing the large file's name, replica and position information to the namenode and its content to the datanodes.
Preferably, adding the feature-vector comparison to kNN and reconstructing, in order, two feature vectors with identical feature words further comprises:
first finding, in the kNN algorithm, the words shared by the two original feature vectors and their weights; reconstructing, in the order of the shared feature words, two feature vectors whose feature words are all identical; and then computing the similarity between the two feature vectors from the weight vectors corresponding to the feature words.
Compared with the prior art, the present invention has the following advantage:
the invention proposes a data computing method based on a high-performance network architecture that organizes small files from heterogeneous sources into a unified standard form on an improved distributed processing framework, facilitating efficient storage, analysis and retrieval.
Brief description of the drawings
Fig. 1 is a flow chart of the data computing method based on a high-performance network architecture according to an embodiment of the present invention.
Embodiment
A detailed description of one or more embodiments of the invention is provided below together with the accompanying drawings that illustrate the principles of the invention. The invention is described in connection with such embodiments, but is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may also be practiced according to the claims without some or all of these details.
One aspect of the present invention provides a data computing method based on a high-performance network architecture. Fig. 1 is a flow chart of this method according to an embodiment of the invention.
The present invention builds an index structure through document classification and merges files into large files according to weight similarity, so that they can be processed in a cloud computing environment. When classifying small text files, the kNN classification process is described with MapReduce; at the same time, a feature-vector comparison is added to kNN that reconstructs, in order, two feature vectors with identical feature words. For the complex processing and content-mapping relations during document retrieval, the MapReduce model is improved based on XML and multi-values. Complex data processing is carried out through the content, coordinate and operation information carried in XML tags. The content of the data generally has mapping relations; the data-processing operations are realized through XML tags and multi-value processing during the Map phase.
First, a preliminary classification is performed by document format. The sorted text documents are then classified with the improved kNN classification method based on MapReduce and feature-vector reduction. The small text files of the same class are then merged to generate a large file. The small text files are written into the large file in chronological order; the large file's name, replica and position information are then written to the namenode, and its content is written to the datanodes.
A comparison of conventional feature vectors is added to the kNN algorithm: the words shared by two original feature vectors and their weights are found first, two feature vectors whose feature words are all identical are reconstructed in the order of the shared feature words, and the similarity between the two feature vectors is then computed from the weight vectors corresponding to the feature words.
The method is described as follows: all texts in the training set are preprocessed to generate feature vectors in key-value-pair form.
Step 1. Normalize the feature vector T of the input text and the feature-vector set ET of the training samples, and find the feature words shared by T and ET.
Step 2. Extract the shared feature words and their corresponding weights to form new vectors NT and NET.
Step 3. Apply MapReduce to perform the similarity computation, computing the similarity sim(t, x) between the unary vectors formed from the weights of the two feature vectors.
Step 4. MapReduce sorts the computed similarity results of the texts.
Step 5. Take the k texts with the highest similarity and accumulate the similarity of these k texts per category.
Step 6. Take the maximum accumulated similarity S_i and the corresponding category C_i.
Step 7. If S_i exceeds a predefined similarity threshold, the text is identified as belonging to class C_i.
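For illustration, the following is a minimal, self-contained Python sketch of steps 1-7. The helper names (reduce_vectors, knn_classify), the cosine form chosen for sim(t, x), and the sample data are assumptions for the sketch, not part of the patented method.

```python
# Sketch of kNN with feature-vector reduction: align two vectors on their
# shared feature words, compute similarity, and vote over the top k.
import math
from collections import defaultdict

def reduce_vectors(t, e):
    """Steps 1-2: keep only the feature words shared by t and e, in a
    fixed order, returning the aligned weight vectors NT and NET."""
    shared = sorted(set(t) & set(e))          # same word order for both
    return [t[w] for w in shared], [e[w] for w in shared]

def similarity(nt, net):
    """Step 3: cosine similarity between the reduced weight vectors
    (an assumed concrete form of sim(t, x))."""
    if not nt:
        return 0.0
    dot = sum(a * b for a, b in zip(nt, net))
    norm = math.sqrt(sum(a * a for a in nt)) * math.sqrt(sum(b * b for b in net))
    return dot / norm if norm else 0.0

def knn_classify(t, training, k=3, threshold=0.1):
    """Steps 4-7: training is a list of (feature_vector, label) pairs;
    returns a label, or None if the threshold is not met."""
    sims = []
    for e, label in training:
        nt, net = reduce_vectors(t, e)
        sims.append((similarity(nt, net), label))
    sims.sort(reverse=True)                   # step 4: sort by similarity
    votes = defaultdict(float)
    for s, label in sims[:k]:                 # step 5: accumulate per category
        votes[label] += s
    label, score = max(votes.items(), key=lambda kv: kv[1])
    return label if score > threshold else None  # step 7: threshold check

if __name__ == "__main__":
    doc = {"cloud": 0.8, "storage": 0.5, "file": 0.2}
    train = [({"cloud": 0.7, "storage": 0.6}, "infra"),
             ({"poem": 0.9, "verse": 0.4}, "literature")]
    print(knn_classify(doc, train, k=1))      # -> "infra"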
For the index structure, a k-d tree that partitions the data set builds the trunk of the whole tree structure. To insert a point, it is first judged whether the k-d tree is empty; if empty, the point becomes the root node directly. Otherwise the point's value is compared with the root node's value in the corresponding dimension to decide whether to proceed into the left or right subtree: if the point is smaller than the root node's value in the corresponding dimension, the search continues into the left subtree until the left or right subtree of some node is empty, and the point is inserted there as a leaf node; if the point is larger than the root node's value in the corresponding dimension, the insertion proceeds into the right subtree. A locality-sensitive hashing structure is then attached at the leaf nodes of the k-d tree, i.e. the remaining points are placed into locality-sensitive hashes. The data set X is converted into binary strings; parameters r > 0 and c > 1 are selected in advance, k hash functions are chosen at random, and the data points are stored into the corresponding hash tables by these hash functions.
Based on the above file-index rules, the small text files are merged. Given multiple files A_1, A_2, ..., A_n, where A_i = a_i1, a_i2, ..., a_ik, ... and a_ik is the k-th character of the file name, the concrete steps are:
Step 1. For each input string A_i (i = 1, 2, ..., n), find a_ik = '.' and intercept all characters after a_ik (the extension). Count the number of files of this class in this block and record it as m_ij. Count in turn the number of such files contained in each block on the same node, obtaining the sequence m_i1, m_i2, ..., m_in, and compute m_i = Σ_j m_ij (j = 0, 1, ..., n), which represents the count of this extension class contained in this node.
Step 2. Count the total number M of all small text files stored on this node, obtaining the weights set for the small text files during classification.
Step 3. Compute the proportion m_i / M of each file type and sort the proportions in descending order. The resulting extension list is maintained on the datanode.
Step 4. Count the root nodes among the m_i on this node to form a root-node list. Each extension has a root-node list; this list is maintained on the datanode.
Step 5. From the Reduce task where the block to be placed resides, obtain the extension of this block.
Step 6. Read the root node of the block to be placed. Set up the root-node list and sort the roots according to the maximum-weight-similarity principle.
Step 7. Select the first-ranked root in this block.
Step 8. Find the node in the cluster where this extension's proportion is largest. Search for this root there; if it exists, place the block.
Step 9. Exclude this node from the candidate list and check whether the list is empty. If it is not empty, go to step 8.
Step 10. Exclude this root from the root list and check whether the root list is empty. If it is not empty, go to step 7; if it is empty, store the block at random on a node holding this extension.
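A minimal sketch of the extension statistics and placement decision above follows. Plain dictionaries stand in for the datanode-maintained lists, and a descending sort on proportion stands in for the weight-similarity ordering; both are assumptions made for the sketch.

```python
# Count files per extension on each node (steps 1-3) and place a block on
# the node where its extension's proportion is largest (step 8).
from collections import Counter

def extension_stats(filenames):
    """Steps 1-3: per-extension counts sorted by descending proportion."""
    counts = Counter(name.rsplit(".", 1)[-1] for name in filenames if "." in name)
    total = sum(counts.values())
    return sorted(((ext, c / total) for ext, c in counts.items()),
                  key=lambda item: item[1], reverse=True)

def place_block(block_ext, nodes):
    """nodes maps node-id -> list of file names stored there. Returns the
    node whose share of block_ext is largest, or None (random fallback)."""
    candidates = []
    for node, files in nodes.items():
        for ext, share in extension_stats(files):
            if ext == block_ext:
                candidates.append((share, node))
    return max(candidates)[1] if candidates else None

if __name__ == "__main__":
    nodes = {"n1": ["a.txt", "b.txt", "c.log"],
             "n2": ["d.txt", "e.mp3", "f.mp3"]}
    print(place_block("txt", nodes))      # -> "n1"
```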
For the complex processing and content-mapping relations during file retrieval, several preprocessing load nodes are added to the original MapReduce model. Before the Map tasks execute, these load nodes run the subtasks distributed by the master node and preprocess the constraint relations specified by the user. A user request carrying constraint relations is submitted to the master node; the master node dynamically generates an XML document describing the constraint relations according to the task request. After the task is split, the multi-valued mapping relations in the XML document are read; when a single Map task starts, the input file is analyzed to produce many-to-one key-value pairs, on which the user may perform arbitrary operations. The customized key-value pairs are then collected again; once the constraint relations of the data processing have been handled, the MapReduce scheduling of the Map and Reduce processes starts.
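For illustration, a minimal sketch of reading XML-described constraints and producing many-to-one key-value pairs is given below. The XML schema (a <task> element with <constraint key="..." maps-to="..."> children) is entirely an assumption; the patent only says that the master node generates an XML document describing the user's constraint relations.

```python
# Read a many-to-one mapping from an XML constraint document and apply it
# to incoming records before the normal Map phase.
import xml.etree.ElementTree as ET
from collections import defaultdict

CONSTRAINTS = """
<task>
  <constraint key="web"  maps-to="html"/>
  <constraint key="page" maps-to="html"/>
  <constraint key="mail" maps-to="mbox"/>
</task>
"""

def load_mappings(xml_text):
    """Read the multi-valued (many-to-one) mapping from the XML document."""
    root = ET.fromstring(xml_text)
    return {c.get("key"): c.get("maps-to") for c in root.iter("constraint")}

def map_task(records, mapping):
    """Produce many-to-one key-value pairs from (key, value) records."""
    out = defaultdict(list)
    for key, value in records:
        out[mapping.get(key, key)].append(value)   # many keys -> one key
    return dict(out)

if __name__ == "__main__":
    mapping = load_mappings(CONSTRAINTS)
    records = [("web", "f1"), ("page", "f2"), ("mail", "f3")]
    print(map_task(records, mapping))   # {'html': ['f1', 'f2'], 'mbox': ['f3']}
```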
To further realize cloud-storage load balancing, let Cdata = {1, 2, ..., m} denote the set of all storage data blocks in the cloud storage, where k ∈ Cdata denotes the k-th group of storage data and m is the total number of data groups to be distributed. Let L(u(i), i) denote the storage efficiency obtained when the i-th storage node of the cloud storage platform receives u(i) groups of storage resources; the cloud-storage resource optimization problem is then expressed as maximizing Σ_{i=1..n} L(u(i), i).
(1) During initialization, the data in Cdata are divided into m groups according to a consistent-hashing distribution strategy, the storage nodes are virtualized into n storage nodes, the storage-efficiency value e and load capacity c of each storage node are initialized, and a stage counter i is set.
(2) According to the number of virtualized storage nodes, the resource allocation process is divided into n stages. The state variable x(i+1) denotes the data remaining after allocation to storage nodes 1 through i.
(3) x(i) traverses its interval [u(i)_min, u(i)_max] with a certain step size; the maximum storage efficiency V(x(i), i) obtained by allocating the remaining resource x(i) to the n−i storage nodes after node i is computed, and the related data are recorded in the data set NoteData[i] = {x(i), u(i), V(x(i), i)}.
(4) When i = n, data are distributed according to the load capacity c and storage efficiency e of the n-th storage node, with u(n) ≤ c_n.
Using the state-transfer equation x(i+1) = x(i) − u(i)
and the dynamic-programming equation
V(x(i), i) = max_{u(i) ∈ U(x(i))} { L(u(i), i) + V(x(i+1), i+1) }, i = n−1, ..., 1,
V(x(n), n) = L(x(n), n),
the optimal value of each stage is derived; during allocation, the boundary value of the decision variable u(i) is determined by the load capacity c_i of the storage node at each stage.
(5) The recursion computes the optimal decision sequence NoteData(u(1), u(2), ..., u(n)). If Σ_i u(i) < m, i.e. the data resource has not been fully allocated, the recursion is repeated, taking the suboptimal value at each stage in turn, until the whole resource has been assigned.
Based on the improved MapReduce framework above, the present invention sets up a shared retrieval architecture for highly concurrent retrieval environments, using two levels of sharing. The first level realizes shared sampling through a common sample-management mechanism, reducing redundant I/O overhead; the second level abstracts the computation sharing of online aggregation into a dedicated ACQ optimization problem. The invention merges retrieval jobs at the subtask level: task-level merging is performed according to the correlation of the subtasks of each retrieval job, and the merged shared tasks are sent to each compute node for further processing. The flow of the Hadoop-based shared retrieval system may include: a retrieval collector collects a group of retrieval requests and performs task-level merging by analyzing the Map subtasks of each retrieval job, forming a series of shared Map tasks; the shared Map tasks are assigned to the compute nodes for processing, including collecting sample data from HDFS and computing the relevant statistics; Reduce tasks complete the approximate estimation and precision judgment from the statistics, returning the result if the user's accuracy requirement is met and otherwise repeating the above operations.
Given two retrievals Q_1 and Q_2 with corresponding Map subtask sets M_1 = {M_{1,1}, M_{1,2}, ..., M_{1,m}} and M_2 = {M_{2,1}, M_{2,2}, ..., M_{2,n}}, the sharing rules of the invention are: if two Map subtasks M_{i,1} ∈ M_1 and M_{j,2} ∈ M_2 have the same input data block B_i = B_j, the two subtasks are merged into a shared Map task, merging two independent I/O pipelines so that sampling sharing is completed through a single unified access to block B_i; if two Map subtasks M_{i,1} ∈ M_1 and M_{j,2} ∈ M_2 share not only the same block but also the same retrieval predicates and aggregate statements (including SUM, COUNT and AVG), the shared Map task also merges the statistics computation of the two Map tasks, completing statistics-computation sharing by computing and multiplexing the intermediate statistics; if two Map subtasks have no common input data, i.e. B_i ≠ B_j, they cannot be merged into a shared Map task.
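A minimal sketch of this three-way merging rule follows. Modeling a subtask as a (block, predicate, aggregate) tuple is an assumption for the sketch; the rule itself (statistics sharing, sample sharing, or no sharing) follows the text above.

```python
# Decide the sharing level between two Map subtasks from their input block,
# retrieval predicate, and aggregate type.
from dataclasses import dataclass

@dataclass(frozen=True)
class MapSubtask:
    block: str          # input data block id
    predicate: str      # retrieval predicate
    aggregate: str      # SUM / COUNT / AVG

def merge_level(a: MapSubtask, b: MapSubtask) -> str:
    if a.block != b.block:
        return "no-share"                 # B_i != B_j: cannot merge
    if a.predicate == b.predicate and a.aggregate == b.aggregate:
        return "statistics-share"         # multiplex intermediate statistics
    return "sample-share"                 # one unified read of the block

if __name__ == "__main__":
    t1 = MapSubtask("B7", "price>10", "SUM")
    t2 = MapSubtask("B7", "price>10", "SUM")
    t3 = MapSubtask("B7", "qty<5", "AVG")
    t4 = MapSubtask("B9", "price>10", "SUM")
    print(merge_level(t1, t2))   # statistics-share
    print(merge_level(t1, t3))   # sample-share
    print(merge_level(t1, t4))   # no-share
```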
For the above sharing modes and rules, the invention uses the following sharing strategy. For each data block B_i, a unified I/O pipeline is built for sample collection, and the random samples obtained are stored in a sample buffer in memory, supplying data for subsequent shared sampling. For first-level sharing, sample sets of the appropriate size are read from the buffer according to the per-round accuracy-estimation sample demand of each Map task participating in the merge, and distributed to the Map tasks that satisfy the sampling-sharing condition to complete their computation. If statistics-computation sharing is needed within a shared Map task, the second level obtains the respective sample sets from the first-level sharing results, performs the grouped computation of intermediate statistics according to the sharing groups of the underlying Map tasks, and each sharing group obtains its own statistics by multiplexing the intermediate statistics, thereby completing the computation.
The grouped computation of the statistics can be completed in two stages: a division stage and an adjustment stage. Given an input sample set k = {k_1i, k_2i, ..., k_ni}, the sample set k is sorted in ascending order; the division stage uses a greedy strategy to determine the initial sharing grouping scheme, and the task of the adjustment stage is to locally adjust the Map tasks in adjacent sharing groups.
The division stage uses the variance of a group of sample sizes as the standard for measuring their dissimilarity, and separates differing sample sizes by splitting sharing groups with larger variance. First, the overall sharing cost of the current grouping scheme is computed and recorded as c_min. Next, the sharing group with the largest variance is chosen from the grouping scheme as the candidate for the split operation and is divided into two new sharing groups at the mean of the sample sizes in the group. Then the overall sharing cost of the newly produced grouping scheme is computed and recorded as c_cur. If c_cur < c_min, the new grouping scheme is retained and the division flow above is repeated; otherwise the former grouping scheme is returned.
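A minimal sketch of the division stage follows, assuming the overall sharing cost is the total within-group variance; the patent leaves the cost function abstract, so that choice is an assumption made for the sketch.

```python
# Greedy variance-driven division: repeatedly split the group with the
# largest variance at its mean, keeping the split only if the cost drops.
from statistics import pvariance, mean

def cost(groups):
    """Assumed overall sharing cost: total within-group variance."""
    return sum(pvariance(g) for g in groups if len(g) > 1)

def split_at_mean(group):
    """Divide one group into two at the mean of its sample sizes."""
    m = mean(group)
    low = [s for s in group if s <= m]
    high = [s for s in group if s > m]
    return [g for g in (low, high) if g]

def divide_stage(samples):
    groups = [sorted(samples)]                 # initial scheme: one group
    c_min = cost(groups)
    while True:
        worst = max((g for g in groups if len(g) > 1),
                    key=pvariance, default=None)
        if worst is None:
            return groups                      # nothing left to split
        candidate = [g for g in groups if g is not worst] + split_at_mean(worst)
        c_cur = cost(candidate)
        if c_cur < c_min:                      # keep only if cost decreases
            groups, c_min = candidate, c_cur
        else:
            return groups

if __name__ == "__main__":
    print(divide_stage([1, 2, 2, 3, 40, 42, 45]))
```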
In the adjustment stage, for the grouping scheme, the i-th sharing group sg_r denotes the group that sample sizes move out of, and the (i−1)-th sharing group sg_l denotes the group that sample sizes move into. The sample sizes smaller than the group's mean sample size form the initial candidate migration sample set cand. The elements of cand are further ranked by priority to choose the better sample sizes to migrate. For each element cand[j], count the number eg_r of remaining sample sizes in sg_r that have a common boundary with it, and the number eg_l of sample sizes in sg_l that have a common boundary with it. Two variables CE_r and CE_l sort the eg_r and eg_l values corresponding to cand[j], CE_r in ascending order and CE_l in descending order. For any cand[j], its index positions rInd and lInd in CE_r and CE_l serve as normalized priority parameters, and weight coefficients w_in and w_out are introduced to adjust the influence of eg_r and eg_l on the priority. The migration priority of a sample size, considering the influence of eg_r and eg_l, is computed as:
Rank = w_in · rInd + w_out · lInd
where w_in + w_out = 1. The migration priority of each candidate is computed, the sample size with the highest priority is chosen and migrated between adjacent sharing groups to obtain a new grouping scheme, and the computation and comparison of the sharing cost determines whether the migration is effective, until the sharing cost no longer decreases, at which point the final grouping scheme is returned.
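The following is a minimal sketch of the Rank computation. Interpreting "common boundary" counts as given inputs, and treating the lowest Rank as the highest migration priority, are both assumptions; the patent does not pin down either point.

```python
# Rank = w_in * rInd + w_out * lInd, with rInd the candidate's position in
# CE_r (eg_r ascending) and lInd its position in CE_l (eg_l descending).
def migration_priority(cand, eg_r, eg_l, w_in=0.5, w_out=0.5):
    """cand[j] has eg_r[j] common-boundary neighbors in the move-out group
    and eg_l[j] in the move-in group. Returns the candidate to migrate."""
    ce_r = sorted(range(len(cand)), key=lambda j: eg_r[j])                # CE_r
    ce_l = sorted(range(len(cand)), key=lambda j: eg_l[j], reverse=True)  # CE_l
    ranks = {}
    for j in range(len(cand)):
        r_ind = ce_r.index(j)            # position of cand[j] in CE_r
        l_ind = ce_l.index(j)            # position of cand[j] in CE_l
        ranks[cand[j]] = w_in * r_ind + w_out * l_ind
    return min(ranks, key=ranks.get)     # assumed: lowest Rank migrates first

if __name__ == "__main__":
    cand = [3, 5, 8]        # candidate sample sizes below the group mean
    eg_r = [2, 1, 0]        # common-boundary counts in the move-out group
    eg_l = [0, 2, 1]        # common-boundary counts in the move-in group
    print(migration_priority(cand, eg_r, eg_l))
```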
Given more than one table retrieval, the Map function processes the corresponding Map tasks or sharing groups according to their different sharing demands, reads the input data and performs statistics computation on the sample sets, and uses each round's computed statistics as the input data of the Reduce function. First, the Map function loads global variables to support subsequent statistics computation, and reads the sampling-shared Map task set and the statistics-computation sharing groups from the variables. Next, each arriving key-value pair is first cached in the public sample buffer, then read and used according to the different sharing demands. For sampling sharing, once enough samples have been saved in the buffer, each required sample size is obtained and the retrieval-type pairs in the variables are updated; statistics computation is then performed, and the result is formed into a key-value pair with the current retrieval ID as key and the statistic together with the current Map task ID as the combined value, serving as the input data of the subsequent Reduce function.
In summary, the present invention proposes a data computing method based on a high-performance network architecture that organizes small files from heterogeneous sources into a unified standard form on an improved distributed processing framework, facilitating efficient storage, analysis and retrieval.
Obviously, those skilled in the art should understand that the above modules or steps of the invention can be realized with a general-purpose computing system; they may be concentrated on a single computing system or distributed over a network formed by multiple computing systems; optionally, they may be realized with program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system. Thus, the invention is not restricted to any specific combination of hardware and software.
It should be understood that the above embodiments of the invention are only for exemplary illustration or explanation of the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent substitution, improvement, etc. made without departing from the spirit and scope of the invention shall be included within the protection scope of the invention. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims or the equivalents of such scope and boundaries.
Claims (5)
1. A data computing method based on a high-performance network architecture, for processing small files in a cloud computing environment, characterized by comprising:
classifying small text files using MapReduce; and
merging multiple small files into a large file based on the classification.
2. The method according to claim 1, characterized in that classifying the small text files further comprises:
describing the k-nearest-neighbor (kNN) classification process with MapReduce, while adding to kNN a feature-vector comparison that reconstructs, in order, two feature vectors with identical feature words.
3. The method according to claim 2, characterized in that before the step of describing the kNN classification process with MapReduce, the method further comprises:
improving the MapReduce model based on XML and multi-values; performing data processing through the content, coordinate and operation information of XML tags; and realizing the data-processing operations through XML tags and multi-value processing during the Map phase.
4. The method according to claim 2, characterized in that describing the kNN classification process with MapReduce further comprises: first performing a preliminary classification by document format; classifying the sorted text documents with the improved kNN classification method based on MapReduce and feature-vector reduction; then merging the small text files of the same class to generate a large file; writing the small text files into the large file in chronological order; and then writing the large file's name, replica and position information to the namenode and its content to the datanodes.
5. The method according to claim 2, characterized in that adding the feature-vector comparison to kNN and reconstructing, in order, two feature vectors with identical feature words further comprises:
first finding, in the kNN algorithm, the words shared by the two original feature vectors and their weights; reconstructing, in the order of the shared feature words, two feature vectors whose feature words are all identical; and then computing the similarity between the two feature vectors from the weight vectors corresponding to the feature words.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201710356926.8A | 2017-05-19 | 2017-05-19 | Method for computing data based on high performance network framework
Publications (1)
Publication Number | Publication Date |
---|---|
CN107103095A true CN107103095A (en) | 2017-08-29 |
Family
ID=59669377
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710356926.8A Withdrawn CN107103095A (en) | 2017-05-19 | 2017-05-19 | Method for computing data based on high performance network framework |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104679898A (en) * | 2015-03-18 | 2015-06-03 | 成都汇智远景科技有限公司 | Big data access method |
CN104731921A (en) * | 2015-03-26 | 2015-06-24 | 江苏物联网研究发展中心 | Method for storing and processing small log type files in Hadoop distributed file system |
CN104820717A (en) * | 2015-05-22 | 2015-08-05 | 国网智能电网研究院 | Massive small file storage and management method and system |
Non-Patent Citations (2)
Title |
---|
Ren Chongguang, "Research on Cloud Computing and Its Key Technologies for Massive Data Processing", China Doctoral Dissertations Full-text Database *
Luo Junzhou, "Research on Online Aggregation Optimization Mechanisms for Big Data in Cloud Computing Environments", China Doctoral Dissertations Full-text Database *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019830A (en) * | 2017-09-20 | 2019-07-16 | 腾讯科技(深圳)有限公司 | Corpus processing, term vector acquisition methods and device, storage medium and equipment |
CN110019830B (en) * | 2017-09-20 | 2022-09-23 | 腾讯科技(深圳)有限公司 | Corpus processing method, corpus processing device, word vector obtaining method, word vector obtaining device, storage medium and equipment |
CN109634914A (en) * | 2018-11-21 | 2019-04-16 | 华侨大学 | A kind of scattered point of optimization method retrieved with bifurcated of radio voice small documents whole deposit |
CN109634914B (en) * | 2018-11-21 | 2021-11-30 | 华侨大学 | Optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20170829 |