CN107103095A - Method for computing data based on high performance network framework - Google Patents

Method for computing data based on high performance network framework

Info

Publication number
CN107103095A
CN107103095A (application CN201710356926.8A)
Authority
CN
China
Prior art keywords
mapreduce
nearest neighbor
file
shared
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710356926.8A
Other languages
Chinese (zh)
Inventor
赖真霖
文君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sixiang Lianchuang Technology Co Ltd
Original Assignee
Chengdu Sixiang Lianchuang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sixiang Lianchuang Technology Co Ltd filed Critical Chengdu Sixiang Lianchuang Technology Co Ltd
Priority to CN201710356926.8A priority Critical patent/CN107103095A/en
Publication of CN107103095A publication Critical patent/CN107103095A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data computing method based on a high-performance network architecture. The method includes: classifying small text files using MapReduce; and merging multiple small files into large files based on this classification. The invention organizes a variety of small files from heterogeneous sources into a unified, standardized form on top of an improved distributed processing framework, facilitating efficient storage, analysis, and retrieval.

Description

Method for computing data based on high performance network framework
Technical field
The present invention relates to data computing, and in particular to a data computing method based on a high-performance network architecture.
Background technology
Cloud computing offers distributed computation, very large scale, virtualization, high reliability, high elasticity, scalability, and on-demand service, and can provide more efficient analysis and better computing capability for big data processing. Big data workloads involve processing hundreds of millions of small files, such as web pages and emails, which require distributed storage systems and directory systems for storage support. With the growth of applications that process large numbers of small text files, heterogeneous data sources have proliferated across different information systems; the data lack a unified standardization mechanism; and in some fields, large numbers of small text files are difficult to analyze effectively or to store and retrieve efficiently.
Content of the invention
To solve the above problems of the prior art, the present invention proposes a data computing method based on a high-performance network architecture, comprising:
classifying small text files using MapReduce;
merging multiple small files into large files based on this classification.
Preferably, classifying the small text files further comprises:
describing the k-nearest-neighbor (kNN) classification process with MapReduce while adding a feature-vector comparison to kNN that reconstructs, in order, two feature vectors sharing the same feature words.
Preferably, before the step of describing the kNN classification process with MapReduce, the method further comprises:
improving the MapReduce model based on XML and multi-value processing; carrying out data processing through the XML-tagged content, coordinates, and operation information of the data; and realizing the data-processing operations through multi-value handling during XML tagging and the Map phase.
Preferably, describing the kNN classification process with MapReduce further comprises: first performing a preliminary classification by document format; then classifying the sorted text documents with the improved kNN classification method based on MapReduce and feature-vector reduction; then merging small text files of the same class to generate large files; writing the small text files into the large files in chronological order; and then writing the name, replica, and location information of each large file to the namenode and its content to the datanodes.
Preferably, adding the feature-vector comparison to kNN and reconstructing, in order, two feature vectors sharing the same feature words further comprises:
first finding, within the kNN algorithm, the words common to two original feature vectors together with their weights; reconstructing, in the order of the common feature words, two feature vectors whose feature words are all identical; and then computing the similarity between the two feature vectors from the weight vectors corresponding to the feature words.
Compared with the prior art, the present invention has the following advantage:
the present invention proposes a data computing method based on a high-performance network architecture that organizes a variety of small files from heterogeneous sources into a unified, standardized form on top of an improved distributed processing framework, facilitating efficient storage, analysis, and retrieval.
Brief description of the drawings
Fig. 1 is a flow chart of the data computing method based on a high-performance network architecture according to an embodiment of the present invention.
Embodiments
A detailed description of one or more embodiments of the invention is provided below, together with the accompanying drawings that illustrate the principles of the invention. The invention is described in connection with such embodiments, but is not limited to any particular embodiment; its scope is limited only by the claims, and it covers many alternatives, modifications, and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention; these details are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of them.
One aspect of the present invention provides a data computing method based on a high-performance network architecture. Fig. 1 is a flow chart of this method according to an embodiment of the invention.
The present invention builds an index structure through document classification and merges small files into large files based on weight similarity for processing in a cloud computing environment. When classifying small text files, the kNN classification process is described with MapReduce; at the same time, a feature-vector comparison is added to kNN that reconstructs, in order, two feature vectors sharing the same feature words. To handle the complex processing and content-mapping relations that arise during document retrieval, the MapReduce model is improved based on XML and multi-value processing: the content, coordinates, operations, and other information of the data are tagged with XML to carry out the complex processing. Because the content of data typically carries mapping relations, the data-processing operations are realized through multi-value handling during XML tagging and the Map phase.
First, a preliminary classification is performed by document format. The sorted text documents are then classified with the improved kNN classification method based on MapReduce and feature-vector reduction. Small text files of the same class are then merged to generate large files: the small text files are written into the large files in chronological order, after which the name, replica, and location information of each large file are written to the namenode and its content to the datanodes.
A conventional feature-vector comparison is added to the kNN algorithm: the words common to two original feature vectors and their weights are found first; two feature vectors whose feature words are all identical are then reconstructed in the order of the common feature words; and the similarity between the two feature vectors is finally computed from the weight vectors corresponding to the feature words.
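As an illustration, a minimal Python sketch of this comparison step follows; the function name, the dict representation of feature vectors, and the choice of cosine similarity over the reduced vectors are assumptions rather than details fixed by the patent:

    def reduced_similarity(t, e):
        """Compare two feature vectors given as {feature_word: weight} dicts:
        keep only the words common to both, align them in a fixed order (the
        reduced vectors NT and NET), and compute cosine similarity."""
        common = sorted(set(t) & set(e))       # shared feature words, fixed order
        if not common:
            return 0.0
        nt = [t[w] for w in common]            # reduced weight vector NT
        net = [e[w] for w in common]           # reduced weight vector NET
        dot = sum(a * b for a, b in zip(nt, net))
        norm_t = sum(a * a for a in nt) ** 0.5
        norm_e = sum(b * b for b in net) ** 0.5
        return dot / (norm_t * norm_e)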
The method is described as follows: all texts in the training set are preprocessed to generate feature vectors in key-value-pair form.
Step 1. Normalize the feature vector T of the input text and the feature-vector set ET of the training samples, and compute the feature words common to T and ET.
Step 2. Extract the common feature words and their corresponding weights to form new vectors NT and NET.
Step 3. Apply MapReduce to the similarity computation: calculate the similarity sim(t, x) between the unary vectors formed from the weights of the two feature vectors.
Step 4. MapReduce sorts the computed text-similarity results.
Step 5. Take the k texts with the highest similarity and accumulate the similarities of these k texts per category.
Step 6. Take the maximum accumulated similarity S_i and its corresponding category C_i.
Step 7. If S_i exceeds a predefined similarity threshold, label the text as belonging to class C_i.
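For illustration, steps 3 through 7 might be phrased as plain-Python stand-ins for the Map and Reduce tasks; the names and signatures are assumptions, and reduced_similarity refers to the sketch above:

    from collections import defaultdict
    import heapq

    def map_similarity(t, training_set):
        """Map side: emit (category, similarity) pairs for the input feature
        vector t against every training sample (steps 3-4; sorting is implicit
        in the top-k selection below)."""
        for category, x in training_set:             # x: training feature vector
            yield category, reduced_similarity(t, x)

    def reduce_knn(pairs, k, threshold):
        """Reduce side: keep the k most similar samples (step 5), accumulate
        similarity per category, and apply the threshold (steps 6-7)."""
        top_k = heapq.nlargest(k, pairs, key=lambda p: p[1])
        acc = defaultdict(float)
        for category, sim in top_k:
            acc[category] += sim
        best_class, best_score = max(acc.items(), key=lambda kv: kv[1])
        return best_class if best_score > threshold else None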
For the index structure, a tree that partitions the data set in K-dimensional space (a KD-tree) forms the trunk of the whole index. To insert a point, first judge whether the KD-tree is empty; if so, the point directly becomes the root node. Otherwise, compare the point's value against the root's value in the dimension split at the root and descend into the left or right subtree for the next operation: if the point's value is smaller than the root's value in that dimension, search down the left subtree until the left or right subtree of some node is empty, and insert the point there as a leaf node; if the point's value is larger than the root's value in that dimension, enter the right subtree and insert likewise. A locality-sensitive hashing structure is then attached at the leaf nodes of the KD-tree, that is, the remaining points are placed into locality-sensitive hashes. The data set X is converted into binary strings in the space; parameters r > 0 and c > 1 are selected in advance, k hash functions are randomly chosen, and the data points are stored in the corresponding hash tables using these hash functions.
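A minimal KD-tree insertion sketch along these lines follows; cycling the split dimension with depth and all names are illustrative assumptions, and the attachment of the locality-sensitive hashing structure at the leaves is omitted:

    class KDNode:
        def __init__(self, point, dim):
            self.point = point      # the K-dimensional point stored at this node
            self.dim = dim          # the dimension this node splits on
            self.left = None
            self.right = None

    def kd_insert(root, point, k, depth=0):
        """Insert a point into a KD-tree over k dimensions."""
        if root is None:                       # empty (sub)tree: new leaf node
            return KDNode(point, depth % k)
        d = root.dim
        if point[d] < root.point[d]:           # smaller in the split dimension
            root.left = kd_insert(root.left, point, k, depth + 1)
        else:                                  # larger or equal: right subtree
            root.right = kd_insert(root.right, point, k, depth + 1)
        return root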
Based on the above file-indexing rules, the small text files are merged. Suppose there are multiple files A1, A2, …, An, where Ai = ai1, ai2, …, aik, … and aik is the k-th character of the filename. The concrete steps are:
Step 1. For each input string Ai (i = 1, 2, …, n), find the separator character aik = '.' and take all characters after it (the extension name). Count the number of files of this type in the block, denoted mij. Compute in turn the number of files of each such type contained in each block on the same node, obtaining the sequence mi1, mi2, …, min, and form mi = Σj mij (j = 0, 1, …, n), which represents the classes of extension names contained on this node.
Step 2. Count the total number M of all small text files stored on this node, obtaining the weight assigned to each class of small text files.
Step 3. Compute the proportion mi/M occupied by each file type and sort the types in descending order of proportion. The resulting extension-name list is maintained on the datanode.
Step 4. Count the roots within the mi on this node to form a root list; each extension name has one root list, also maintained on the datanode.
Step 5. From the Reduce task in which the block to be placed resides, obtain the extension name of the block.
Step 6. Read the root of the block to be placed and set up the root list, sorting the roots by the maximum-weight-similarity principle.
Step 7. Select the first-ranked root of this block.
Step 8. Find the node in the cluster where this extension name's proportion is largest and search for this root there; if it exists, place the block.
Step 9. Exclude this node from the candidate list and check whether the list is empty; if it is not empty, go to step 8.
Step 10. Exclude this root from the root list and check whether the root list is empty; if it is not empty, go to step 7; if it is empty, store the block at random on a node holding this extension name.
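A rough sketch of this placement loop (steps 7 through 10) under the stated rules; the node data structure and all names are assumptions introduced for illustration:

    import random

    def place_block(block_ext, block_root, nodes):
        """Place a block on the candidate node with the largest proportion of
        the block's extension that also holds the block's filename root
        (steps 8-9); otherwise fall back to a random node holding the
        extension (step 10).

        nodes: list of dicts like {"id": ..., "ext_ratio": {ext: ratio},
                                   "roots": {ext: set_of_roots}}."""
        # Candidates holding this extension, best extension proportion first.
        candidates = sorted(
            (n for n in nodes if block_ext in n["ext_ratio"]),
            key=lambda n: n["ext_ratio"][block_ext], reverse=True)
        for node in candidates:                       # steps 8-9
            if block_root in node["roots"].get(block_ext, set()):
                return node["id"]
        if candidates:                                # step 10 fallback
            return random.choice(candidates)["id"]
        return None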
To handle the complex processing and content-mapping relations that arise during file retrieval, several preprocessing load nodes are added to the original MapReduce model. Before the Map tasks run, these load nodes execute subtasks of the job dispatched by the master node and preprocess the user-defined constraint relations. A user request carrying constraint relations is submitted to the master node, which dynamically generates an XML document describing those constraints from the task request. After task splitting, the multivalued mapping relations in the XML document are read; when a single Map task starts, the input file is analyzed to produce many-to-one key-value pairs, on which the user may perform arbitrary operations. The customized key-value pairs are then collected again; once the constraint relations of the data processing have been handled, the MapReduce scheduler starts the Map and Reduce phases.
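The patent does not specify a schema for the generated constraint XML. As a hedged illustration, the sketch below assumes a simple format of its own invention and shows how a preprocessing load node might read the multivalued mapping relations before the Map tasks start:

    import xml.etree.ElementTree as ET

    def read_constraints(xml_text):
        """Parse an assumed constraint document such as

            <task>
              <mapping key="logs/*.txt" op="merge" coord="node-3"/>
              <mapping key="mail/*.eml" op="classify" coord="node-7"/>
            </task>

        into a dict of many-to-one mapping relations keyed by input pattern."""
        root = ET.fromstring(xml_text)
        constraints = {}
        for m in root.iter("mapping"):
            constraints[m.get("key")] = {
                "op": m.get("op"),        # operation information
                "coord": m.get("coord"),  # coordinate information
            }
        return constraints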
To further achieve load balancing in cloud storage, let Cdata = {1, 2, …, m} denote the set of all storage data blocks in the cloud store, where k ∈ Cdata denotes the k-th group of stored data and m is the total number of data-storage groups to be distributed. Let L(u(i), i) denote the storage efficiency obtained by the i-th storage node in the cloud storage platform for storing a group of storage resources; the cloud-storage resource-allocation problem is then expressed as maximizing Σ_{i=1..n} L(u(i), i).
(1) During initialization, the data in Cdata are divided into m groups according to a consistent-hashing distribution strategy, the storage nodes are virtualized into n storage nodes, the storage-efficiency value e and load capacity c of each storage node are initialized, and a stage counter i is set.
(2) According to the number of virtualized storage nodes, this resource-allocation process is divided into n stages. The state variable x(i+1) is defined to denote the data remaining after allocation to storage nodes 1 through i.
(3) x(i) traverses the interval [u(i)_min, u(i)_max] with a fixed step, computing the maximum storage efficiency V(x(i), i) of allocating the remaining resources x(i) to the n-i storage nodes after the i-th node, while the related data are recorded in the data set NoteData[i] = {x(i), u(i), V(x(i), i)}.
(4) When i = n, data are distributed according to the load capacity c and storage efficiency e of the n-th storage node, with u(n) ≤ c_n.
Using the state-transfer equation x(i+1) = x(i) - u(i)
and the dynamic-programming equation V(x(i), i) = max_{u(i) ∈ U(x(i))} { L(u(i), i) + V(x(i+1), i+1) }, i = n-1, n-2, …, 1,
with the boundary condition V(x(n), n) = L(x(n), n),
the optimal value of each stage is derived; in the allocation process, the load capacity c_i of the storage node at each stage determines the boundary value of the decision variable u(i).
(5) Recursive computation yields the optimal decision sequence NoteData(u(1), u(2), …, u(n)). If the resulting allocation leaves the data resources not fully assigned, the recursion is repeated, taking the second-best value of each stage in turn, until all data resources are assigned.
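As an illustration of this staged recursion, here is a small dynamic-programming sketch in Python under the definitions above; the efficiency function L, the integer data units, and the capacity values are placeholder assumptions:

    from functools import lru_cache

    def allocate(total, capacities, L):
        """Maximize the sum of L(u, i) over stages i = 0..n-1, subject to
        0 <= u(i) <= capacities[i], in integer units of data.

        Returns (best_value, [u(0), ..., u(n-1)])."""
        n = len(capacities)

        @lru_cache(maxsize=None)
        def V(x, i):
            if i == n - 1:                      # last stage: store what fits
                u = min(x, capacities[i])
                return L(u, i), (u,)
            best = (float("-inf"), ())
            for u in range(min(x, capacities[i]) + 1):
                val, rest = V(x - u, i + 1)     # state transfer x(i+1) = x(i) - u(i)
                if L(u, i) + val > best[0]:
                    best = (L(u, i) + val, (u,) + rest)
            return best

        value, plan = V(total, 0)
        return value, list(plan)

    # Example: diminishing-returns efficiency over 3 virtual storage nodes.
    value, plan = allocate(10, [4, 5, 6], lambda u, i: u - 0.05 * u * u * (i + 1))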
Based on the improved MapReduce framework above, the present invention designs a shared retrieval architecture for highly concurrent retrieval environments, using two levels of sharing: the first level realizes shared sampling through a common sample-management mechanism, reducing redundant I/O overhead; the second level abstracts the computation sharing of online aggregation into a dedicated ACQ optimization problem. The invention merges multiple retrieval jobs at the subtask level, that is, task-level merging according to the correlation of the subtasks of each retrieval operation, and the merged shared tasks are dispatched to the compute nodes for further processing. The workflow of the Hadoop-based shared retrieval system may include: a retrieval collector gathers a group of retrieval requests and performs task-level merge operations by analyzing the Map subtasks of each retrieval job, forming a series of shared Map tasks; the shared Map tasks are assigned to compute nodes for processing, including collecting sample data from HDFS and computing the relevant statistics; Reduce tasks then complete the approximate estimation and precision check based on the statistics, returning results if the user's accuracy requirement is met and repeating the above operations otherwise.
Given two queries Q1 and Q2 with corresponding Map subtask sets M1 = {M1,1, M1,2, …, M1,m} and M2 = {M2,1, M2,2, …, M2,n}, the sharing rules of the invention are: if two Map subtasks Mi,1 ∈ M1 and Mj,2 ∈ M2 have the same input data block, Bi = Bj, they are merged into a shared Map task, which merges two independent I/O pipelines so that sampling is shared through a single pass over block Bi; if, in addition to the same block, the two subtasks also have identical retrieval predicates and identical aggregate types (SUM, COUNT, AVG), the merged Map task additionally shares the statistics computation of the two tasks, completing it by computing and multiplexing the intermediate statistics; if the two subtasks have no common input data, Bi ≠ Bj, they cannot be merged into a shared Map task.
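These three cases can be summarized in a small decision function; this is a sketch only, and the dictionary representation of a Map subtask is an assumption:

    from enum import Enum

    class Share(Enum):
        NONE = 0        # different blocks: cannot merge
        SAMPLING = 1    # same block: merge I/O pipelines, share sampling
        STATISTICS = 2  # same block, predicates, and aggregates: share statistics

    def sharing_mode(t1, t2):
        """t1, t2: dicts like {"block": ..., "predicate": ..., "agg": "SUM"}."""
        if t1["block"] != t2["block"]:
            return Share.NONE
        if t1["predicate"] == t2["predicate"] and t1["agg"] == t2["agg"]:
            return Share.STATISTICS
        return Share.SAMPLING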
For the different sharing modes and rules above, the invention adopts the following sharing policy. For each data block Bi, a unified I/O pipeline is built for sample collection, and the random samples obtained are stored in an in-memory sample buffer that supplies data for subsequent shared sampling. For first-level sharing, according to the sample demand of each merged Map task in each round of accuracy estimation, sample sets of the appropriate size are read from the buffer and distributed to the Map tasks that satisfy the sampling-sharing condition, completing the computation. If statistics sharing is required within a shared Map task, second-level sharing obtains the respective sample sets from the first-level sharing results and performs grouped computation of the intermediate statistics according to the sharing groups of the underlying Map tasks; each sharing group then obtains its own statistics by multiplexing the intermediate statistics, completing the computation.
The grouped computation of statistics can be completed in two phases: a division phase and an adjustment phase. Given a sample set k = {k1i, k2i, …, kni}, the division phase sorts k in ascending order and determines an initial sharing-group scheme with a greedy strategy; the task of the adjustment phase is to make local adjustments to the Map tasks in adjacent sharing groups.
The division phase uses the variance of a group of sample sizes as the standard for measuring their dispersion and separates divergent sample sizes by splitting the sharing groups with large variance. First, the overall sharing cost of the current sharing-group scheme is computed and recorded as c_min; next, the sharing group with the largest variance is chosen from the scheme as the candidate for the split and is divided into two new sharing groups around the mean of its sample sizes; then the overall sharing cost of the newly produced scheme is computed and recorded as c_cur. If c_cur ≤ c_min, the new scheme is retained and the division flow above is repeated; otherwise the former sharing-group scheme is returned.
In the adjustment phase, for the i-th sharing group of the scheme, let sg_r denote the group that sample sizes move out of and the (i-1)-th sharing group sg_l the group that sample sizes move into. The sample sizes smaller than the group's mean sample size form the initial candidate migration set cand. The elements of cand are then judged by priority so that the better sample sizes are migrated. For each element cand[j], count the number eg_r of remaining sample sizes in sg_r that share a common boundary with it, and the number eg_l of sample sizes in sg_l that share a common boundary with it. Two variables CE_r and CE_l sort the eg_r and eg_l values corresponding to the cand[j]: CE_r in ascending order and CE_l in descending order. For any cand[j], its index positions rInd and lInd in CE_r and CE_l serve as normalized priority parameters, and weight coefficients w_in and w_out are introduced to adjust the influence of eg_r and eg_l on the priority. Taking both eg_r and eg_l into account, the sample-size migration priority is computed as
Rank = w_in · rInd + w_out · lInd,
where w_in + w_out = 1. The migration priority of each candidate is computed, the sample size with the highest priority is migrated between the adjacent sharing groups to obtain a new sharing-group scheme, and the sharing-cost computation and comparison determine whether the migration is effective; this continues until the sharing cost no longer decreases, and the final sharing-group scheme is returned.
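A sketch of this priority computation as described, where the candidate list and the boundary counts eg_r and eg_l are assumed to be given:

    def migration_ranks(cand, eg_r, eg_l, w_in=0.5, w_out=0.5):
        """Rank candidates by Rank = w_in*rInd + w_out*lInd, where rInd is the
        candidate's index in eg_r sorted ascending (CE_r) and lInd its index
        in eg_l sorted descending (CE_l); w_in + w_out must equal 1."""
        assert abs(w_in + w_out - 1.0) < 1e-9
        ce_r = sorted(range(len(cand)), key=lambda j: eg_r[j])                # ascending
        ce_l = sorted(range(len(cand)), key=lambda j: eg_l[j], reverse=True)  # descending
        r_ind = {j: pos for pos, j in enumerate(ce_r)}
        l_ind = {j: pos for pos, j in enumerate(ce_l)}
        ranks = {j: w_in * r_ind[j] + w_out * l_ind[j] for j in range(len(cand))}
        best = max(ranks, key=ranks.get)    # highest-priority candidate to migrate
        return ranks, best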
For retrieval over more than one table, the Map function processes the corresponding Map tasks or sharing groups according to their different sharing demands, reading the input data and computing statistics over the sample sets, and feeding each round's results to the Reduce function as input. First, the Map function loads global variables to support the subsequent statistics computation and reads the shared-sampling Map task set and the statistics-sharing groups from these variables. Then each arriving key-value pair is first cached in the public sample buffer and read out according to the different sharing demands. For sampling sharing, once enough samples have accumulated in the buffer, each required sample size is obtained and the query-type pairs in the variables are updated; statistics are then computed, and each result is emitted as a key-value pair whose key combines the statistic, the current Map task ID, and the current query ID, forming the input data of the subsequent Reduce function.
In summary, the present invention proposes a data computing method based on a high-performance network architecture that organizes a variety of small files from heterogeneous sources into a unified, standardized form on top of an improved distributed processing framework, facilitating efficient storage, analysis, and retrieval.
Obviously, those skilled in the art should appreciate that the modules and steps of the present invention described above may be implemented with a general-purpose computing system: they may be concentrated on a single computing system or distributed over a network formed by multiple computing systems; alternatively, they may be implemented as program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system. The present invention is therefore not restricted to any specific combination of hardware and software.
It should be understood that the above embodiments of the present invention are intended only to exemplify or explain the principles of the invention, not to limit it. Any modification, equivalent substitution, improvement, and the like made without departing from the spirit and scope of the invention shall therefore be included within the scope of protection. Moreover, the appended claims are intended to cover all changes and modifications falling within the scope and boundaries of the claims, or the equivalents of such scope and boundaries.

Claims (5)

1. A data computing method based on a high-performance network architecture, for processing small files in a cloud computing environment, characterized by comprising:
classifying small text files using MapReduce;
merging multiple small files into large files based on this classification.
2. The method according to claim 1, characterized in that classifying the small text files further comprises:
describing the k-nearest-neighbor (kNN) classification process with MapReduce while adding a feature-vector comparison to kNN that reconstructs, in order, two feature vectors sharing the same feature words.
3. The method according to claim 2, characterized in that before the step of describing the kNN classification process with MapReduce, the method further comprises:
improving the MapReduce model based on XML and multi-value processing; carrying out data processing through the XML-tagged content, coordinates, and operation information of the data; and realizing the data-processing operations through multi-value handling during XML tagging and the Map phase.
4. The method according to claim 2, characterized in that describing the kNN classification process with MapReduce further comprises: first performing a preliminary classification by document format; then classifying the sorted text documents with the improved kNN classification method based on MapReduce and feature-vector reduction; then merging small text files of the same class to generate large files; writing the small text files into the large files in chronological order; and then writing the name, replica, and location information of each large file to the namenode and its content to the datanodes.
5. The method according to claim 2, characterized in that adding the feature-vector comparison to kNN and reconstructing, in order, two feature vectors sharing the same feature words further comprises:
first finding, within the kNN algorithm, the words common to two original feature vectors together with their weights; reconstructing, in the order of the common feature words, two feature vectors whose feature words are all identical; and then computing the similarity between the two feature vectors from the weight vectors corresponding to the feature words.
CN201710356926.8A 2017-05-19 2017-05-19 Method for computing data based on high performance network framework Withdrawn CN107103095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710356926.8A CN107103095A (en) 2017-05-19 2017-05-19 Method for computing data based on high performance network framework

Publications (1)

Publication Number Publication Date
CN107103095A true CN107103095A (en) 2017-08-29

Family

ID=59669377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710356926.8A Withdrawn CN107103095A (en) 2017-05-19 2017-05-19 Method for computing data based on high performance network framework

Country Status (1)

Country Link
CN (1) CN107103095A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104679898A (en) * 2015-03-18 2015-06-03 成都汇智远景科技有限公司 Big data access method
CN104731921A (en) * 2015-03-26 2015-06-24 江苏物联网研究发展中心 Method for storing and processing small log type files in Hadoop distributed file system
CN104820717A (en) * 2015-05-22 2015-08-05 国网智能电网研究院 Massive small file storage and management method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
任崇广 (Ren Chongguang): "Research on Cloud Computing and Its Key Technologies for Massive Data Processing", China Doctoral Dissertations Full-text Database *
罗军舟 (Luo Junzhou): "Research on Online Aggregation Optimization Mechanism for Big Data in Cloud Computing Environments", China Doctoral Dissertations Full-text Database *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019830A (en) * 2017-09-20 2019-07-16 腾讯科技(深圳)有限公司 Corpus processing, term vector acquisition methods and device, storage medium and equipment
CN110019830B (en) * 2017-09-20 2022-09-23 腾讯科技(深圳)有限公司 Corpus processing method, corpus processing device, word vector obtaining method, word vector obtaining device, storage medium and equipment
CN109634914A (en) * 2018-11-21 2019-04-16 华侨大学 A kind of scattered point of optimization method retrieved with bifurcated of radio voice small documents whole deposit
CN109634914B (en) * 2018-11-21 2021-11-30 华侨大学 Optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files

Similar Documents

Publication Publication Date Title
CN107292186B (en) Model training method and device based on random forest
CN113064879B (en) Database parameter adjusting method and device and computer readable storage medium
US20180004751A1 (en) Methods and apparatus for subgraph matching in big data analysis
CN107066328A (en) The construction method of large-scale data processing platform
CN107193940A (en) Big data method for optimization analysis
CN107908536B (en) Performance evaluation method and system for GPU application in CPU-GPU heterogeneous environment
CN107832456B (en) Parallel KNN text classification method based on critical value data division
Zhang et al. Virtual machine placement strategy using cluster-based genetic algorithm
CN111143685B (en) Commodity recommendation method and device
CN111259933B (en) High-dimensional characteristic data classification method and system based on distributed parallel decision tree
US20190236474A1 (en) Load balancing for distributed processing of deterministically assigned data using statistical analysis of block data
CN106934410A (en) The sorting technique and system of data
Li et al. Research on QoS service composition based on coevolutionary genetic algorithm
CN107291539A (en) Cluster program scheduler method based on resource significance level
US7890705B2 (en) Shared-memory multiprocessor system and information processing method
CN107103095A (en) Method for computing data based on high performance network framework
US8667008B2 (en) Search request control apparatus and search request control method
CN108389152B (en) Graph processing method and device for graph structure perception
CN116680090B (en) Edge computing network management method and platform based on big data
US7647592B2 (en) Methods and systems for assigning objects to processing units
CN115510331B (en) Shared resource matching method based on idle amount aggregation
CN108932258A (en) Data directory processing method and processing device
CN116243869A (en) Data processing method and device and electronic equipment
CN108280176A (en) Data mining optimization method based on MapReduce
CN108256083A (en) Content recommendation method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20170829