CN108334888A - Compression coding for bit sequences - Google Patents

Compression coding for bit sequences

Info

Publication number
CN108334888A
Authority
CN
China
Prior art keywords
bit sequence
value
bit
node
byte
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710044988.5A
Other languages
Chinese (zh)
Other versions
CN108334888B (en
Inventor
周虎成
叶挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Priority to CN201710044988.5A (CN108334888B)
Priority to PCT/US2018/013750 (WO2018136371A1)
Publication of CN108334888A
Application granted
Publication of CN108334888B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/46 Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the subject matter described herein propose a method and a device for compression coding of a bit sequence. After a bit sequence is obtained, a start position and an end position at which a specific bit value occurs in the bit sequence are determined, and the bit sequence is then encoded based on the start position and the end position. According to embodiments of the subject matter described herein, determining the start position and the end position of the specific bit value allows the bit sequence to be compressed effectively, thereby reducing its length. Embodiments of the subject matter described herein can therefore not only reduce computational complexity but also improve data processing performance through parallel processing.

Description

Compression coding for bit sequences
Background
A bit is the smallest unit of data storage in a computer; it represents a binary digit whose value is 0 or 1. A byte is the basic unit of computer data storage, and information in a computer is usually stored and interpreted in units of bytes. One byte (B) equals 8 bits (b) and can typically store one English character or half of a Chinese character. Because the amount of information stored, transmitted, and processed in computing is very large, kilobytes (KB), megabytes (MB), gigabytes (GB), terabytes (TB), and so on are also commonly used as units of data storage.
A decision tree is a tree structure for handling problems such as classification, regression, and ranking; it is a binary search tree composed of branch nodes, left subtrees, and right subtrees. Branches lead from a branch node to other branch nodes or to leaf nodes, and the choice of one of the branches is based on a decision made at the branch node. An example decision compares two values, such as a feature value and an attribute value (also called a threshold). If the feature value is less than or equal to the threshold, the left subtree is selected; if the feature value is greater than the threshold, the right subtree is selected, and so on until a branch leading to a leaf node is selected. A leaf node represents an output or endpoint of the decision tree; an example output is an output value or score for the decision tree.
A boosting tree ensemble model is a boosting method model that uses decision trees as its basic classifiers; it includes multiple decision trees and makes a joint decision by accumulating the results of those trees. Boosting tree ensembles can be used to rank documents in document search, and can also be used in gesture recognition, speech recognition, data mining, and other technical fields.
Summary
The inventors have noted that decision tree models (such as boosting tree ensemble models) are widely used in web-based applications. As web content keeps growing, the depth and scale of each tree in a decision tree model grow accordingly, so that running the model requires a large amount of computing and storage resources. Unlike conventional methods, which represent the feature vector of a node in a tree using only a bit sequence, the subject matter described herein compression-codes the bit sequence that represents the feature vector of a node. By means of compression coding based on the start position and end position of a specific bit value, the processing speed and storage space of a boosting tree ensemble model can be optimized, which differs markedly from any known scheme in both working principle and mechanism.
Embodiments of the subject matter described herein propose a method and a device for compression coding of a bit sequence. After a bit sequence is obtained, a start position and an end position at which a specific bit value occurs in the bit sequence are determined, and the bit sequence is then encoded based on the start position and the end position. According to embodiments of the subject matter described herein, determining the start position and the end position of the specific bit value allows the bit sequence to be compressed effectively, thereby reducing its length. Embodiments of the subject matter described herein can therefore not only reduce computational complexity but also improve data processing performance through parallel processing.
This Summary is provided to introduce, in a simplified form, a selection of concepts that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the subject matter described herein, nor is it intended to limit the scope of the subject matter described herein.
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
Fig. 1 shows a block diagram of a computing system/server in which one or more embodiments of the subject matter described herein can be implemented;
Fig. 2 shows a flowchart of a method for encoding a bit sequence according to an embodiment of the subject matter described herein;
Fig. 3 shows an example diagram of a decision tree according to an embodiment of the subject matter described herein;
Fig. 4 shows a flowchart of a method for encoding a bit sequence according to another embodiment of the subject matter described herein;
Fig. 5 shows an example diagram of a data structure arrangement according to an embodiment of the subject matter described herein;
Fig. 6 shows a flowchart of a method for determining, in parallel, multiple outputs for multiple inputs according to an embodiment of the subject matter described herein;
Fig. 7 shows a schematic diagram of determining, in parallel, multiple outputs for multiple inputs according to an embodiment of the subject matter described herein; and
Fig. 8 shows a schematic block diagram of a search system for search ranking according to an embodiment of the subject matter described herein.
Throughout the drawings, the same or similar reference numerals denote the same or similar elements.
Detailed Description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
As used herein, the term "comprising" and its variants are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one other embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms are given in the description below.
Conventionally, for a received input, each tree in a decision tree model must be traversed to generate an output. At each branch node, a feature value in the input is compared with the threshold of the branch node to determine the next node. However, because the operation at a child node is executed only after the operation at its parent node, the child node has a control dependency on its parent node. In addition, because the outcome of the decision at each branch node is unpredictable, the cache cannot be used effectively.
One improvement to decision tree models is a bit-sequence-based node representation, in which a bit sequence represents the feature vector of each node. In this way, the control dependencies between nodes can be ignored, and the output is determined based only on the bit sequences of nodes of a particular type (for example, false nodes). In this approach, however, the length of each node's bit sequence must equal the total number of leaf nodes in the tree. As a result, if the tree is very large, each bit sequence is also very long, so that a large amount of computing resources is needed to process these bit sequences and a large amount of storage is needed to store them. Moreover, if the length of a bit sequence exceeds the maximum length that a computer's processor can handle at one time, parallel processing cannot be achieved. Therefore, when the decision tree model is very large, both the conventional tree traversal method and the conventional bit-sequence-based node representation suffer from problems such as long running times and low processing efficiency.
To this end, the subject matter described herein proposes a method and a device for encoding a bit sequence. After a bit sequence is obtained, a start position and an end position at which a specific bit value occurs in the bit sequence are determined, and the bit sequence is then encoded based on the start position and the end position. According to embodiments of the subject matter described herein, determining the start position and the end position of the specific bit value allows the bit sequence to be compressed effectively, thereby reducing its length. Embodiments of the subject matter described herein can therefore not only reduce computational complexity but also improve data processing performance through parallel processing.
In addition, according to embodiments of the subject matter described herein, the bit sequence is encoded in units of bytes so as to suit the internal processing of a computer. When the start byte position and the end byte position are the same, the bit sequence is encoded using only the start byte and the start byte position, which compresses the bit sequence further. Furthermore, embodiments of the subject matter described herein process multiple inputs simultaneously for a single node and use the cache effectively through vector transposition, so that multiple inputs are processed in parallel and multiple outputs are generated in parallel. Embodiments of the subject matter described herein can therefore achieve overall parallel processing in a decision tree model.
The basic principles and several example implementations of the subject matter described herein are explained below with reference to Figs. 1 to 8. Fig. 1 shows a block diagram of a computing system/server 100 in which one or more embodiments of the subject matter described herein can be implemented. It should be understood that the computing system/server 100 shown in Fig. 1 is merely exemplary and should not constitute any limitation on the function or scope of the embodiments described herein.
As shown in Fig. 1, computing system/server 100 takes the form of a general-purpose computing device. The components of computing system/server 100 may include, but are not limited to, one or more processors or processing units 110, memory 120, storage device 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160. Processing unit 110 may be a real or virtual processor and can execute various processes according to programs stored in memory 120. In a multi-processor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of computing system/server 100.
Computing system/server 100 typically includes multiple computer storage media. Such media can be any available media accessible to computing system/server 100, including but not limited to volatile and non-volatile media and removable and non-removable media. Memory 120 may be volatile memory (for example, registers, cache, random access memory (RAM)), non-volatile memory (for example, read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. Storage device 130 may be a removable or non-removable medium and may include a machine-readable medium, such as a flash drive, a magnetic disk, or any other medium that can be used to store information and/or data (for example, model 170) and that can be accessed within computing system/server 100.
Computing system/server 100 may further include other removable/non-removable, volatile/non-volatile storage media. Although not shown in Fig. 1, a magnetic disk drive for reading from or writing to a removable, non-volatile magnetic disk (for example, a "floppy disk") and an optical disc drive for reading from or writing to a removable, non-volatile optical disc may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces. Memory 120 may include one or more program products 125 having one or more sets of program modules, which are configured to perform the methods or functions of the various embodiments described herein.
Communication unit 140 enables communication with other computing devices over a communication medium. Additionally, the functions of the components of computing system/server 100 may be implemented by a single computing cluster or by multiple computing machines that can communicate over a communication connection. Computing system/server 100 can therefore operate in a networked environment using logical connections to one or more other servers, network personal computers (PCs), or another network node.
Input device 150 may be one or more of various input devices, such as a mouse, a keyboard, or a trackball. Output device 160 may be one or more output devices, such as a display, a loudspeaker, or a printer. As needed, computing system/server 100 may also communicate through communication unit 140 with one or more external devices (not shown), such as storage devices or display devices; with one or more devices that enable a user to interact with computing system/server 100; or with any device (for example, a network card or a modem) that enables computing system/server 100 to communicate with one or more other computing devices. Such communication may be performed via an input/output (I/O) interface (not shown).
As shown in Fig. 1, storage device 130 stores model 170, which includes decision tree model 175 (for example, a boosting tree ensemble model). Computing system/server 100 may receive input 180 through input device 150, for example multiple inputs {i0, i1, i2, …}, each of which includes multiple features {f0, f1, f2, …}. As shown in Fig. 1, the value of feature f0 of i0 may be 3.8, the value of feature f1 of i0 may be 26.4, and the value of feature f2 of i0 may be 2.0. Next, computing system/server 100 processes input 180 using decision tree model 175 and generates multiple outputs 190 through output device 160, for example output values {o0, o1, o2, …}, where each output corresponds to one input; for example, o0 is the output value of i0, such as 2.0. Example embodiments of generating output 190 based on decision tree model 175 are described in detail below with reference to Figs. 2-8.
Fig. 2 shows a flowchart of a method 200 for encoding a bit sequence according to an embodiment of the subject matter described herein. It should be understood that method 200 may be performed by the processing unit 110 described with reference to Fig. 1.
At 202, a bit sequence is obtained. For example, in a decision tree model that uses the bit-sequence-based node representation, each node has a bit sequence representing its feature vector, and one or more bit sequences of one or more nodes can be obtained from the decision tree model. A bit sequence consists of multiple bits, which include a first bit value (for example, 0) and a second bit value (for example, 1). For example, an exemplary bit sequence b_e may be 11111111110000000111111111111111, which includes 7 first bit values (i.e., 0) and 25 second bit values (i.e., 1).
At 204, the start position and the end position at which the first bit value occurs in the bit sequence are determined. In some embodiments, the start position and end position of the first bit value itself can be determined directly; for example, in the exemplary bit sequence b_e, the start position (for example, the index) at which the first bit value itself occurs is 10, and the end position is 16. In other embodiments, the start position and the end position may instead denote the position of a segment of bits in which the first bit value lies; for example, the start position may be the position of the start byte that contains the first occurrence of the first bit value, and the end position may be the position of the end byte that contains the last occurrence of the first bit value. In the exemplary bit sequence b_e, the start byte position at which the first bit value occurs is therefore 1, and the end byte position is 2. It should be understood that although the position or index of the first bit or byte is defined as 0 in the above examples, those skilled in the art will appreciate that it may also be defined as 1; the scope of protection of the subject matter described herein is not limited by the index definition rule. Example embodiments of determining the start position and the end position are described in detail below with reference to Fig. 4.
206, it is based at least one of initial position and end position, to bit sequential coding.In some embodiments In, in bit sequence in the case of the first all bit value continued presences, bit sequence beCan be encoded as [10, 16], indicate to be the first bit value between the 10-16 of position, be the second bit value in other positions.In other implementations In example, bit sequence beIt can be encoded as retaining all bits between initial position and end position, for example, bit sequence beIt can be encoded as [10,16,0000000].Alternatively, starting byte position and end position table are indicated in initial position In the case of showing end byte position, bit sequence beCan be encoded as in starting byte position and end byte position It is at least one associated, for example, bit sequence beIt can be based on start byte and starting byte position is encoded.
Those skilled in the art will appreciate that although the bit sequence in method 200 represents a feature vector in a decision tree, a bit sequence may represent other data with any meaning. By determining the start position and the end position of a particular value in the bit sequence, the bit sequence can be compressed effectively and its length reduced. Embodiments of the subject matter described herein can therefore effectively reduce computational complexity and storage space.
Fig. 3 shows an example diagram of a decision tree 300 according to an embodiment of the subject matter described herein. For example, decision tree model 175 described above with reference to Fig. 1 includes h decision trees {T0, T1, …, Th}, and each decision tree is denoted Th = (Nh, Lh), where Nh = {n0, n1, …} denotes the |Lh| − 1 branch nodes in decision tree Th and Lh = {l0, l1, …} denotes the |Lh| leaf nodes in decision tree Th. Decision tree 300 may be one example decision tree in decision tree model 175. As shown in Fig. 3, decision tree 300 includes branch nodes 310, 311, 312, 313, 314, and 315, of which branch node 310 is the root node, and also includes leaf nodes 320, 321, 322, 323, 324, 325, and 326. It should be understood that, for clarity of illustration, decision tree 300 is shown with only 6 branch nodes and 7 leaf nodes; decision tree 300 may, however, include any number of branch nodes and leaf nodes, for example 255 branch nodes and 256 leaf nodes.
In general, each branch node in a decision tree has a feature identifier (for example, {f0, f1, f2, …}) and a corresponding attribute value, also called a threshold (for example, {θ0, θ1, θ2, …}), which is used to determine whether the node is true or false. In addition, each branch node has a bit sequence that represents its feature vector and can be used to determine the position of the leaf node for an input. Each leaf node has an output value (for example, {e0, e1, e2, …}). For example, branch node 310 has feature identifier f0 and corresponding threshold θ0; for a new input i0, if the value of its feature f0 is less than threshold θ0, branch node 310 is a true node; otherwise, branch node 310 is a false node. In the conventional tree traversal method, if a branch node is determined to be a true node, the left subtree is selected; otherwise, the right subtree is selected, and so on, traversing from the root node all the way to the corresponding leaf node.
In the bit-sequence-based node representation, the control dependency between parent and child nodes does not need to be considered, and the true or false status of each branch node in multiple trees can be determined sequentially, randomly, or in parallel. For example, in the example decision tree 300 of Fig. 3, the comparison between feature values and thresholds determines branch nodes 310, 311, and 315 to be false nodes, while the other branch nodes are true nodes. In the bit-sequence-based node representation, a result for an input is generated by performing a logical AND operation on the bit sequences of all false nodes (for example, one input produces one result for one tree), and the position of the leaf node can then be determined from that result in order to generate an output value or score.
It should be understood that the bit sequence of each node in a decision tree can be generated by machine learning on training data, and that any machine learning method, whether already known or developed in the future, can be used to generate the bit sequence of each node. For example, all bits in the bit sequence holding the decision result of each tree can initially be set to 1, and the bit values in that sequence can then be updated using the input sample, so that the position of the leaf node can be determined from the position of the first "1" in the bit sequence.
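The AND-then-first-"1" step described above can be sketched in Python on a small hypothetical tree. The tree shape (root n0 with children n1 and n2, and four leaves l0 to l3), the mask values, and the convention that a node's bit sequence holds 0 at the leaves of its left subtree are all assumptions made for illustration; the patent text does not state how the bit sequences of Fig. 3 are populated.

```python
from functools import reduce

WIDTH = 4  # number of leaves; the leftmost bit corresponds to leaf l0

# Hypothetical bit sequences: a 0 marks a leaf that becomes unreachable
# when the node is false (here, the leaves of the node's left subtree).
MASKS = {
    "n0": 0b0011,  # root: left subtree holds l0 and l1
    "n1": 0b0111,  # left child: left subtree holds l0
    "n2": 0b1101,  # right child: left subtree holds l2
}


def exit_leaf(false_nodes):
    """AND the masks of all false nodes, then locate the first 1 bit."""
    all_ones = (1 << WIDTH) - 1  # result starts with every bit set to 1
    result = reduce(lambda acc, name: acc & MASKS[name], false_nodes, all_ones)
    # For a non-zero value, WIDTH - bit_length() is the index of the leftmost 1.
    return WIDTH - result.bit_length()


print(exit_leaf(["n0", "n2"]))  # root false, right child false
print(exit_leaf(["n1"]))        # root true, left child false
print(exit_leaf([]))            # every node true
```

Because AND is commutative and associative, the false nodes can be combined in any order, which is what removes the parent-to-child control dependency of ordinary traversal.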
Because the length of the feature vector equals the number of all leaf nodes in the decision tree, the bit sequence becomes long when the decision tree is large. According to embodiments of the subject matter described herein, the bit sequence is compression-coded so that the length of the feature vector no longer needs to equal the number of leaf nodes in the tree.
Fig. 4 shows a flowchart of a method 400 for encoding a bit sequence according to another embodiment of the subject matter described herein. In method 400, actions 402-408 may be sub-actions of action 204 described above with reference to Fig. 2, and actions 412-414 may be sub-actions of action 206 described above with reference to Fig. 2. It should be understood that method 400 may be performed by the processing unit 110 described with reference to Fig. 1.
At 402, the start byte that contains the first bit value is determined. For example, the above bit sequence b_e can be divided into multiple bytes {11111111, 11000000, 01111111, 11111111}, and the first byte that contains the first bit value, {11000000}, is determined to be the start byte. At 404, the index of the start byte is determined to be the start position. For example, the index of start byte {11000000} is determined to be 1.
At 406, the end byte that contains the first bit value is determined. For example, in bit sequence b_e {11111111, 11000000, 01111111, 11111111}, the last byte that contains the first bit value, {01111111}, is determined to be the end byte. At 408, the index of the end byte is determined to be the end position. For example, the index of end byte {01111111} is determined to be 2.
At 410, it is determined whether the start position is the same as the end position. If they are the same, then at 412 the bit sequence is encoded based on the start byte and the start position; in this case all first bit values lie in the same byte, so the end byte and the end position are not needed to encode the bit sequence. If they differ, then at 414 the bit sequence is encoded based on the start byte, the start position, the end byte, and the end position. Thus, according to method 400 of the subject matter described herein, when the start byte position and the end byte position are the same, the bit sequence is encoded using only the start byte and the start byte position, which compresses the bit sequence further.
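Actions 402 to 414 can be sketched as follows in Python. The function name `encode_bytes`, the tuple layout of the result, and the use of 8-character strings to stand for bytes are illustrative assumptions, not the patent's data format.

```python
def encode_bytes(bits: str, target: str = "0"):
    """Byte-level encoding following the flow of method 400.

    Returns (start_byte, start_pos) when every target bit lies in a single
    byte, otherwise (start_byte, start_pos, end_byte, end_pos).
    """
    if len(bits) % 8 != 0:
        raise ValueError("bit sequence length must be a multiple of 8")
    chunks = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    hits = [i for i, chunk in enumerate(chunks) if target in chunk]
    if not hits:
        return None  # the target bit value never occurs
    start, end = hits[0], hits[-1]      # actions 404 and 408
    if start == end:                    # action 410
        return (chunks[start], start)   # action 412
    return (chunks[start], start, chunks[end], end)  # action 414


b_e = "1" * 10 + "0" * 7 + "1" * 15
print(encode_bytes(b_e))
print(encode_bytes("1111111111100111"))  # zeros confined to one byte
```

Bytes strictly between the start byte and the end byte are not stored; under the contiguity assumption they consist entirely of the first bit value.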
In one embodiment, for example, the feature vector of a node may be a 256-bit bit sequence b_f (11…1100…0011…11) that consists of 3 parts in order: 81 consecutive 1s, 17 consecutive 0s, and 158 consecutive 1s. According to method 400 described herein, it can be determined that the start byte containing the first 0 is 10000000, the start position is 10, the end byte is 00111111, and the end position is 12. Embodiments of the subject matter described herein therefore encode the bit sequence using only the start byte, the end byte, the start position, and the end position. That is, only […] bytes are needed to represent the encoded bit sequence, where L denotes the total number of leaf nodes in the decision tree, […] denotes the number of bytes occupied by the start position or the end position, and ⌈ ⌉ denotes the round-up (ceiling) operator. In this example, the value of L equals the length of the bit sequence, namely 256, so the encoded bit sequence is only about 4 bytes long, compared with 32 bytes for the original, which compresses the original bit sequence effectively.
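The 256-bit example can be checked with a short Python sketch that rebuilds the sequence from its four encoded fields. The function `decode_bytes` and the tuple layout are hypothetical names introduced for illustration, and the size count assumes one byte per stored position, which is sufficient whenever the sequence has at most 256 bytes.

```python
def decode_bytes(code, n_bytes: int, target: str = "0") -> str:
    """Rebuild a bit sequence from (start_byte, start_pos[, end_byte, end_pos])."""
    other = "1" if target == "0" else "0"
    chunks = [other * 8] * n_bytes           # bytes outside the run
    if len(code) == 2:
        start_byte, start = code
        chunks[start] = start_byte
    else:
        start_byte, start, end_byte, end = code
        chunks[start] = start_byte
        chunks[end] = end_byte
        for i in range(start + 1, end):      # bytes strictly inside the run
            chunks[i] = target * 8
    return "".join(chunks)


b_f = "1" * 81 + "0" * 17 + "1" * 158        # the 256-bit example
code = ("10000000", 10, "00111111", 12)      # values given in the text

original_size = len(b_f) // 8                # bytes before encoding
encoded_size = 2 + 2 * 1                     # two payload bytes + two 1-byte positions
print(decode_bytes(code, original_size) == b_f)
print(original_size, encoded_size)
```

Byte 11 is never stored: it lies strictly between the start and end bytes and is reconstructed as eight 0s.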
In general, single instruction, multiple data (SIMD) registers are multiple bits long so that multiple data items can be processed at once; for example, Streaming SIMD Extensions (SSE) registers support 128-bit instructions and Advanced Vector Extensions (AVX) registers support 256-bit instructions. In conventional methods, because the length of a bit sequence equals the total number of leaf nodes, parallel processing cannot be achieved once the length of the bit sequence reaches a certain value (for example, more than 64 bits in SSE registers or more than 128 bits in AVX registers). According to embodiments of the subject matter described herein, however, compression coding makes the encoded bit sequence relatively short, so that more bit sequences can be processed in parallel. According to embodiments of the subject matter described herein, through compression coding and transposed storage of the bit sequences, an SSE register can, for example, process 16 bit sequences simultaneously and an AVX register can process 32 bit sequences simultaneously, regardless of the original bit sequence length. In addition, with registers that support 512 bits (for example, AVX-512), the subject matter described herein can process up to 64 bit sequences simultaneously.
Fig. 5 shows an example diagram of a data structure arrangement 500 according to an embodiment of the subject matter described herein. For the multiple trees in a decision tree model, the nodes that have the same feature identifier can be grouped together and arranged in ascending order of threshold. Because the number of nodes differs from one feature identifier to another, an offset index is needed to indicate where each feature identifier starts. As shown in Fig. 5, offset index 510 records the position in thresholds 520 at which the thresholds of each feature identifier begin; for example, the first two entries of the offset index determine the range of thresholds for feature identifier f0. As shown in Fig. 5, each threshold corresponds to the tree identifier 530 of the respective node and to an encoded bit sequence 540 (for example, a bit sequence encoded according to method 200 or 400 described herein). For example, the length of each encoded bit sequence may optionally be set to […], where L denotes the total number of leaf nodes in the tree. Alternatively, where the start byte position and the end byte position are the same, the length of each encoded bit sequence may be set to 1+[…].
After the comparisons between feature values and thresholds are complete, all false nodes in each tree can be determined, and a logical AND operation is then performed on the encoded bit sequences of all those false nodes to generate a result sequence. As shown in Fig. 5, for a received input, h results 550 can be generated for h trees, and the position of the corresponding leaf node can be determined from each result in order to generate an output value. As shown in Fig. 5, each leaf node of each decision tree has a corresponding leaf node output value 560.
According to embodiments of the subject matter described herein, the feature values of an input are compared with the corresponding thresholds, and the tree identifiers and bit sequences of all false nodes are recorded. A logical AND operation is then performed on the bit sequences of all false nodes in each tree to obtain the result corresponding to each input. The position of the leaf node is then determined from the result, and an output value or score is generated.
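The scoring steps above can be illustrated with a minimal Python sketch for a single tree. Several points are assumptions made for illustration: bit sequences are held uncompressed as Python integers, the second bit value is 1, leaf 0 occupies the most significant of the L bits, and a node is treated as false when the feature value exceeds its threshold. The function names are hypothetical.

```python
from typing import List, Tuple

# (feature_id, threshold, bit_sequence); the bit sequence has one bit per leaf.
Node = Tuple[int, float, int]


def exit_leaf(nodes: List[Node], num_leaves: int, x: List[float]) -> int:
    """Index of the leaf reached by input x, found without walking the tree."""
    result = (1 << num_leaves) - 1           # start with every leaf still possible
    for feature_id, threshold, bit_seq in nodes:
        if x[feature_id] > threshold:        # assumed definition of a false node
            result &= bit_seq                # logical AND over all false nodes
    # The first 1 bit, counted from the most significant side, marks the leaf.
    return num_leaves - result.bit_length()


def score(nodes: List[Node], leaf_values: List[float], x: List[float]) -> float:
    return leaf_values[exit_leaf(nodes, len(leaf_values), x)]


# Tree: root tests f0 <= 5.0 (left child is leaf 0); its right child tests
# f1 <= 2.0 with leaves 1 and 2. A false root rules out leaf 0 (0b011); a
# false right child rules out leaf 1 (0b101).
tree = [(0, 5.0, 0b011), (1, 2.0, 0b101)]
leaf_values = [0.1, 0.2, 0.3]
```

In this sketch the order in which the nodes are visited does not matter, because the logical AND is commutative; this is what allows the nodes to be grouped by feature identifier rather than by tree.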
The decision tree model 175 contains a large number of decision trees, each of which in turn includes multiple branch nodes, and many of these nodes turn out to be duplicates, that is, nodes whose feature identifier and threshold are both identical. In some embodiments, multiple nodes in the decision trees that share both the same feature identifier and the same threshold can be determined, and the truth-value determination operations of those nodes can then be merged. That is, for multiple nodes with the same feature identifier and threshold, the comparison between the feature value and the threshold needs to be performed only once for all of them rather than once per node, which increases the processing speed of the decision trees. It should be appreciated that the larger the decision tree model and the more duplicate nodes it contains, the more pronounced the performance gain from node deduplication. Furthermore, because the bit sequences of these nodes are not necessarily identical, nodes with the same feature identifier and threshold share only the comparison between the feature value and the threshold; the logical AND operations used to determine the result are not merged.
In some embodiments, with reference to the thresholds 520 shown in Fig. 5, after the thresholds of the multiple nodes that have the same feature identifier are sorted in ascending order, if the feature value associated with that feature identifier is less than a specific threshold, the comparison of the feature value with the thresholds greater than that specific threshold can be terminated. This is because the comparison operations only need to find the false nodes in the decision trees, and if the feature value is less than the specific threshold, it is necessarily also less than every threshold greater than that specific threshold. Therefore, sorting the thresholds in ascending order per feature identifier and performing only part of the comparison operations can increase the processing speed of the decision trees.
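The merged comparison and the early termination described above might be sketched as follows for the thresholds of one feature identifier. The helper names are hypothetical, and the sketch assumes that a node is false when the feature value is strictly greater than its threshold.

```python
from itertools import groupby
from typing import List, Tuple


def group_thresholds(sorted_thresholds: List[float]) -> List[Tuple[float, int]]:
    """Collapse equal thresholds into (threshold, number of nodes sharing it)."""
    return [(t, len(list(g))) for t, g in groupby(sorted_thresholds)]


def count_false_nodes(groups: List[Tuple[float, int]],
                      value: float) -> Tuple[int, int]:
    """Return (number of false nodes, number of comparisons performed)."""
    false_nodes = 0
    comparisons = 0
    for threshold, multiplicity in groups:
        comparisons += 1                 # one comparison per distinct threshold
        if value <= threshold:
            break                        # every later threshold is larger still
        false_nodes += multiplicity      # all duplicates share this outcome
    return false_nodes, comparisons
```

For six nodes with thresholds 1.0, 1.0, 2.0, 3.0, 3.0 and 4.0 and a feature value of 2.5, the sketch performs three comparisons instead of six. The bit sequences of the duplicate nodes would still be ANDed individually, since only the comparison is shared.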
In some embodiments, data access speed can be improved by prefetching data into the cache. Some embodiments can use blocking to make better use of the memory hierarchy. For example, all the decision trees in the decision tree model 175 can be divided into several blocks so that each block fits in the level-3 (L3) cache of the processing unit.
According to embodiments of the subject matter described herein, improving the bit-vector-based node representation reduces the length of the bit sequences, which creates the conditions for vectorized parallel processing. For example, the logical AND operations of multiple bit sequences can be processed in parallel in an SSE register.
In some embodiments, a decision tree can receive multiple inputs simultaneously (for example, inputs {i0, i1, i2, ...}) and then process the multiple inputs in parallel for one node, which improves parallel processing efficiency. For example, the threshold of a node can be placed in one SSE register, and the corresponding feature values of the received inputs can be placed in another SSE register. So that multiple inputs can be processed in order, the inputs can be aggregated by feature identifier so that the feature values belonging to the same feature identifier are stored contiguously. Then, for each input, whether the node is a true node or a false node is determined in parallel. In some embodiments, an intermediate vector can be used to store the bit sequence information of the false nodes.
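The per-node processing of several inputs can be pictured with the following sketch, in which a Python list stands in for the lanes of a SIMD register. The function name `apply_node`, the integer representation of the bit sequences, and the rule that a node is false when the feature value exceeds the threshold are assumptions made for illustration.

```python
from typing import List


def apply_node(threshold: float, bit_seq: int,
               feature_column: List[float],
               partial_results: List[int]) -> List[int]:
    """Update the running AND of every input for one node.

    feature_column[i] is the value of this node's feature in input i, stored
    contiguously as described above; partial_results[i] plays the role of the
    intermediate vector for input i.
    """
    return [r & bit_seq if v > threshold else r
            for v, r in zip(feature_column, partial_results)]
```

On hardware, the comparison and the masked AND over all lanes would each be a single vector operation rather than a loop.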
In some embodiments, after the logical AND operations over all false nodes in a decision tree have generated multiple results for the multiple inputs, the vector representing the multiple results can be transposed so that the bytes at the same position in the multiple results are stored contiguously in the cache, which makes better use of the cache and increases the processing speed of the decision trees. For example, Table 1 below shows the vector representing the multiple results {v0, v1, v2, ...}, in which the bytes of each result are stored contiguously. Table 2 below shows the vector after transposition, in which the bytes of each result are no longer stored contiguously. Instead, the bytes at the same position in the multiple results are stored contiguously; for example, the first bytes of the multiple results are stored contiguously, which facilitates the subsequent parallel processing of the multiple results.
Table 1: Vector representation before transposition
Table 2: Vector representation after transposition
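The transposition can be illustrated with a short Python sketch in which each result is a `bytes` object of equal length. The function name is hypothetical, and the sample byte values are borrowed from the Fig. 7 example purely for illustration.

```python
from typing import List


def transpose_results(results: List[bytes]) -> List[bytes]:
    """Row k of the output holds byte k of every result, stored contiguously."""
    width = len(results[0])
    if any(len(r) != width for r in results):
        raise ValueError("all results must have the same number of bytes")
    return [bytes(column) for column in zip(*results)]


# Before transposition: one row per result, bytes of a result are adjacent.
before = [bytes([0x03, 0x4A]), bytes([0x00, 0x00]),
          bytes([0x00, 0x69]), bytes([0x5B, 0x72])]
# After transposition: one row per byte position, one column per result.
after = transpose_results(before)
```

Here `after[0]` is the byte sequence 03 00 00 5B, that is, the first byte of every result, which a byte-wise SIMD instruction could then examine in a single step.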
Fig. 6 shows a flowchart of a method 600 for determining in parallel multiple outputs for multiple inputs according to embodiments of the subject matter described herein. It should be appreciated that actions 602-608 of method 600 can be performed after action 206 of method 200 described with reference to Fig. 2 or after action 414 described with reference to Fig. 4. It should also be appreciated that method 600 can be performed by the processing unit 110 described with reference to Fig. 1.
At 602, multiple start bytes in which the second bit value (for example, 1) occurs in the multiple transposed results are determined, and at 604, multiple indexes of the multiple start bytes are determined. Next, at 606, multiple start bit positions at which the second bit value occurs in the multiple start bytes are determined. At 608, multiple outputs for the multiple inputs are determined based on the multiple indexes of the start bytes and the multiple start bit positions. Because the position of the first occurrence of the second bit value in each result can be used to determine the position of the leaf node, determining the positions of the second bit value makes it possible to determine the multiple outputs for the multiple inputs.
For example, Fig. 7 shows a schematic diagram 700 of determining in parallel multiple outputs for multiple inputs according to embodiments of the subject matter described herein; the data in Fig. 7 is hexadecimal. For example, the decision tree model 175 generates m results {v0, v1, v2, ..., vm-1}, and the vector representing these m results is then transposed to generate multiple transposed results 710, in which the bytes at the same position in the multiple results are stored contiguously. For instance, the first byte 03 of v0, the first byte 00 of v1, the first byte 00 of v2, ..., and the first byte 5B of vm-1 are stored contiguously.
The first non-zero bytes 720 in the multiple results can be determined in parallel, for example by first obtaining the data sequence (03, 00, 00, ..., 5B), then the data sequence (4A, 00, 69, ..., 72), and so on. At the same time, the indexes i1 730 of the first non-zero bytes are determined. Next, the indexes i2 740 of the first non-zero bit within the first non-zero bytes 720 are determined, and the indexes 750 of the first non-zero bits in the multiple result sequences can then be determined from index i1 730 and index i2 740 (for example, i0 = 8*i1 + i2). The positions of the multiple leaf nodes are determined from the indexes of the first non-zero bits, so that the multiple outputs for the multiple inputs can be determined.
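The index computation i0 = 8*i1 + i2 might be sketched as follows over transposed results. The function name is hypothetical, and the sketch assumes that bits within a byte are counted from the most significant bit; the inner loop stands in for what a SIMD register would do across all lanes at once.

```python
from typing import List, Optional


def first_set_bit_indices(transposed: List[bytes]) -> List[Optional[int]]:
    """For each result (column), return i0 = 8*i1 + i2, or None if all zero."""
    found: List[Optional[int]] = [None] * len(transposed[0])
    for i1, row in enumerate(transposed):       # row i1 holds byte i1 of every result
        for lane, byte in enumerate(row):       # one lane per result
            if found[lane] is None and byte != 0:
                i2 = 8 - byte.bit_length()      # zero bits ahead of the first 1
                found[lane] = 8 * i1 + i2
    return found


# Two transposed rows using byte values from the Fig. 7 example.
rows = [bytes([0x03, 0x00, 0x00, 0x5B]), bytes([0x4A, 0x00, 0x69, 0x72])]
```

For the first column, the first non-zero byte is 03 at i1 = 0, whose first non-zero bit is at i2 = 6, giving i0 = 6. For the third column, the first non-zero byte is 69 at i1 = 1 with i2 = 1, giving i0 = 9.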
According to embodiments of the subject matter described herein, transposing the vector lets the data to be processed be stored contiguously in the cache, which increases processing speed. In addition, the subject matter described herein uses bit sequences to process multiple inputs in parallel, which makes it possible to parallelize the data processing in the decision tree model as a whole. Therefore, embodiments of the subject matter described herein can generate multiple outputs for multiple inputs simultaneously through a vectorization scheme, improving the processing efficiency of the decision trees.
Fig. 8 shows a schematic block diagram of a search system 800 for search ranking according to embodiments of the subject matter described herein. As shown in Fig. 8, the search system 800 includes a document ranker 810 and a document database 820. The document ranker 810 matches and ranks a received user query 830 against the documents in the document database 820 by relevance. The document database 820 includes multiple documents, whose content can be web pages, text files, image files, and so on. The document ranker 810 can generate N (for example, two or more) query-document pairs for the query 830, also referred to as (Q, D) pairs, and each (Q, D) pair has a corresponding feature set, such as the multiple features {f0, f1, f2, ...} described above.
As shown in Fig. 8, the search system 800 further includes the decision tree model 175, which contains multiple decision trees. The decision tree model 175 receives the feature sets of the N (Q, D) pairs and processes them using the methods according to the embodiments described herein to generate N output scores 840. For example, the first document scores 2.0, the second document scores 1.4, and so on. The search system 800 can then sort the scores 840 and output a document list to the user according to the sorted result. Therefore, a search system according to embodiments of the subject matter described herein can generate processing results faster and more efficiently by using the decision tree model.
The methods and functions described herein can be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
Program code for implementing the methods of the subject matter described herein can be written in any combination of one or more programming languages. The program code can be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be carried out. The program code can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium can be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In addition, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve the desired results. In certain circumstances, multitasking and parallel processing may be advantageous. Similarly, although the discussion above contains several specific implementation details, these should not be construed as limitations on the scope of the subject matter described herein. Certain features that are described in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Some example implementations of the present disclosure are listed below.
In one aspect, an electronic device is provided. The electronic device includes: a processing unit; and a memory coupled to the processing unit and storing instructions that, when executed by the processing unit, perform the following actions: obtaining a bit sequence that includes a first bit value and a second bit value; determining a start position and an end position at which the first bit value occurs in the bit sequence; and encoding the bit sequence based on at least one of the start position and the end position.
In some embodiments, the bit sequence is divided into one or more bytes, and determining the start position and the end position at which the first bit value occurs in the bit sequence includes: determining the first byte among the one or more bytes that includes the first bit value as a start byte; determining the index of the start byte as the start position; determining the last byte among the one or more bytes that includes the first bit value as an end byte; and determining the index of the end byte as the end position.
In some embodiments, encoding the bit sequence includes: encoding the bit sequence based on the start byte, the start position, the end byte, and the end position.
In some embodiments, encoding the bit sequence includes: in response to the start position being identical to the end position, encoding the bit sequence based on the start byte and the start position.
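One possible reading of this encoding is sketched below in Python. The sketch assumes that the first bit value is 0 and the second bit value is 1, and, for decoding, that all occurrences of the first bit value form one contiguous run, so that every byte strictly between the start byte and the end byte is 0x00. That contiguity assumption, the tuple layout, and the function names are made for illustration and are not stated in the implementations listed here.

```python
from typing import Tuple


def encode(bits: bytes) -> Tuple[int, ...]:
    """Encode as (start position, start byte[, end position, end byte])."""
    # A byte includes the first bit value (0) exactly when it is not 0xFF.
    hits = [i for i, b in enumerate(bits) if b != 0xFF]
    if not hits:
        raise ValueError("bit sequence does not contain the first bit value")
    start, end = hits[0], hits[-1]
    if start == end:                       # start and end coincide: shorter code
        return (start, bits[start])
    return (start, bits[start], end, bits[end])


def decode(code: Tuple[int, ...], length: int) -> bytes:
    """Rebuild the bit sequence, assuming one contiguous run of zero bits."""
    start, start_byte = code[0], code[1]
    end, end_byte = (code[2], code[3]) if len(code) == 4 else (start, start_byte)
    out = bytearray([0xFF] * length)       # bytes outside the run are all ones
    for i in range(start + 1, end):
        out[i] = 0x00                      # bytes inside the run are all zeros
    out[start] = start_byte
    out[end] = end_byte
    return bytes(out)
```

Under these assumptions a bit sequence of any length is reduced to at most two byte values and two byte indexes, which is what keeps the encoded length independent of the number of leaf nodes.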
In some embodiments, the bit sequence represents a feature vector of a node in a tree, the node has a feature identifier and an attribute value for determining the truth value of the node, and the truth value of the node is determined based on a comparison between a feature value associated with the feature identifier and the attribute value.
In some embodiments, the actions further include: determining multiple nodes in the tree that have both the feature identifier and the attribute value; and merging the node truth-value determination operations of the multiple nodes.
In some embodiments, the actions further include: sorting in ascending order multiple attribute values of multiple nodes that have the feature identifier; and in response to the feature value associated with the feature identifier being less than a specific attribute value among the multiple attribute values, terminating the comparison of the feature value with those of the multiple attribute values that are greater than the specific attribute value.
In some embodiments, the actions further include: receiving multiple inputs, an input of the multiple inputs including one or more feature values; and processing the multiple inputs in parallel for the node.
In some embodiments, the actions further include: generating multiple results for the multiple inputs by determining the truth values of the nodes in the tree; and transposing a vector representing the multiple results so that multiple bytes at the same position in the multiple results are stored contiguously in a cache.
In some embodiments, the actions further include: determining multiple start bytes in which the second bit value occurs in the multiple transposed results; determining multiple indexes of the multiple start bytes; determining multiple start bit positions at which the second bit value occurs in the multiple start bytes; and determining multiple outputs for the multiple inputs based on the multiple indexes of the multiple start bytes and the multiple start bit positions.
In another aspect, a computer-implemented method is provided. The method includes: obtaining a bit sequence that includes a first bit value and a second bit value; determining a start position and an end position at which the first bit value occurs in the bit sequence; and encoding the bit sequence based on at least one of the start position and the end position.
In some embodiments, the bit sequence is divided into one or more bytes, and determining the start position and the end position at which the first bit value occurs in the bit sequence includes: determining the first byte among the one or more bytes that includes the first bit value as a start byte; determining the index of the start byte as the start position; determining the last byte among the one or more bytes that includes the first bit value as an end byte; and determining the index of the end byte as the end position.
In some embodiments, encoding the bit sequence includes: encoding the bit sequence based on the start byte, the start position, the end byte, and the end position.
In some embodiments, encoding the bit sequence includes: in response to the start position being identical to the end position, encoding the bit sequence based on the start byte and the start position.
In some embodiments, the bit sequence represents a feature vector of a node in a tree, the node has a feature identifier and an attribute value for determining the truth value of the node, and the truth value of the node is determined based on a comparison between a feature value associated with the feature identifier and the attribute value.
In some embodiments, the method further includes: determining multiple nodes in the tree that have both the feature identifier and the attribute value; and merging the node truth-value determination operations of the multiple nodes.
In some embodiments, the method further includes: sorting in ascending order multiple attribute values of multiple nodes that have the feature identifier; and in response to the feature value associated with the feature identifier being less than a specific attribute value among the multiple attribute values, terminating the comparison of the feature value with those of the multiple attribute values that are greater than the specific attribute value.
In some embodiments, the method further includes: receiving multiple inputs, an input of the multiple inputs including one or more feature values; and processing the multiple inputs in parallel for the node.
In some embodiments, the method further includes: generating multiple results for the multiple inputs by determining the truth values of the nodes in the tree; and transposing a vector representing the multiple results so that multiple bytes at the same position in the multiple results are stored contiguously in a cache.
In some embodiments, the method further includes: determining multiple start bytes in which the second bit value occurs in the multiple transposed results; determining multiple indexes of the multiple start bytes; determining multiple start bit positions at which the second bit value occurs in the multiple start bytes; and determining multiple outputs for the multiple inputs based on the multiple indexes of the multiple start bytes and the multiple start bit positions.
In yet another aspect, a computer program product is provided. The computer program product is stored in a non-transitory computer storage medium and includes machine-executable instructions that, when run in a device, cause the device to: obtain a bit sequence that includes a first bit value and a second bit value; determine a start position and an end position at which the first bit value occurs in the bit sequence; and encode the bit sequence based on at least one of the start position and the end position.
In some embodiments, the bit sequence is divided into one or more bytes, and determining the start position and the end position at which the first bit value occurs in the bit sequence includes: determining the first byte among the one or more bytes that includes the first bit value as a start byte; determining the index of the start byte as the start position; determining the last byte among the one or more bytes that includes the first bit value as an end byte; and determining the index of the end byte as the end position.
In some embodiments, encoding the bit sequence includes: encoding the bit sequence based on the start byte, the start position, the end byte, and the end position.
In some embodiments, encoding the bit sequence includes: in response to the start position being identical to the end position, encoding the bit sequence based on the start byte and the start position.
In some embodiments, the bit sequence represents a feature vector of a node in a tree, the node has a feature identifier and an attribute value for determining the truth value of the node, and the truth value of the node is determined based on a comparison between a feature value associated with the feature identifier and the attribute value.
In some embodiments, the machine-executable instructions, when run in the device, further cause the device to: determine multiple nodes in the tree that have both the feature identifier and the attribute value; and merge the node truth-value determination operations of the multiple nodes.
In some embodiments, the machine-executable instructions, when run in the device, further cause the device to: sort in ascending order multiple attribute values of multiple nodes that have the feature identifier; and in response to the feature value associated with the feature identifier being less than a specific attribute value among the multiple attribute values, terminate the comparison of the feature value with those of the multiple attribute values that are greater than the specific attribute value.
In some embodiments, the machine-executable instructions, when run in the device, further cause the device to: receive multiple inputs, an input of the multiple inputs including one or more feature values; and process the multiple inputs in parallel for the node.
In some embodiments, the machine-executable instructions, when run in the device, further cause the device to: generate multiple results for the multiple inputs by determining the truth values of the nodes in the tree; and transpose a vector representing the multiple results so that multiple bytes at the same position in the multiple results are stored contiguously in a cache.
In some embodiments, the machine-executable instructions, when run in the device, further cause the device to: determine multiple start bytes in which the second bit value occurs in the multiple transposed results; determine multiple indexes of the multiple start bytes; determine multiple start bit positions at which the second bit value occurs in the multiple start bytes; and determine multiple outputs for the multiple inputs based on the multiple indexes of the multiple start bytes and the multiple start bit positions.
Although the present disclosure has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (20)

1. An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and storing instructions which, when executed by the processing unit, perform the following actions:
obtaining a bit sequence, the bit sequence including a first bit value and a second bit value;
determining a start position and an end position at which the first bit value occurs in the bit sequence; and
encoding the bit sequence based on at least one of the start position and the end position.
2. The device according to claim 1, wherein the bit sequence is divided into one or more bytes, and determining the start position and the end position at which the first bit value occurs in the bit sequence comprises:
determining the first byte among the one or more bytes that includes the first bit value as a start byte;
determining the index of the start byte as the start position;
determining the last byte among the one or more bytes that includes the first bit value as an end byte; and
determining the index of the end byte as the end position.
3. The device according to claim 2, wherein encoding the bit sequence comprises:
encoding the bit sequence based on the start byte, the start position, the end byte, and the end position.
4. The device according to claim 2, wherein encoding the bit sequence comprises:
in response to the start position being identical to the end position, encoding the bit sequence based on the start byte and the start position.
5. The device according to claim 1, wherein the bit sequence represents a feature vector of a node in a tree, the node has a feature identifier and an attribute value for determining a truth value of the node, and the truth value of the node is determined based on a comparison between a feature value associated with the feature identifier and the attribute value.
6. The device according to claim 5, the actions further comprising:
determining multiple nodes in the tree that have both the feature identifier and the attribute value; and
merging the node truth-value determination operations of the multiple nodes.
7. The device according to claim 5, the actions further comprising:
sorting in ascending order multiple attribute values of multiple nodes that have the feature identifier; and
in response to the feature value associated with the feature identifier being less than a specific attribute value among the multiple attribute values, terminating the comparison of the feature value with those of the multiple attribute values that are greater than the specific attribute value.
8. The device according to claim 5, the actions further comprising:
receiving multiple inputs, an input of the multiple inputs including one or more feature values; and
processing the multiple inputs in parallel for the node.
9. The device according to claim 8, the actions further comprising:
generating multiple results for the multiple inputs by determining the truth values of the nodes in the tree; and
transposing a vector representing the multiple results so that multiple bytes at the same position in the multiple results are stored contiguously in a cache.
10. The device according to claim 9, the actions further comprising:
determining multiple start bytes in which the second bit value occurs in the multiple transposed results;
determining multiple indexes of the multiple start bytes;
determining multiple start bit positions at which the second bit value occurs in the multiple start bytes; and
determining multiple outputs for the multiple inputs based on the multiple indexes of the multiple start bytes and the multiple start bit positions.
11. A computer-implemented method, comprising:
obtaining a bit sequence, the bit sequence including a first bit value and a second bit value;
determining a start position and an end position at which the first bit value occurs in the bit sequence; and
encoding the bit sequence based on at least one of the start position and the end position.
12. The method according to claim 11, wherein the bit sequence is divided into one or more bytes, and determining the start position and the end position at which the first bit value occurs in the bit sequence comprises:
determining the first byte among the one or more bytes that includes the first bit value as a start byte;
determining the index of the start byte as the start position;
determining the last byte among the one or more bytes that includes the first bit value as an end byte; and
determining the index of the end byte as the end position.
13. The method according to claim 12, wherein encoding the bit sequence comprises:
encoding the bit sequence based on the start byte, the start position, the end byte, and the end position.
14. The method according to claim 12, wherein encoding the bit sequence comprises:
in response to the start position being identical to the end position, encoding the bit sequence based on the start byte and the start position.
15. The method according to claim 11, wherein the bit sequence represents a feature vector of a node in a tree, the node has a feature identifier and an attribute value for determining a truth value of the node, and the truth value of the node is determined based on a comparison between a feature value associated with the feature identifier and the attribute value.
16. The method according to claim 15, further comprising:
sorting in ascending order multiple attribute values of multiple nodes that have the feature identifier; and
in response to the feature value associated with the feature identifier being less than a specific attribute value among the multiple attribute values, terminating the comparison of the feature value with those of the multiple attribute values that are greater than the specific attribute value.
17. The method according to claim 15, further comprising:
receiving multiple inputs, an input of the multiple inputs including one or more feature values; and
processing the multiple inputs in parallel for the node.
18. The method according to claim 17, further comprising:
generating multiple results for the multiple inputs by determining the truth values of the nodes in the tree; and
transposing a vector representing the multiple results so that multiple bytes at the same position in the multiple results are stored contiguously in a cache.
19. The method according to claim 18, further comprising:
determining multiple start bytes in which the second bit value occurs in the multiple transposed results;
determining multiple indexes of the multiple start bytes;
determining multiple start bit positions at which the second bit value occurs in the multiple start bytes; and
determining multiple outputs for the multiple inputs based on the multiple indexes of the multiple start bytes and the multiple start bit positions.
20. A computer program product stored in a non-transitory computer storage medium and including machine-executable instructions which, when run in a device, cause the device to:
obtain a bit sequence, the bit sequence including a first bit value and a second bit value;
determine a start position and an end position at which the first bit value occurs in the bit sequence; and
encode the bit sequence based on at least one of the start position and the end position.
CN201710044988.5A 2017-01-20 2017-01-20 Compression coding for bit sequences Active CN108334888B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710044988.5A CN108334888B (en) 2017-01-20 2017-01-20 Compression coding for bit sequences
PCT/US2018/013750 WO2018136371A1 (en) 2017-01-20 2018-01-16 Compressed encoding for bit sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710044988.5A CN108334888B (en) 2017-01-20 2017-01-20 Compression coding for bit sequences

Publications (2)

Publication Number Publication Date
CN108334888A true CN108334888A (en) 2018-07-27
CN108334888B CN108334888B (en) 2022-03-11

Family

ID=61168159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710044988.5A Active CN108334888B (en) 2017-01-20 2017-01-20 Compression coding for bit sequences

Country Status (2)

Country Link
CN (1) CN108334888B (en)
WO (1) WO2018136371A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111158583A (en) * 2019-11-25 2020-05-15 深圳壹账通智能科技有限公司 Boolean type variable storage method, device, equipment and storage medium for graph calculation

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11216431B2 (en) * 2017-09-12 2022-01-04 Apple Inc. Providing a compact representation of tree structures

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101211346A (en) * 2006-12-30 2008-07-02 上海意渊信息科技有限公司 Method for optimizing memorizer performance
CN101345707A (en) * 2008-08-06 2009-01-14 北京邮电大学 Method and apparatus for implementing IPv6 packet classification
CN101689204A (en) * 2007-07-03 2010-03-31 新叶股份有限公司 Bit string search method and program
US20120119927A1 (en) * 2010-08-24 2012-05-17 Huawei Technologies Co., Ltd. Methods and devices for creating, compressing and searching binary tree
CN102783154A (en) * 2010-02-26 2012-11-14 捷讯研究有限公司 Encoding and decoding methods and devices employing dual codesets
CN103037218A (en) * 2012-10-22 2013-04-10 北京航空航天大学 Multi-view stereoscopic video compression and decompression method based on fractal and H.264
CN105574821A (en) * 2015-12-10 2016-05-11 浙江传媒学院 Data-based soft shadow removal method
CN106056752A (en) * 2016-05-25 2016-10-26 武汉大学 Banknote authentication method based on random forest
CN106294355A (en) * 2015-05-14 2017-01-04 阿里巴巴集团控股有限公司 A kind of determination method and apparatus of business object attribute

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HTTP://RANGER.UTA.EDU/~KOSMOPO/CSE2320/LECTURES/15-DATACOMPRESSI: "5.5 Data Compression", 《HTTP://RANGER.UTA.EDU/~KOSMOPO/CSE2320/LECTURES/15-DATACOMPRESSION.PDF》 *
Ruan Xiaohong et al.: "Classifier algorithm based on heterogeneous cost-sensitive decision trees", 《计算机科学》 (Computer Science) *

Also Published As

Publication number Publication date
WO2018136371A1 (en) 2018-07-26
CN108334888B (en) 2022-03-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant