CN104050230B - Fast method for finding minimum and maximum values in large data sets using a SIMD instruction set architecture - Google Patents
- Publication number
- CN104050230B (application number CN201410096786.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- value
- data set
- unit
- data unit
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/22—Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc
Abstract
This disclosure relates to a fast method for finding minimum and maximum values in large data sets using a single-instruction multiple-data (SIMD) instruction set architecture. Systems and methods may use a SIMD instruction set architecture and a special data layout of array entries to determine a boundary-value data unit in a large data set in parallel with determining the associated index of that boundary-value data unit into the large data set. In one example, the special data layout of the array entries combines a data value and its associated index into the array in a single array entry.
Description
Technical field
Embodiments described herein relate generally to data processing for large data sets, and more particularly to processing large data sets using single-instruction multiple-data (SIMD) processors.
Background
Single-instruction multiple-data (SIMD) processors are commonly used in applications that exhibit massive data parallelism without complex control flow or excessive inter-processor communication. Typical applications for SIMD processors may include low-level vision and image processing, such as pattern recognition, database search, and statistical analysis. A common operation in image processing is finding the minimum or maximum value in a large data array, or its associated index. Most SIMD processors provide instructions that quickly perform minimum and maximum operations. However, if the SIMD processor must also keep track of the indices that produce these values, the data parallelism of the SIMD instructions may be destroyed.
Brief description of the drawings
The various advantages of embodiments of the invention will become apparent to those skilled in the art by reading the following description and appended claims, and by referring to the following figures, in which:
Fig. 1 is a block diagram of an example of a computing system according to an embodiment;
Figs. 2A-2B are diagrams of an example of a large data set and of the data layout of the large data set according to an embodiment;
Fig. 3 is a flowchart of an example of a method of determining a boundary-value data unit and its associated index according to an embodiment;
Figs. 4A-4B are diagrams of an example of a SIMD instruction listing and a block diagram according to an embodiment;
Fig. 5 is a block diagram of a system according to an embodiment; and
Fig. 6 is a diagram of a device according to an embodiment.
Detailed description
Turning now to Fig. 1, a computing system 100 is shown that includes a central processing unit (CPU) 120, system memory 130, a storage device 140 (including a database 150), a graphics processing unit (GPU) 160, and graphics memory 170. The illustrated system 100 may be part of a mobile platform such as a laptop computer, personal digital assistant (PDA), wireless smart phone, media player, imaging device, mobile Internet device (MID), smart tablet, and so on, or any combination thereof. System 100 may also be part of a fixed platform such as a personal computer (PC), server, or workstation.
CPU 120 may include a memory controller (not shown) that provides access to system memory 130. System memory 130 may include random access memory, such as double data rate (DDR) synchronous dynamic random access memory modules. The modules of system memory 130 may be incorporated into a single inline memory module (SIMM), dual inline memory module (DIMM), small outline DIMM (SODIMM), and so on. CPU 120 may also have one or more drivers and/or processor cores (not shown), where each core may be fully functional with an instruction fetch unit, instruction decoder, level one (L1) cache, execution units, and so on. The CPU may include one or more single-instruction multiple-data (SIMD) processor cores. CPU 120 may also execute an operating system (OS) such as Microsoft Windows, Linux, or Mac (Macintosh) OS.
Storage device 140 may be implemented with a variety of components or subsystems including, for example, a magnetic disk drive, an optical drive, flash memory, or other devices capable of persistently storing information. As shown in Fig. 1, storage device 140 includes database 150, which stores the large data set.
The illustrated system 100 also includes a graphics processing unit (GPU) 160 coupled to graphics memory 170. The dedicated graphics memory 170 may include GDDR (graphics DDR) or DDR SDRAM modules, or any other memory technology suitable for supporting graphics rendering. GPU 160 and graphics memory 170 may be installed on a graphics/video card, where GPU 160 may communicate with CPU 120 via a graphics bus such as a PCI Express Graphics (PEG, e.g., Peripheral Components Interconnect/PCI Express x16 Graphics 150W-ATX Specification 1.0, PCI Special Interest Group) bus or an Accelerated Graphics Port (e.g., AGP V3.0 Interface Specification, September 2002) bus. The graphics card may be integrated onto the system motherboard, integrated into the die of the main CPU 120, configured as a discrete card on the motherboard, and so on.
The illustrated GPU 160 executes a software module as part of a graphics application. The graphics application may need to determine the minimum or maximum value in a large data set together with its associated index into the large data array. In one example, the software module includes code for determining the minimum or maximum value in the large data set in parallel with determining the associated index of that value in the large data array. The software module may also include code for combining a data value and the associated index of that data value into a single data unit, to be stored as a data entry in the large data set. The software module may be written in any programming language, for example an object-oriented programming language such as C++.
GPU 160 may also include one or more single-instruction multiple-data (SIMD) processor cores to improve and/or support graphics performance. Thus, the illustrated approach may be particularly beneficial in graphics environments that involve a high level of data parallelism and processing complexity.
Turning now to Fig. 2A, a large data set is illustrated, where the large data set includes an array data structure. Each entry in the illustrated array has a special data layout that includes a data value and its associated index into the large data set, as shown in Fig. 2B. The data value is stored in the most significant bits of the data entry, and the index is stored in the least significant bits of the data entry.
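The entry layout of Fig. 2B can be sketched as follows. This is a minimal illustration, not the patent's own code: the field width `INDEX_BITS` and the helper names `pack` and `unpack` are assumptions, and values are assumed to be non-negative integers.

```python
INDEX_BITS = 16                      # assumed index width; the source gives no field widths
INDEX_MASK = (1 << INDEX_BITS) - 1

def pack(value, index):
    """Place the value in the most significant bits and the index in the least significant bits."""
    assert value >= 0 and 0 <= index <= INDEX_MASK
    return (value << INDEX_BITS) | index

def unpack(entry):
    """Recover (value, index) from a packed entry."""
    return entry >> INDEX_BITS, entry & INDEX_MASK

entry = pack(7, 3)
print(hex(entry))      # 0x70003
print(unpack(entry))   # (7, 3)
```

Because the value occupies the high bits, comparing two packed entries as plain integers orders them by value first, which is what lets a single minimum or maximum operation carry the index along.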
The software module may construct and populate the special data layout of the large data set by executing code such as the following, where N = 16: [code listing not reproduced]
The special data layout of a data entry may be constructed by combining a data value and its index in a single data entry. The software module may construct and populate the special data layout of the large data set by executing two SIMD16 instructions for every 16 input data units. For example, for the first 16 data units the software module may execute [instruction not reproduced] and [instruction not reproduced], and for the second 16 data units the software module may execute [instruction not reproduced] and [instruction not reproduced].
A SIMD16 instruction performs the same operation on 16 data lanes in parallel. SIMD instruction processing is more efficient than an approach in which each lane is processed sequentially. Although SIMD16 instructions have been described, any SIMD instruction may be used.
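Since the original instructions are not available in this text, the following is only a sketch of how two lane-wise operations per 16 inputs could build the layout. The choice of a shift followed by an OR is an assumption, as are the names `simd_shl`, `simd_or`, and `build_layout`; Python lists stand in for SIMD16 registers.

```python
LANES = 16        # SIMD16: one operation acts on 16 lanes
INDEX_BITS = 16   # assumed index width, not taken from the source

def simd_shl(lanes, bits):
    """Emulate a lane-wise left shift."""
    return [x << bits for x in lanes]

def simd_or(a, b):
    """Emulate a lane-wise bitwise OR."""
    return [x | y for x, y in zip(a, b)]

def build_layout(values):
    """Pack every value with its array index, 16 entries at a time."""
    assert len(values) % LANES == 0
    entries = []
    for base in range(0, len(values), LANES):
        chunk = values[base:base + LANES]
        indices = list(range(base, base + LANES))
        shifted = simd_shl(chunk, INDEX_BITS)        # first lane-wise operation
        entries.extend(simd_or(shifted, indices))    # second lane-wise operation
    return entries

entries = build_layout(list(range(100, 132)))
print(len(entries), hex(entries[17]))
```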
In another exemplary embodiment, the special data layout is constructed offline and database 150 is pre-populated with the large data set.
Fig. 3 shows a method of determining a boundary-value data unit (e.g., a minimum or maximum data value) in a large data set in parallel with determining the associated index of that boundary-value data unit into the large data set. The method may generally include successively determining the boundary-value data units of smaller data units in the large data set over multiple processing stages until a single data unit results.
The method may be implemented as a set of logic instructions in executable software stored in a machine-readable or computer-readable storage medium of a memory such as random access memory (RAM), read-only memory (ROM), programmable ROM (PROM), firmware, or flash memory; in configurable logic such as programmable logic arrays (PLAs), field programmable gate arrays (FPGAs), or complex programmable logic devices (CPLDs); in fixed-functionality hardware using assembly language programming and circuit technology such as application specific integrated circuit (ASIC), complementary metal oxide semiconductor (CMOS), or transistor-transistor logic (TTL) technology; or in any combination thereof.
At process block 310, a processing stage (e.g., the first processing stage) receives a data set. The data set received by the first processing stage may include the large data set. At process block 320, the data set is divided into multiple smaller data sets. For example, in a SIMD environment, a large data set of 32 array elements is divided into two subarrays of 16 array elements each, where each array element has the special data layout illustrated in Fig. 2B.
In this example, the large data set is divided into units sized so that SIMD instructions (e.g., SIMD16 instructions) can process as many data units as possible in parallel, improving system performance. Any SIMD configuration may be used.
At process block 330, the boundary-value data units between the smaller data sets are determined, and their associated indices are determined at the same time. For example, the minimum data value between the first and second subarrays may be determined for each data lane (i.e., array element).
Each subarray includes 16 array elements. Using a SIMD16 instruction [instruction and operands not reproduced], the minimum data values between the 16 pairs of corresponding data units can be determined in parallel.
The index value of each minimum data value determined between corresponding data units in subarray 1 and subarray 2 is included in the resulting data set. Because the index value of each data value is attached to the data value, consistent with the special data layout of each entry, the index of the minimum value is determined when the minimum value is determined. The index value is located in the least significant bits of the minimum data entry.
The minimum data values between the pairs are stored as a new data set of 16 array elements. Each array element includes the minimum data value of the corresponding pair and the associated index of that data value. The new data set is output at process block 340.
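One processing stage (process blocks 310 through 340) can be sketched as below. Python lists emulate the SIMD lanes, and the names `pack`, `unpack`, `simd_min`, and `stage`, along with the 16-bit index width, are illustrative assumptions rather than anything given in the source.

```python
INDEX_BITS = 16   # assumed index width
INDEX_MASK = (1 << INDEX_BITS) - 1

def pack(value, index):
    """Value in the high bits, index in the low bits."""
    return (value << INDEX_BITS) | index

def unpack(entry):
    return entry >> INDEX_BITS, entry & INDEX_MASK

def simd_min(a, b):
    """Emulate a lane-wise SIMD minimum over two equal-length registers."""
    return [min(x, y) for x, y in zip(a, b)]

def stage(entries):
    """Split the data set into two subarrays and keep the lane-wise minimum."""
    half = len(entries) // 2
    return simd_min(entries[:half], entries[half:])

values = [(i * 37 + 11) % 101 for i in range(32)]     # sample data, 32 elements
entries = [pack(v, i) for i, v in enumerate(values)]
survivors = stage(entries)                            # 16 entries remain
print([unpack(e) for e in survivors[:4]])
```

Each surviving entry still carries the index of the value that won its lane, so no separate index bookkeeping is needed inside the stage.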
At process block 350, the method determines whether the data set includes a single entry (i.e., a single array element). In this example, n = 16. Because n is not equal to 1, the method continues to the next hierarchical processing stage, which performs the processing of process blocks 310-340.
For example, the second processing stage receives a data set of 16 array elements. The data set is divided into two subarrays of eight array elements each. Because the data set is divided into subarrays of eight data lanes, SIMD8 instructions may be used to determine the boundary data values of the new data set.
Using a SIMD8 instruction [instruction and operands not reproduced], the minimum data values between the eight pairs of corresponding data units can be determined in parallel.
The minimum data values between the pairs are stored as a new data set of eight array elements. Each array element includes the minimum data value of the corresponding pair and the associated index of that data value. The new data set is output at process block 340. At process block 350, n = 8.
The third processing stage receives a data set of eight array elements. The data set is divided into two subarrays of four array elements each. Because the data set is divided into subarrays of four data lanes, SIMD4 instructions may be used to determine the boundary data values of the new data set.
Using a SIMD4 instruction [instruction and operands not reproduced], the minimum data values between the four pairs of corresponding data units can be determined in parallel.
The minimum data values between the pairs are stored as a new data set of four array elements. Each array element includes the minimum data value of the corresponding pair and the associated index of that data value. The new data set is output at process block 340. At process block 350, n = 4.
The fourth processing stage receives a data set of four array elements. The data set is divided into two subarrays of two array elements each. Because the data set is divided into subarrays of two data lanes, SIMD2 instructions may be used to determine the boundary data values of the new data set.
Using a SIMD2 instruction [instruction and operands not reproduced], the minimum data values between the two pairs of corresponding data units can be determined in parallel.
The minimum data values between the pairs are stored as a new data set of two array elements. Each array element includes the minimum data value of the corresponding pair and the associated index of that data value. The new data set is output at process block 340. At process block 350, n = 2.
The fifth processing stage receives a data set of two array elements. The data set is divided into two subarrays of one array element each. Because the data set is divided into subarrays of one data lane, SIMD1 instructions may be used to determine the boundary data value of the new data set.
Using a SIMD1 instruction [instruction and operands not reproduced], the minimum data value between the single remaining pair of data units can be determined.
Upon completion of the five processing stages, the minimum data value between the data sets consists of a single array element. Therefore, at process block 350, n = 1 and the single entry is output to process block 360. The most significant bits of the entry contain the boundary data value, and the least significant bits of the entry contain the index of that value. The boundary data value of the single entry represents the boundary data value of the entire large data set.
Because the special data layout of a data entry includes the data value and its associated index combined in a single entry, the method can determine the associated index of the boundary-value data unit in parallel with determining the boundary-value data unit for the entire large data set. Once the boundary data value has been determined for the entire large data set, its index is held in the least significant bits of the entry.
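The five stages described above (32, 16, 8, 4, 2, then 1 entry) can be sketched as one loop. This is a hedged illustration with Python lists in place of SIMD registers; `find_boundary`, `reduce_entries`, and the 16-bit index width are assumed names and parameters, and the input length is assumed to be a power of two.

```python
INDEX_BITS = 16   # assumed index width
INDEX_MASK = (1 << INDEX_BITS) - 1

def pack(value, index):
    return (value << INDEX_BITS) | index

def unpack(entry):
    return entry >> INDEX_BITS, entry & INDEX_MASK

def reduce_entries(entries, op):
    """Halve the data set stage by stage until a single entry remains (n = 1)."""
    while len(entries) > 1:
        half = len(entries) // 2
        # One lane-wise operation per stage: SIMD16, SIMD8, SIMD4, ...
        entries = [op(x, y) for x, y in zip(entries[:half], entries[half:])]
    return entries[0]

def find_boundary(values, op=min):
    """Return (boundary value, index) using min or max as the lane operation."""
    n = len(values)
    assert n > 0 and (n & (n - 1)) == 0, "sketch assumes a power-of-two length"
    entries = [pack(v, i) for i, v in enumerate(values)]
    return unpack(reduce_entries(entries, op))

values = [(i * 37 + 11) % 101 for i in range(32)]
print(find_boundary(values, min))
print(find_boundary(values, max))
```

One consequence of keeping the index in the low bits is the tie-breaking rule: among equal values, a minimum reduction returns the smallest index and a maximum reduction returns the largest.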
In an exemplary embodiment, when there are not enough bits to hold all of the indices in the new data layout, the data set is divided into several small groups such that all data within the same group can be represented by the special data layout. First, the boundary data value and index of each group are computed according to Fig. 3. The indices in the resulting data are then replaced by group indices to form a new data set. The new data set is processed according to Fig. 3 to obtain the overall boundary data value and the corresponding group index. The group index is then used to retrieve the boundary data value and the index within that group, which together give the global result.
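The grouped variant can be sketched as a two-pass reduction. The index field is deliberately set to 4 bits here so that a 64-element data set does not fit in one pass; that width, the group size of 16, and the names `reduce_min` and `find_min_grouped` are assumptions made for illustration.

```python
INDEX_BITS = 4                        # deliberately small: indices 0..15 only
INDEX_MASK = (1 << INDEX_BITS) - 1

def pack(value, index):
    assert 0 <= index <= INDEX_MASK
    return (value << INDEX_BITS) | index

def unpack(entry):
    return entry >> INDEX_BITS, entry & INDEX_MASK

def reduce_min(entries):
    """Stage-wise halving with a lane-wise minimum (power-of-two length assumed)."""
    while len(entries) > 1:
        half = len(entries) // 2
        entries = [min(x, y) for x, y in zip(entries[:half], entries[half:])]
    return entries[0]

def find_min_grouped(values, group_size=16):
    # Pass 1: minimum of each group, carrying the index local to that group.
    group_results = []
    for base in range(0, len(values), group_size):
        group = values[base:base + group_size]
        group_results.append(reduce_min([pack(v, i) for i, v in enumerate(group)]))
    # Pass 2: replace each local index with the group index and reduce again.
    second = [pack(unpack(e)[0], g) for g, e in enumerate(group_results)]
    value, g = unpack(reduce_min(second))
    # The group index selects the stored local index, giving the global index.
    _, local = unpack(group_results[g])
    return value, g * group_size + local

values = [(i * 37 + 11) % 101 for i in range(64)]
print(find_min_grouped(values))
```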
Fig. 4A shows a listing of SIMD instructions that operate on the special data layout to perform, in a SIMD environment, a method of determining a boundary-value data unit (e.g., a minimum or maximum data value) in a large data set in parallel with determining the associated index of that boundary-value data unit into the large data set. The method includes determining boundary-value data units of smaller data units in the large data set over multiple processing stages until a single data unit results. Fig. 4B is an exemplary block diagram of how the corresponding instructions and operations in Fig. 4A are performed.
In general, initialization section 401a may initialize the arrays dataIndexArray[N], minArray[16], and maxArray[16], where N = 16. Listing 402a illustrates two SIMD16 instructions that find the minimum and maximum data values for each data lane between two arrays and store the results in minArray[0:15] and maxArray[0:15], respectively.
Listing 403a illustrates pseudocode for determining the minimum and maximum data values of a large data set whose array size is larger than 32 array elements and is a multiple of 16. For example, in Fig. 4B, reference 403b illustrates the minimum and maximum operations performed on smaller portions of a large data set of 64 array elements. Initially, the minimum and maximum data array values between array elements [0:15] and [16:31] are determined. In the first iteration of the pseudocode of 403a (i.e., i = 2, N = 64), the result is compared with array elements [32:47] to determine the minimum and maximum data array values among array elements [0:47].
In the second iteration of the pseudocode (i.e., i = 3), the result of the first iteration is compared with array elements [48:63] to determine the minimum and maximum data array values among array elements [0:63]. The resulting data arrays are minArray[0:15] and maxArray[0:15]. Each array includes 16 array elements.
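The accumulation that listing 403a is described as performing can be sketched as follows. The original pseudocode is not reproduced in this text, so the loop below is a reconstruction from the description only; `accumulate`, `simd_min`, and `simd_max` are illustrative names, and lists emulate SIMD16 registers.

```python
LANES = 16

def simd_min(a, b):
    """Emulate a lane-wise SIMD16 minimum."""
    return [min(x, y) for x, y in zip(a, b)]

def simd_max(a, b):
    """Emulate a lane-wise SIMD16 maximum."""
    return [max(x, y) for x, y in zip(a, b)]

def accumulate(entries):
    """Fold a data set of N entries (N a multiple of 16, N >= 32) into 16 lanes."""
    n = len(entries)
    assert n % LANES == 0 and n >= 2 * LANES
    # Initial comparison between elements [0:15] and [16:31].
    min_array = simd_min(entries[0:16], entries[16:32])
    max_array = simd_max(entries[0:16], entries[16:32])
    # Iterations i = 2 .. N/16 - 1 fold in [32:47], [48:63], and so on.
    for i in range(2, n // LANES):
        chunk = entries[i * LANES:(i + 1) * LANES]
        min_array = simd_min(min_array, chunk)
        max_array = simd_max(max_array, chunk)
    return min_array, max_array

entries = [(i * 37 + 11) % 101 for i in range(64)]
min_array, max_array = accumulate(entries)
print(min_array[:4], max_array[:4])
```

After this pass, the 16-lane arrays can be reduced to a single entry by the halving stages described for listings 404a through 407a.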
Listing 404a illustrates the SIMD instructions (i.e., SIMD8 instructions) for determining the minimum and maximum data array values in the large data set when the data set is divided into two subarrays of eight data array elements each. Reference 404b illustrates this configuration.
Listing 405a illustrates the SIMD instructions (i.e., SIMD4 instructions) for determining the minimum and maximum data array values in the large data set when the data set is divided into two subarrays of four data array elements each. Reference 405b illustrates this configuration.
Listing 406a illustrates the SIMD instructions (i.e., SIMD2 instructions) for determining the minimum and maximum data array values in the large data set when the data set is divided into two subarrays of two data array elements each. Reference 406b illustrates this configuration.
Listing 407a illustrates the SIMD instructions (i.e., SIMD1 instructions) for determining the minimum and maximum data array values in the large data set when the data set is divided into two subarrays of one data array element each. Reference 407b illustrates this configuration.
Listing 408a and reference 408b illustrate the resulting single-entry array element, which includes the overall minimum or maximum data value of the large data set and its associated index into the large data set.
Fig. 5 illustrates an embodiment of a system 700. In embodiments, system 700 may be a media system, although system 700 is not limited to this context. For example, system 700 may be incorporated into a personal computer (PC), laptop computer, ultra-laptop computer, tablet, touch pad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone/PDA, television, smart device (e.g., smart phone, smart tablet, or smart television), mobile Internet device (MID), messaging device, data communication device, and so forth.
In embodiments, system 700 includes a platform 702 coupled to a display 720. Platform 702 may receive content from a content device such as content services device 730 or content delivery device 740, or from other similar content sources. A navigation controller 750 including one or more navigation features may be used to interact with, for example, platform 702 and/or display 720. Each of these components is described in more detail below.
In embodiments, platform 702 may include any combination of a chipset 705, processor 710, memory 712, storage 714, graphics subsystem 715, applications 716, and/or radio 718. Chipset 705 may provide intercommunication among processor 710, memory 712, storage 714, graphics subsystem 715, applications 716, and/or radio 718. For example, chipset 705 may include a storage adapter (not depicted) capable of providing intercommunication with storage 714.
Processor 710 may be implemented as a complex instruction set computer (CISC) or reduced instruction set computer (RISC) processor, an x86 instruction set compatible processor, a multi-core processor, or any other microprocessor or central processing unit (CPU). In embodiments, processor 710 may include a dual-core processor, a dual-core mobile processor, and so forth.
Memory 712 may be implemented as a volatile memory device such as, but not limited to, random access memory (RAM), dynamic random access memory (DRAM), or static RAM (SRAM).
Storage 714 may be implemented as a non-volatile storage device such as, but not limited to, a magnetic disk drive, optical disk drive, tape drive, internal storage device, attached storage device, flash memory, battery backed-up SDRAM (synchronous DRAM), and/or a network accessible storage device. In embodiments, storage 714 may include technology to increase the storage performance and enhance the protection of valuable digital media when, for example, multiple hard drives are included.
Graphics subsystem 715 may perform processing of images such as still images or video for display. Graphics subsystem 715 may be, for example, a graphics processing unit (GPU) or a visual processing unit (VPU). An analog or digital interface may be used to communicatively couple graphics subsystem 715 and display 720. For example, the interface may be any of a High-Definition Multimedia Interface (HDMI), DisplayPort, wireless HDMI, and/or wireless HD compliant techniques. Graphics subsystem 715 may be integrated into processor 710 or chipset 705. Alternatively, graphics subsystem 715 may be a stand-alone card communicatively coupled to chipset 705.
The graphics and/or video processing techniques described herein may be implemented in various hardware architectures. For example, graphics and/or video functionality may be integrated within a chipset. Alternatively, a discrete graphics and/or video processor may be used. As still another embodiment, the graphics and/or video functions may be implemented by a general purpose processor, including a multi-core processor. In a further embodiment, the functions may be implemented in a consumer electronics device.
Radio 718 may include one or more radios capable of transmitting and receiving signals using various suitable wireless communications techniques. Such techniques may involve communications across one or more wireless networks. Exemplary wireless networks include (but are not limited to) wireless local area networks (WLANs), wireless personal area networks (WPANs), wireless metropolitan area networks (WMANs), cellular networks, and satellite networks. In communicating across such networks, radio 718 may operate in accordance with one or more applicable standards in any version.
In embodiments, display 720 may include any television-type monitor or display. Display 720 may include, for example, a computer display screen, touch screen display, video monitor, television-like device, and/or a television. Display 720 may be digital and/or analog. In embodiments, display 720 may be a holographic display. Also, display 720 may be a transparent surface that may receive a visual projection. Such projections may convey various forms of information, images, and/or objects. For example, such projections may be a visual overlay for a mobile augmented reality (MAR) application. Under the control of one or more software applications 716, platform 702 may display a user interface 722 on display 720.
In embodiments, content services device 730 may be hosted by any national, international, and/or independent service and thus be accessible to platform 702 via the Internet, for example. Content services device 730 may be coupled to platform 702 and/or to display 720. Platform 702 and/or content services device 730 may be coupled to a network 760 to communicate (e.g., send and/or receive) media information to and from network 760. Content delivery device 740 also may be coupled to platform 702 and/or to display 720.
In embodiments, content services device 730 may include a cable television box, personal computer, network, telephone, Internet-enabled device or appliance capable of delivering digital information and/or content, and any other similar device capable of unidirectionally or bidirectionally communicating content between content providers and platform 702 and/or display 720, via network 760 or directly. It will be appreciated that content may be communicated unidirectionally and/or bidirectionally between any one of the components in system 700 and a content provider via network 760. Examples of content may include any media information including, for example, video, music, medical and gaming information, and so forth.
Content services device 730 receives content such as cable television programming including media information, digital information, and/or other content. Examples of content providers may include any cable or satellite television or radio or Internet content providers. The provided examples are not meant to limit embodiments of the invention.
In embodiments, platform 702 may receive control signals from navigation controller 750 having one or more navigation features. The navigation features of controller 750 may be used to interact with user interface 722, for example. In embodiments, navigation controller 750 may be a pointing device, that is, a computer hardware component (specifically, a human interface device) that allows a user to input spatial (e.g., continuous and multi-dimensional) data into a computer. Many systems, such as graphical user interfaces (GUIs), televisions, and monitors, allow the user to control and provide data to the computer or television using physical gestures.
Movements of the navigation features of controller 750 may be echoed on a display (e.g., display 720) by movements of a pointer, cursor, focus ring, or other visual indicators displayed on the display. For example, under the control of software applications 716, the navigation features located on navigation controller 750 may be mapped to virtual navigation features displayed on user interface 722. In embodiments, controller 750 may not be a separate component but may be integrated into platform 702 and/or display 720. Embodiments, however, are not limited to the elements or to the context shown or described herein.
In embodiments, drivers (not shown) may include technology that, when enabled, allows users to instantly turn platform 702 on and off like a television with the touch of a button after initial boot-up, for example. Program logic may allow platform 702 to stream content to media adaptors or to other content services device 730 or content delivery device 740 when the platform is turned "off". In addition, chipset 705 may include hardware and/or software support for 5.1 surround sound audio and/or high definition 7.1 surround sound audio, for example. Drivers may include a graphics driver for integrated graphics platforms. In embodiments, the graphics driver may include a peripheral component interconnect (PCI) Express graphics card.
In various embodiments, any one or more of the components shown in system 700 may be integrated. For example, platform 702 and content services device 730 may be integrated, or platform 702 and content delivery device 740 may be integrated, or platform 702, content services device 730, and content delivery device 740 may be integrated. In various embodiments, platform 702 and display 720 may be an integrated unit. Display 720 and content services device 730 may be integrated, or display 720 and content delivery device 740 may be integrated, for example. These examples are not meant to limit the invention.
In various embodiments, system 700 may be implemented as a wireless system, a wired system, or a combination of both. When implemented as a wireless system, system 700 may include components and interfaces suitable for communicating over a wireless shared medium, such as one or more antennas, transmitters, receivers, transceivers, amplifiers, filters, control logic, and so forth. An example of a wireless shared medium may include portions of a wireless spectrum, such as the RF spectrum. When implemented as a wired system, system 700 may include components and interfaces suitable for communicating over wired communications media, such as input/output (I/O) adapters, physical connectors to connect the I/O adapter with a corresponding wired communications medium, a network interface card (NIC), disc controller, video controller, audio controller, and so forth. Examples of wired communications media may include a wire, cable, metal leads, printed circuit board (PCB), backplane, switch fabric, semiconductor material, twisted-pair wire, coaxial cable, fiber optics, and so forth.
Platform 702 may establish one or more logical or physical channels to communicate information. The information may include media information and control information. Media information may refer to any data representing content meant for a user. Examples of content may include data from a voice conversation, videoconference, streaming video, electronic mail ("email") message, voice mail message, alphanumeric symbols, graphics, image, video, text, and so forth. Data from a voice conversation may be, for example, speech information, silence periods, background noise, comfort noise, tones, and so forth. Control information may refer to any data representing commands, instructions, or control words meant for an automated system. For example, control information may be used to route media information through a system, or to instruct a node to process the media information in a predetermined manner. The embodiments, however, are not limited to the elements or to the context shown or described in Fig. 5.
As described above, system 700 may be embodied in varying physical styles or form factors. Fig. 6 illustrates an embodiment of a small form factor device 800 in which system 700 may be embodied. In embodiments, for example, device 800 may be implemented as a mobile computing device having wireless capabilities. A mobile computing device may refer to any device having a processing system and a mobile power source or supply, such as one or more batteries.
As described above, examples of a mobile computing device may include a personal computer (PC), laptop computer, ultra-laptop computer, tablet, touch pad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone/PDA, television, smart device (e.g., smart phone, smart tablet, or smart television), mobile Internet device (MID), messaging device, data communication device, and so forth.
Examples of a mobile computing device may also include computers that are arranged to be worn by a person, such as a wrist computer, finger computer, ring computer, eyeglass computer, belt-clip computer, arm-band computer, shoe computer, clothing computer, and other wearable computers. In embodiments, for example, a mobile computing device may be implemented as a smart phone capable of executing computer applications as well as voice communications and/or data communications. Although some embodiments may be described with a mobile computing device implemented as a smart phone by way of example, it may be appreciated that other embodiments may be implemented using other wireless mobile computing devices as well. The embodiments are not limited in this context.
As shown in Fig. 6, device 800 may include a housing 802, a display 804, an input/output (I/O) device 806, and an antenna 808. Device 800 may also include navigation features 812. Display 804 may include any suitable display unit for displaying information appropriate for a mobile computing device. I/O device 806 may include any suitable I/O device for entering information into a mobile computing device. Examples of I/O device 806 may include an alphanumeric keyboard, a numeric keypad, a touch pad, input keys, buttons, switches, rocker switches, microphones, speakers, a voice recognition device and software, and so forth. Information may also be entered into device 800 by way of a microphone. Such information may be digitized by a voice recognition device. The embodiments are not limited in this context.
Various embodiments may be implemented using hardware elements, software elements, or a combination of both. Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), logic gates, registers, semiconductor devices, chips, microchips, chipsets, and so forth. Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, code, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds, and other design or performance constraints.
One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represents various logic within the processor and which, when read by a machine, causes the machine to fabricate logic to perform the techniques described herein. Such representations, known as "IP cores", may be stored on a tangible, machine-readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
Additional Notes and Examples:
Example 1 may provide a method that includes determining a boundary value data unit in a large data set in parallel with determining an associated index of the determined boundary value data unit into the large data set, by successively determining boundary value data units of smaller data sets within the large data set over a plurality of processing stages performed in a hierarchical manner until a single data unit is produced, wherein each data set includes a plurality of data entries.
Example 2 may include the method of Example 1, further including combining a data value and the associated index of the data value into a single data unit, and storing the single data unit as a data entry in the large data set.
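The single-unit layout of Example 2 can be illustrated with a minimal sketch. It assumes non-negative integer values and a 32-bit index field placed in the low-order bits, so that ordinary integer comparison of two packed units orders them by value first and by index second. The field width and the names `INDEX_BITS`, `pack`, `unpack`, and `pack_array` are hypothetical choices made for illustration; they are not taken from the disclosure.

```python
INDEX_BITS = 32                      # assumed width of the index field
INDEX_MASK = (1 << INDEX_BITS) - 1   # low-order bits that hold the index


def pack(value, index):
    """Combine a non-negative value and its array index into one integer unit."""
    if value < 0:
        raise ValueError("this sketch assumes non-negative values")
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("index does not fit in the assumed index field")
    # Value in the high bits, index in the low bits: comparing two units
    # compares their values first and falls back to the index on a tie.
    return (value << INDEX_BITS) | index


def unpack(unit):
    """Split a packed unit back into (value, index)."""
    return unit >> INDEX_BITS, unit & INDEX_MASK


def pack_array(values):
    """Store every value together with its own position as a single data entry."""
    return [pack(v, i) for i, v in enumerate(values)]
```

With this layout, a single comparison on packed units yields both the boundary value and where it sits in the original array, so no separate index bookkeeping is needed during the search.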
Example 3 may include the method of Example 1, wherein each processing stage determines boundary value data units between sets of the smaller data sets in parallel with determining the associated indices of the determined values into the large data set.
Example 4 may include the method of Example 3, wherein determining the boundary value data units between the sets of the smaller data sets includes using single-instruction multiple-data (SIMD) instructions to perform operations in parallel on the data entries in each data set.
Example 5 may include the method of Example 4, wherein the output of each processing stage produces a data set that is received as a new data set input by the next processing stage.
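The staged reduction of Examples 3 to 5 can be sketched as follows. This is plain Python rather than real SIMD code: the list comprehension in `lanewise_min` stands in for one lane-wise SIMD minimum instruction, and each stage here splits its input into two smaller data sets, which is only one possible way to divide the data. The packed layout (value in the high bits, a 32-bit index in the low bits, non-negative integer values) and all function names are assumptions made for illustration.

```python
INDEX_BITS = 32                      # assumed width of the index field
INDEX_MASK = (1 << INDEX_BITS) - 1


def lanewise_min(a, b):
    """Stand-in for one SIMD minimum: compare entries position by position."""
    return [x if x <= y else y for x, y in zip(a, b)]


def run_stage(units):
    """One processing stage: split the data set in two and reduce lane-wise."""
    half = len(units) // 2
    reduced = lanewise_min(units[:half], units[half:2 * half])
    if len(units) % 2:
        # An odd entry has no partner in this stage; carry it forward.
        reduced.append(units[-1])
    return reduced


def staged_min_with_index(values):
    """Return (minimum value, its index) by feeding each stage's output
    to the next stage until a single data unit remains."""
    if not values:
        raise ValueError("empty data set")
    units = [(v << INDEX_BITS) | i for i, v in enumerate(values)]
    while len(units) > 1:
        units = run_stage(units)
    return units[0] >> INDEX_BITS, units[0] & INDEX_MASK
```

Because each surviving unit still carries its original index, the index of the boundary value falls out of the same comparisons that find the value, and no second pass over the data is required.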
Example 6 may include the method of Example 1, wherein a first processing stage receives the large data set as a data set input.
Example 7 may include the method of Example 1, wherein the large data set is stored in a database as a structured array.
Example 8 may include the method of Example 1, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
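Example 8 allows the boundary to be either a minimum or a maximum. One consequence of storing the index beneath the value is worth a brief sketch: on tied values, a minimum search returns the smallest index while a maximum search returns the largest. If the first occurrence is wanted in both cases, the maximum search can store the complemented index instead. This is an observation about the assumed layout (non-negative integer values, a 32-bit index in the low bits), not something stated in the source, and the function name `boundary` is hypothetical.

```python
INDEX_BITS = 32                      # assumed width of the index field
INDEX_MASK = (1 << INDEX_BITS) - 1


def boundary(values, kind="min"):
    """Return (value, index of first occurrence) of the minimum or maximum."""
    if not values:
        raise ValueError("empty data set")
    if kind == "min":
        # Smaller index wins a tie automatically under integer comparison.
        units = [(v << INDEX_BITS) | i for i, v in enumerate(values)]
        best = min(units)
        return best >> INDEX_BITS, best & INDEX_MASK
    if kind == "max":
        # Store the complemented index so that the earliest position
        # compares as the largest unit among tied values.
        units = [(v << INDEX_BITS) | (INDEX_MASK - i) for i, v in enumerate(values)]
        best = max(units)
        return best >> INDEX_BITS, INDEX_MASK - (best & INDEX_MASK)
    raise ValueError("kind must be 'min' or 'max'")
```

The built-in `min` and `max` calls stand in for whatever reduction is actually used; only the tie-breaking behavior of the packed comparison is the point here.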
Example 9 may include a system that includes a determination module to determine a boundary value data unit in a large data set in parallel with determining an associated index of the determined boundary value data unit into the large data set.
Example 10 may include the system of Example 9, further including a combination module to combine a data value and the associated index of the data value into a single data unit, and to store the single data unit as a data entry in the large data set.
Example 11 may include the system of Example 10, wherein the determination module is to determine the boundary value data unit in the large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set, including successively determining boundary value data units of smaller data sets within the large data set over a plurality of processing stages performed in a hierarchical manner until a single data unit is produced.
Example 12 may include the system of Example 11, wherein each processing stage receives a data set and divides the data set into a plurality of smaller data sets, wherein each data set includes a plurality of data entries.
Example 13 may include the system of Example 12, wherein each processing stage determines boundary value data units between sets of the smaller data sets in parallel with determining the associated indices of the determined values into the large data set.
Example 14 may include the system of Example 13, wherein determining the boundary value data units between the sets of the smaller data sets includes using single-instruction multiple-data (SIMD) instructions to perform operations in parallel on the data entries in each data set.
Example 15 may include the system of Example 14, wherein the output of each processing stage produces a data set that is received as a new data set input by the next processing stage.
Example 16 may include the system of Example 9, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
Example 17 may include at least one computer-readable medium including instructions which, if executed by a processor, cause a computer to determine a boundary value data unit in a large data set in parallel with determining an associated index of the determined boundary value data unit into the large data set.
Example 18 may include the at least one computer-readable medium of Example 17, further including instructions which, if executed by a processor, cause a computer to combine a data value and the associated index of the data value into a single data unit, and to store the single data unit as a data entry in the large data set.
Example 19 may include the at least one computer-readable medium of Example 18, wherein the instructions, if executed by a processor, cause a computer to determine the boundary value data unit in the large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set, including successively determining boundary value data units of smaller data sets within the large data set over a plurality of processing stages performed in a hierarchical manner until a single data unit is produced.
Example 20 may include the at least one computer-readable medium of Example 19, wherein each processing stage receives a data set and divides the data set into a plurality of smaller data sets, wherein each data set includes a plurality of data entries.
Example 21 may include the at least one computer-readable medium of Example 20, wherein each processing stage determines boundary value data units between sets of the smaller data sets in parallel with determining the associated indices of the determined values into the large data set.
Example 22 may include the at least one computer-readable medium of Example 21, wherein determining the boundary value data units between the sets of the smaller data sets includes using single-instruction multiple-data (SIMD) instructions to perform operations in parallel on the data entries in each data set.
Example 23 may include the at least one computer-readable medium of Example 22, wherein the output of each processing stage produces a data set that is received as a new data set input by the next processing stage, and a first processing stage receives the large data set as a data set input.
Example 24 may include the at least one computer-readable medium of Example 17, wherein the large data set is stored in a database as a structured array.
Example 25 may include the at least one computer-readable medium of Example 17, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
Examples may also include an apparatus including means for performing the method of any one of Examples 1 to 18.
Embodiments of the present invention are applicable for use with all types of semiconductor integrated circuit ("IC") chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, and the like. In addition, in some of the drawings, signal conductor lines are represented with lines. Some may be drawn differently to indicate more constituent signal paths, may have a number label to indicate a number of constituent signal paths, and/or may have arrows at one or more ends to indicate the primary direction of information flow. This, however, should not be construed in a limiting manner. Rather, such added detail may be used in connection with one or more exemplary embodiments to make a circuit easier to understand. Any represented signal line, whether or not it carries additional information, may actually comprise one or more signals that may travel in multiple directions and may be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
Example sizes/models/values/ranges may have been given, although embodiments of the present invention are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well-known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments of the invention. Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments of the invention, and also in view of the fact that specifics with respect to the implementation of such block diagram arrangements are highly dependent upon the platform within which the embodiment is to be implemented, i.e., such specifics should be well within the purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that embodiments of the invention can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.
Some embodiments may be implemented, for example, using a machine or tangible computer-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the embodiments. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software. The machine-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, Compact Disk Read Only Memory (CD-ROM), Compact Disk Recordable (CD-R), Compact Disk Rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of Digital Versatile Disk (DVD), a tape, a cassette, or the like. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
Unless specifically stated otherwise, it may be appreciated that terms such as "processing", "computing", "calculating", "determining", or the like refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices. The embodiments are not limited in this context.
The term "coupled" may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections. In addition, the terms "first", "second", etc. are used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments of the present invention can be implemented in a variety of forms. Therefore, while the embodiments of this invention have been described in connection with particular examples thereof, the true scope of the embodiments of the invention should not be so limited, since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.
Claims (26)
1. A method, comprising:
determining a boundary value data unit in a large data set in parallel with determining an associated index of the determined boundary value data unit into the large data set, by successively determining boundary value data units of smaller data sets within the large data set over a plurality of processing stages performed in a hierarchical manner until a single data unit is produced, wherein each data set includes a plurality of data entries.
2. The method of claim 1, further comprising:
combining a data value and the associated index of the data value into a single data unit, and storing the single data unit as a data entry in the large data set.
3. The method of claim 1, wherein each processing stage determines boundary value data units between sets of the smaller data sets in parallel with determining the associated indices of the determined values into the large data set.
4. The method of claim 3, wherein determining the boundary value data units between the sets of the smaller data sets includes using single-instruction multiple-data (SIMD) instructions to perform operations in parallel on the data entries in each data set.
5. The method of claim 4, wherein the output of each processing stage produces a data set that is received as a new data set input by the next processing stage.
6. The method of claim 1, wherein a first processing stage receives the large data set as a data set input.
7. The method of claim 1, wherein the large data set is stored in a database as a structured array.
8. The method of any one of claims 1-7, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
9. A system, comprising:
a determination module to determine a boundary value data unit in a large data set in parallel with determining an associated index of the determined boundary value data unit into the large data set.
10. The system of claim 9, further comprising:
a combination module to combine a data value and the associated index of the data value into a single data unit, and to store the single data unit as a data entry in the large data set.
11. The system of claim 10, wherein the determination module is to determine the boundary value data unit in the large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set, including successively determining boundary value data units of smaller data sets within the large data set over a plurality of processing stages performed in a hierarchical manner until a single data unit is produced.
12. The system of claim 11, wherein each processing stage is to receive a data set and divide the data set into a plurality of smaller data sets, wherein each data set includes a plurality of data entries.
13. The system of claim 12, wherein each processing stage is to determine boundary value data units between sets of the smaller data sets in parallel with determining the associated indices of the determined values into the large data set.
14. The system of claim 13, wherein determining the boundary value data units between the sets of the smaller data sets includes using single-instruction multiple-data (SIMD) instructions to perform operations in parallel on the data entries in each data set.
15. The system of claim 14, wherein the output of each processing stage produces a data set that is received as a new data set input by the next processing stage.
16. The system of any one of claims 9-15, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
17. An apparatus, comprising:
means for determining a boundary value data unit in a large data set in parallel with determining an associated index of the determined boundary value data unit into the large data set.
18. The apparatus of claim 17, further comprising:
means for combining a data value and the associated index of the data value into a single data unit and storing the single data unit as a data entry in the large data set.
19. The apparatus of claim 18, further comprising means for determining the boundary value data unit in the large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set, including successively determining boundary value data units of smaller data sets within the large data set over a plurality of processing stages performed in a hierarchical manner until a single data unit is produced.
20. The apparatus of claim 19, wherein each processing stage is to receive a data set and divide the data set into a plurality of smaller data sets, wherein each data set includes a plurality of data entries.
21. The apparatus of claim 20, wherein each processing stage is to determine boundary value data units between sets of the smaller data sets in parallel with determining the associated indices of the determined values into the large data set.
22. The apparatus of claim 21, wherein determining the boundary value data units between the sets of the smaller data sets includes using single-instruction multiple-data (SIMD) instructions to perform operations in parallel on the data entries in each data set.
23. The apparatus of claim 22, wherein the output of each processing stage produces a data set that is received as a new data set input by the next processing stage, and a first processing stage receives the large data set as a data set input.
24. The apparatus of claim 17, wherein the large data set is stored in a database as a structured array.
25. The apparatus of any one of claims 17-24, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
26. At least one computer-readable medium comprising instructions which, when executed by a processor, cause a computer to perform the method of any one of claims 1-8.
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361798288P | 2013-03-15 | 2013-03-15 | |
US61/798288 | 2013-03-15 | ||
US61/798,288 | 2013-03-15 | ||
US13/853,589 | 2013-03-29 | ||
US13/853589 | 2013-03-29 | ||
US13/853,589 US9152663B2 (en) | 2013-03-15 | 2013-03-29 | Fast approach to finding minimum and maximum values in a large data set using SIMD instruction set architecture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104050230A CN104050230A (en) | 2014-09-17 |
CN104050230B true CN104050230B (en) | 2018-04-10 |
Family
ID=51503065
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410096786.1A Expired - Fee Related CN104050230B (en) | 2013-03-15 | 2014-03-17 | Fast approach to finding minimum and maximum values in a large data set using SIMD instruction set architecture |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104050230B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108920410A (en) * | 2018-06-22 | 2018-11-30 | 华北理工大学 | A kind of big data processing unit and method |
CN114840255B (en) * | 2022-07-04 | 2022-09-27 | 飞腾信息技术有限公司 | Method, apparatus and device readable storage medium for processing data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6341296B1 (en) * | 1998-04-28 | 2002-01-22 | Pmc-Sierra, Inc. | Method and apparatus for efficient selection of a boundary value |
CN101676864A (en) * | 2008-09-16 | 2010-03-24 | 国际商业机器公司 | Method and device for acquiring Euclidean norm of vector in processing system |
-
2014
- 2014-03-17 CN CN201410096786.1A patent/CN104050230B/en not_active Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
Integer Minimum or Maximum Element Search Using Streaming SIMD Extensions; Intel; "Integer Minimum or Maximum Element Search Using"; 19990127; page 1, section 1; pages 2-3, section 2.2; page 4, section 3.1; page 4, section 3.2; pages 7-8, section 6 * |
Also Published As
Publication number | Publication date |
---|---|
CN104050230A (en) | 2014-09-17 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | C06 | Publication |
 | PB01 | Publication |
 | C10 | Entry into substantive examination |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180410; Termination date: 20210317