CN104050230B - Fast method for finding minimum and maximum values in large data sets using a SIMD instruction set architecture - Google Patents

Fast method for finding minimum and maximum values in large data sets using a SIMD instruction set architecture

Info

Publication number
CN104050230B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410096786.1A
Other languages
Chinese (zh)
Other versions
CN104050230A (en)
Inventor
L-A.唐
S-H.许
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/853,589 (US9152663B2)
Application filed by Intel Corp
Publication of CN104050230A
Application granted
Publication of CN104050230B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/22 Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc

Abstract

This disclosure relates to a fast method for finding minimum and maximum values in large data sets using a SIMD instruction set architecture. Systems and methods may use a single instruction multiple data (SIMD) instruction set architecture and a special data layout of the array entries to determine a boundary value data unit in a large data set in parallel with determining the associated index of that boundary value data unit into the large data set. In one example, the special data layout of an array entry combines a data value and its associated index into the array in a single array entry.

Description

Fast method for finding minimum and maximum values in large data sets using a SIMD instruction set architecture
Technical field
Embodiments described herein relate generally to data processing for large data sets, and more particularly to processing large data sets using single instruction multiple data (SIMD) processors.
Background
Single instruction multiple data (SIMD) processors are commonly used in applications that exhibit massive data parallelism without complicated control flow or excessive inter-processor communication. Typical applications for SIMD processors may include low-level vision and image processing, pattern recognition, database searching and statistical analysis. A common operation in image processing is finding the minimum or maximum value in a large data array together with its associated index into that array. Most SIMD processors provide instructions that perform minimum and maximum operations quickly. However, if the SIMD processor must also keep track of the index at which these values occur, the data parallelism of the SIMD instructions may be destroyed.
Brief description of the drawings
The various advantages of the embodiments of the invention will become apparent to those skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
Fig. 1 is a block diagram of an example of a computing system according to an embodiment;
Figs. 2A-2B are diagrams of an example of a large data set and of the data layout of the large data set according to an embodiment;
Fig. 3 is a flowchart of an example of a method of determining a boundary value data unit and its associated index according to an embodiment;
Figs. 4A-4B are diagrams of an example of a SIMD instruction listing and a block diagram according to an embodiment;
Fig. 5 is a block diagram of a system according to an embodiment; and
Fig. 6 is a diagram of a device according to an embodiment.
Detailed description
Turning now to Fig. 1, a computing system 100 is shown that includes a central processing unit (CPU) 120, system memory 130, a storage device 140 (including a database 150), a graphics processing unit (GPU) 160 and graphics memory 170. The illustrated system 100 may be part of a mobile platform such as a laptop computer, personal digital assistant (PDA), wireless smart phone, media player, imaging device, mobile Internet device (MID), smart tablet, or any combination thereof. The system 100 may also be part of a fixed platform such as a personal computer (PC), server or workstation.
The CPU 120 may include a memory controller (not shown) that provides access to the system memory 130, which may include random access memory such as double data rate (DDR) synchronous dynamic random access memory (SDRAM) modules. The modules of the system memory 130 may be incorporated into a single inline memory module (SIMM), dual inline memory module (DIMM), small outline DIMM (SODIMM), and so on. The CPU 120 may also have one or more drivers and/or processor cores (not shown), where each core may be fully functional with an instruction fetch unit, instruction decoder, level one (L1) cache, execution units, and so on. The CPU may include one or more single instruction multiple data (SIMD) processor cores. The CPU 120 may also execute an operating system (OS) such as Microsoft Windows, Linux or Mac (Macintosh) OS.
The storage device 140 may be implemented with a variety of components or subsystems, including, for example, a magnetic disk drive, an optical drive, flash memory, or another device capable of persistently storing information. As shown in Fig. 1, the storage device 140 includes the database 150, which stores the large data set.
The illustrated system 100 also includes a graphics processing unit (GPU) 160 coupled to the graphics memory 170. The dedicated graphics memory 170 may include GDDR (graphics DDR) or DDR SDRAM modules, or any other memory technology suitable for supporting graphics rendering. The GPU 160 and graphics memory 170 may be installed on a graphics/video card, where the GPU 160 may communicate with the CPU 120 via a graphics bus such as a PCI Express Graphics (PEG, e.g., Peripheral Components Interconnect/PCI Express x16 Graphics 150W-ATX Specification 1.0, PCI Special Interest Group) bus or an Accelerated Graphics Port (e.g., AGP V3.0 Interface Specification, September 2002) bus. The graphics card may be integrated onto the system motherboard, integrated into the die of the main CPU 120, configured as a discrete card on the motherboard, and so on.
The illustrated GPU 160 executes a software module as part of a graphics application. The graphics application may need to determine the minimum or maximum value in a large data set together with its associated index into the large data array. In one example, the software module includes code to determine the minimum or maximum value in the large data set in parallel with determining the associated index of that value into the large data array.
The software module may also include code to combine a data value and the associated index of that data value into a single data unit to be stored as a data entry in the large data set. The software module may be written in any programming language, for example an object-oriented programming language such as C++.
The GPU 160 may also include one or more single instruction multiple data (SIMD) processor cores to improve and/or support graphics performance. Thus, the illustrated approach may be particularly beneficial in graphics environments that involve a high level of data parallelism and processing complexity.
Turning now to Fig. 2A, a large data set is illustrated, where the large data set includes an array data structure. Each entry in the illustrated array has a special data layout that includes a data value and its associated index into the large data set, as shown in Fig. 2B. The data value is stored in the most significant bits of the data entry, and the index is stored in the least significant bits of the data entry.
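The layout of Fig. 2B can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 16-bit index width, the restriction to unsigned data values, and the names `pack` and `unpack` are hypothetical and do not come from the patent text.

```python
INDEX_BITS = 16                      # assumed index width; not given in the text
INDEX_MASK = (1 << INDEX_BITS) - 1


def pack(value, index):
    """Combine a data value (high bits) and its index (low bits) in one entry."""
    if value < 0:
        raise ValueError("this sketch assumes unsigned data values")
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("index does not fit in the index field")
    return (value << INDEX_BITS) | index


def unpack(entry):
    """Split an entry back into (value, index)."""
    return entry >> INDEX_BITS, entry & INDEX_MASK


entry = pack(37, 5)
value, index = unpack(entry)         # recovers the original pair
```

Because the data value occupies the high bits, comparing two entries as plain integers orders them by data value first and by index second, which is what lets an ordinary minimum or maximum operation carry the index along.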
The software module may construct and assemble the special data layout of the large data set by executing code such as the following, where N=16:
[code listing not reproduced in the source]
The special data layout of a data entry may be constructed by combining the data value and its index into a single data entry. The software module may construct and assemble the special data layout of the large data set by executing two SIMD16 instructions for every 16 input data units. For example, the software module may execute one pair of instructions for the first 16 data units and another pair for the second 16 data units [instruction text not reproduced in the source].
A SIMD16 instruction performs the same operation on 16 data lanes in parallel. SIMD instruction processing is more efficient than approaches in which each lane is processed serially. Although SIMD16 instructions are described, any SIMD instruction may be used.
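Since the instruction listing is missing here, the following Python sketch shows one plausible reading of "two instructions per 16 data units": a lane-wise shift followed by a lane-wise OR with the lane's index. The shift/OR pairing, the 16-bit index width and the name `build_layout` are assumptions for illustration, and the list comprehensions only model what a SIMD16 instruction would do across 16 lanes at once.

```python
LANES = 16          # lane count of the SIMD16 configuration described in the text
INDEX_BITS = 16     # assumed index width; not given in the text


def build_layout(values):
    """Return one packed entry, (value << INDEX_BITS) | index, per input value."""
    packed = []
    for base in range(0, len(values), LANES):
        chunk = values[base:base + LANES]
        # Step 1 (modelled as one SIMD instruction): shift every lane left.
        shifted = [v << INDEX_BITS for v in chunk]
        # Step 2 (modelled as a second SIMD instruction): OR in each lane's index.
        packed.extend(s | (base + lane) for lane, s in enumerate(shifted))
    return packed


entries = build_layout([12, 7, 30, 7])
```

The same construction could be done offline, as the text notes, since it depends only on the input values and their positions.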
In another exemplary embodiment, the special data layout is constructed offline and the database 150 is pre-populated with the large data set.
Fig. 3 shows a method of determining a boundary value data unit in a large data set (for example, the minimum or maximum data value) in parallel with determining the associated index of that boundary value data unit into the large data set. The method may generally include successively determining boundary value data units over progressively smaller sets of data units of the large data set during multiple processing stages, until a single data unit is produced.
The method may be implemented as a set of logic instructions in executable software stored in a machine- or computer-readable storage medium such as random access memory (RAM), read only memory (ROM), programmable ROM (PROM), firmware or flash memory; in configurable logic such as programmable logic arrays (PLAs), field programmable gate arrays (FPGAs) or complex programmable logic devices (CPLDs); in fixed-functionality hardware using assembly language programming and circuit technology such as application specific integrated circuit (ASIC), complementary metal oxide semiconductor (CMOS) or transistor-transistor logic (TTL) technology; or in any combination thereof.
At process block 310, a processing stage (for example, the first processing stage) receives a data set. The data set received by the first processing stage may include the large data set. The data set is divided into multiple smaller data sets at process block 320. For example, in a SIMD environment, a large data set of 32 array elements is divided into two subarrays of 16 array elements each, where each array element has the special data layout illustrated in Fig. 2B.
In this example, the large data set is divided into units that ensure a SIMD instruction (for example, a SIMD16 instruction) can be used to process as many data units in parallel as possible, which improves system performance. Any SIMD configuration may be used.
At process block 330, the boundary value data units between the smaller data sets are determined, and their associated indices are determined at the same time. For example, the minimum data value between the first and second subarrays may be determined for each data lane (that is, each array element).
Each subarray includes 16 array elements, so a SIMD16 instruction, for example, can determine the minimum data values between the 16 pairs of data units in parallel.
The index value of each minimum data value determined between subarray 1 and subarray 2 is included in the resulting data set. Because the index value of each data value is attached to the data value, consistent with the special data layout of each entry, determining the minimum value also determines the index of that value. The index value is located in the least significant bits of the minimum data value.
The minimum data values between the data sets are stored as a new data set of 16 array elements. Each array element includes the minimum data value between the corresponding elements and the associated index of that data value. The new data set is output at process block 340.
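A single processing stage (process blocks 310-340) can be modelled as a lane-wise minimum over the two halves of the incoming data set. The sketch below is a scalar Python model of that step, not the patent's instruction sequence; the function name `min_stage` and the 16-bit index field in the example data are assumptions.

```python
def min_stage(entries):
    """Split packed entries into two halves and keep the lane-wise minimum."""
    if len(entries) % 2:
        raise ValueError("this sketch expects an even number of entries")
    half = len(entries) // 2
    first, second = entries[:half], entries[half:]
    # On SIMD hardware this comprehension would be one packed-minimum instruction.
    return [min(a, b) for a, b in zip(first, second)]


# 32 packed entries (value in the high bits, index in the low 16 bits).
data = [((i * 37 + 11) % 101 << 16) | i for i in range(32)]
stage1 = min_stage(data)     # 16 entries, each still carrying its own index
```

No separate index bookkeeping appears in the stage: whichever entry wins a lane brings its index with it in the low bits.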
At process block 350, the method determines whether the data set includes a single entry (that is, a single array element). In this example, n=16. Because n is not equal to 1, the method continues with the next processing stage, which performs the processing of process blocks 310-340.
For example, the second processing stage receives the data set of 16 array elements. The data set is divided into two subarrays of eight array elements each. Because the data set is divided into subarrays of eight data lanes, a SIMD8 instruction can be used to determine the boundary data values of the new data set.
Using, for example, a SIMD8 instruction, the minimum data values between the eight pairs of data units can be determined in parallel.
The minimum data values between the data sets are stored as a new data set of eight array elements. Each array element includes the minimum data value between the corresponding elements and the associated index of that data value. The new data set is output at process block 340. At process block 350, n=8.
The third processing stage receives the data set of eight array elements. The data set is divided into two subarrays of four array elements each. Because the data set is divided into subarrays of four data lanes, a SIMD4 instruction can be used to determine the boundary data values of the new data set.
Using, for example, a SIMD4 instruction, the minimum data values between the four pairs of data units can be determined in parallel.
The minimum data values between the data sets are stored as a new data set of four array elements. Each array element includes the minimum data value between the corresponding elements and the associated index of that data value. The new data set is output at process block 340. At process block 350, n=4.
The fourth processing stage receives the data set of four array elements. The data set is divided into two subarrays of two array elements each. Because the data set is divided into subarrays of two data lanes, a SIMD2 instruction can be used to determine the boundary data values of the new data set.
Using, for example, a SIMD2 instruction, the minimum data values between the two pairs of data units can be determined in parallel.
The minimum data values between the data sets are stored as a new data set of two array elements. Each array element includes the minimum data value between the corresponding elements and the associated index of that data value. The new data set is output at process block 340. At process block 350, n=2.
The fifth processing stage receives the data set of two array elements. The data set is divided into two subarrays of one array element each. Because the data set is divided into subarrays of one data lane, a SIMD1 instruction can be used to determine the boundary data value of the new data set.
Using, for example, a SIMD1 instruction, the minimum data value between the single pair of data units can be determined.
When the five processing stages are complete, the minimum data value between the data sets comprises a single array element. Thus, at process block 350, n=1 and the single entry is output to process block 360. The most significant bits of the entry contain the boundary data value, and the least significant bits of the entry contain the index of that value. The boundary data value of the single entry represents the boundary data value of the entire large data set.
Because the special data layout of a data entry combines the data value and its associated index in a single entry, the method can determine the associated index value in parallel with determining the boundary value data unit for the entire large data set. Once the boundary data value has been determined for the entire large data set, its index is available in the least significant bits of the entry.
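Putting the stages together, the whole flow of Fig. 3 can be sketched as a halving loop that ends when a single entry remains. This is a scalar Python model under assumptions: unsigned values, a 16-bit index field, a power-of-two input length, and the hypothetical name `find_min`. With this layout, ties resolve to the smallest index, because the index sits in the low bits.

```python
INDEX_BITS = 16                      # assumed index width; not given in the text
INDEX_MASK = (1 << INDEX_BITS) - 1


def find_min(values):
    """Return (minimum value, its index) by repeated lane-wise halving."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch expects a power-of-two number of values")
    # Build the special layout: value in the high bits, index in the low bits.
    entries = [(v << INDEX_BITS) | i for i, v in enumerate(values)]
    # Blocks 310-350: halve the data set until one entry is left.
    while len(entries) > 1:
        half = len(entries) // 2
        entries = [min(a, b) for a, b in zip(entries[:half], entries[half:])]
    # Block 360: the surviving entry holds both the value and its index.
    winner = entries[0]
    return winner >> INDEX_BITS, winner & INDEX_MASK


print(find_min([7, 2, 9, 2]))        # prints (2, 1)
```

For 32 input values the loop runs five times (16, 8, 4, 2, 1 lanes), matching the five processing stages described above. Replacing `min` with `max` gives the maximum instead, although ties then resolve to the largest index.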
In one exemplary embodiment, when there are not enough bits to hold all of the indices in the new data layout, the data set is divided into several smaller groups such that all data within the same group can be represented by the special data layout. First, the boundary data value and index are computed for each group according to Fig. 3. The indices in the resulting data are then replaced by group indices to form a new data set. The new data set is processed according to Fig. 3 to obtain the global boundary data value and the corresponding group index. The index within the winning group can then be retrieved using the group index, which yields the boundary data value and index for the global data.
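The grouped variant can be sketched as a two-level reduction. In the Python model below, the index field is deliberately narrow (4 bits) so that 64 values cannot be indexed directly; the field width, the group size and the names `reduce_min` and `grouped_min` are assumptions chosen for illustration.

```python
INDEX_BITS = 4                       # deliberately small index field (assumed)
INDEX_MASK = (1 << INDEX_BITS) - 1
GROUP = 1 << INDEX_BITS              # 16 values can be indexed within one group


def reduce_min(entries):
    """Halve a power-of-two list of packed entries down to its minimum entry."""
    n = len(entries)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch expects a power-of-two number of entries")
    while len(entries) > 1:
        half = len(entries) // 2
        entries = [min(a, b) for a, b in zip(entries[:half], entries[half:])]
    return entries[0]


def grouped_min(values):
    """Return (minimum value, global index) using per-group local indices."""
    if len(values) % GROUP or len(values) // GROUP > GROUP:
        raise ValueError("need whole groups, and at most GROUP groups")
    # First pass: one reduction per group, with indices local to the group.
    local = []
    for base in range(0, len(values), GROUP):
        group = values[base:base + GROUP]
        local.append(reduce_min([(v << INDEX_BITS) | i for i, v in enumerate(group)]))
    # Replace each local index by the group index and reduce again.
    by_group = [(e & ~INDEX_MASK) | g for g, e in enumerate(local)]
    winner = reduce_min(by_group)
    value, group_index = winner >> INDEX_BITS, winner & INDEX_MASK
    # Recover the index inside the winning group from the first-pass result.
    return value, group_index * GROUP + (local[group_index] & INDEX_MASK)


result = grouped_min([(i * 37 + 11) % 101 for i in range(64)])
```

The first-pass results have to be kept around, since the second pass overwrites the local index with the group index and the local index is needed again at the end.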
Fig. 4A shows a listing of SIMD instructions that operate on the special data layout. In a SIMD environment, the listing carries out the method of determining a boundary value data unit in a large data set (for example, the minimum or maximum data value) in parallel with determining the associated index of that boundary value data unit into the large data set, which includes determining boundary value data units over progressively smaller sets of data units during multiple processing stages until a single data unit is produced. Fig. 4B is an exemplary block diagram of how the corresponding instructions and operations in Fig. 4A are performed.
Generally, initialization section 401a may initialize the arrays dataIndexArray[N], minArray[16] and maxArray[16], where N=16. Listing 402a illustrates two SIMD16 instructions that find the minimum and maximum data values between two arrays for each data lane and store the results in minArray[0:15] and maxArray[0:15], respectively.
Listing 403a illustrates pseudocode for determining the minimum and maximum data values of a large data set whose array size is greater than 32 array elements and is a multiple of 16. For example, in Fig. 4B, reference 403b illustrates the minimum and maximum operations performed on smaller portions of a large data set of 64 array elements. Initially, the minimum and maximum data array values between array elements [0:15] and [16:31] are determined. In the first iteration of the pseudocode of 403a (that is, i=2, N=64), the result is compared with array elements [32:47] to determine the minimum and maximum data array values across array elements [0:47].
In the second iteration of the pseudocode (that is, i=3), the result of the first iteration is compared with array elements [48:63] to determine the minimum and maximum data array values across array elements [0:63]. The resulting data arrays are minArray[0:15] and maxArray[0:15]. Each array includes 16 array elements.
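The accumulation described for listings 402a and 403a can be sketched as follows. The original pseudocode is not reproduced in this text, so this Python model is a reconstruction from the prose only: the function name `lane_min_max` and the example data are assumptions, while the loop bounds follow the i=2 and i=3 iterations described for N=64.

```python
LANES = 16


def lane_min_max(entries):
    """Fold a packed array (length a multiple of 16, at least 32) into 16 lanes."""
    if len(entries) < 2 * LANES or len(entries) % LANES:
        raise ValueError("expects a multiple of 16 entries, at least 32")
    first, second = entries[0:LANES], entries[LANES:2 * LANES]
    # 402a: lane-wise minimum and maximum of elements [0:15] and [16:31].
    min_array = [min(a, b) for a, b in zip(first, second)]
    max_array = [max(a, b) for a, b in zip(first, second)]
    # 403a: fold in each further block of 16 (i = 2, 3, ... for [32:47], [48:63], ...).
    for i in range(2, len(entries) // LANES):
        block = entries[i * LANES:(i + 1) * LANES]
        min_array = [min(a, b) for a, b in zip(min_array, block)]
        max_array = [max(a, b) for a, b in zip(max_array, block)]
    return min_array, max_array


data = [(((i * 37 + 11) % 101) << 16) | i for i in range(64)]
min_array, max_array = lane_min_max(data)
```

After this step each of the two 16-entry arrays can be reduced with the SIMD8, SIMD4, SIMD2 and SIMD1 stages to a single entry.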
Listing 404a illustrates the SIMD instructions (that is, SIMD8 instructions) for determining the minimum and maximum data array values when the data set is divided into two subarrays of eight data array elements each. Reference 404b illustrates this configuration.
Listing 405a illustrates the SIMD instructions (that is, SIMD4 instructions) for determining the minimum and maximum data array values when the data set is divided into two subarrays of four data array elements each. Reference 405b illustrates this configuration.
Listing 406a illustrates the SIMD instructions (that is, SIMD2 instructions) for determining the minimum and maximum data array values when the data set is divided into two subarrays of two data array elements each. Reference 406b illustrates this configuration.
Listing 407a illustrates the SIMD instructions (that is, SIMD1 instructions) for determining the minimum and maximum data array values when the data set is divided into two subarrays of one data array element each. Reference 407b illustrates this configuration.
Listing 408a and reference 408b illustrate the resulting single-entry array element, which includes the global minimum or maximum data value for the large data set and its associated index into the large data set.
Fig. 5 illustrates an embodiment of a system 700. In embodiments, system 700 may be a media system, although system 700 is not limited to this context. For example, system 700 may be incorporated into a personal computer (PC), laptop computer, ultra-laptop computer, tablet, touch pad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone/PDA, television, smart device (for example, a smart phone, smart tablet or smart television), mobile Internet device (MID), messaging device, data communication device, and so forth.
In embodiments, system 700 includes a platform 702 coupled to a display 720. Platform 702 may receive content from a content device such as content services device 730 or content delivery device 740, or from other similar content sources. A navigation controller 750 including one or more navigation features may be used to interact with, for example, platform 702 and/or display 720. Each of these components is described in more detail below.
In embodiments, platform 702 may include any combination of a chipset 705, processor 710, memory 712, storage 714, graphics subsystem 715, applications 716 and/or wireless device 718. Chipset 705 may provide intercommunication among processor 710, memory 712, storage 714, graphics subsystem 715, applications 716 and/or wireless device 718. For example, chipset 705 may include a storage adapter (not depicted) capable of providing intercommunication with storage 714.
Processor 710 may be implemented as a complex instruction set computer (CISC) or reduced instruction set computer (RISC) processor, an x86 instruction set compatible processor, a multi-core processor, or any other microprocessor or central processing unit (CPU). In embodiments, processor 710 may include a dual-core processor, a dual-core mobile processor, and so forth.
Memory 712 may be implemented as a volatile memory device such as, but not limited to, random access memory (RAM), dynamic random access memory (DRAM) or static RAM (SRAM).
Storage 714 may be implemented as a non-volatile storage device such as, but not limited to, a magnetic disk drive, optical disk drive, tape drive, internal storage device, attached storage device, flash memory, battery backed-up SDRAM (synchronous DRAM) and/or a network accessible storage device. In embodiments, storage 714 may include technology to increase storage performance and enhance protection for valuable digital media when multiple hard drives are included, for example.
Graphics subsystem 715 may perform processing of images such as still images or video for display. Graphics subsystem 715 may be a graphics processing unit (GPU) or a visual processing unit (VPU), for example. An analog or digital interface may be used to communicatively couple graphics subsystem 715 and display 720. For example, the interface may be any of HDMI, DisplayPort, wireless HDMI and/or wireless HD compliant techniques. Graphics subsystem 715 may be integrated into processor 710 or chipset 705. Graphics subsystem 715 may also be a stand-alone card communicatively coupled to chipset 705.
The graphics and/or video processing techniques described herein may be implemented in various hardware architectures. For example, graphics and/or video functionality may be integrated within a chipset. Alternatively, a discrete graphics and/or video processor may be used. As still another embodiment, the graphics and/or video functions may be implemented by a general purpose processor, including a multi-core processor. In a further embodiment, the functions may be implemented in a consumer electronics device.
Wireless device 718 may include one or more wireless devices capable of transmitting and receiving signals using various suitable wireless communication techniques. Such techniques may involve communications across one or more wireless networks. Exemplary wireless networks include (but are not limited to) wireless local area networks (WLANs), wireless personal area networks (WPANs), wireless metropolitan area networks (WMANs), cellular networks and satellite networks. In communicating across such networks, wireless device 718 may operate in accordance with one or more applicable standards in any version.
In embodiments, display 720 may include any television-type monitor or display. Display 720 may include, for example, a computer display screen, touch screen display, video monitor, television-like device and/or a television. Display 720 may be digital and/or analog. In embodiments, display 720 may be a holographic display. Also, display 720 may be a transparent surface that may receive a visual projection. Such projections may convey various forms of information, images and/or objects. For example, such projections may be a visual overlay for a mobile augmented reality (MAR) application. Under the control of one or more software applications 716, platform 702 may display a user interface 722 on display 720.
In embodiments, content services device 730 may be hosted by any national, international and/or independent service and thus be accessible to platform 702 via the Internet, for example. Content services device 730 may be coupled to platform 702 and/or to display 720. Platform 702 and/or content services device 730 may be coupled to a network 760 to communicate (for example, send and/or receive) media information to and from network 760. Content delivery device 740 also may be coupled to platform 702 and/or to display 720.
In embodiments, content services device 730 may include a cable television box, personal computer, network, telephone, Internet-enabled device or appliance capable of delivering digital information and/or content, and any other similar device capable of unidirectionally or bidirectionally communicating content between content providers and platform 702 and/or display 720, via network 760 or directly. It will be appreciated that the content may be communicated unidirectionally and/or bidirectionally between any one of the components in system 700 and a content provider via network 760. Examples of content may include any media information including, for example, video, music, medical and gaming information, and so forth.
Content services device 730 receives content such as cable television programming including media information, digital information and/or other content. Examples of content providers may include any cable or satellite television or radio or Internet content providers. The provided examples are not meant to limit embodiments of the invention.
In embodiments, platform 702 may receive control signals from navigation controller 750 having one or more navigation features. The navigation features of controller 750 may be used to interact with user interface 722, for example. In embodiments, navigation controller 750 may be a pointing device, that is, a computer hardware component (specifically, a human interface device) that allows a user to input spatial (for example, continuous and multi-dimensional) data into a computer. Many systems such as graphical user interfaces (GUIs), televisions and monitors allow the user to control and provide data to the computer or television using physical gestures.
Movements of the navigation features of controller 750 may be echoed on a display (for example, display 720) by movements of a pointer, cursor, focus ring or other visual indicators displayed on the display. For example, under the control of software applications 716, the navigation features located on navigation controller 750 may be mapped to virtual navigation features displayed on user interface 722. In embodiments, controller 750 may not be a separate component but may be integrated into platform 702 and/or display 720. Embodiments, however, are not limited to the elements or to the context shown or described herein.
In embodiments, drivers (not shown) may include technology that, when enabled, allows users to turn platform 702 on and off instantly, like a television, with the touch of a button after initial boot-up. Program logic may allow platform 702 to stream content to media adaptors or other content services device 730 or content delivery device 740 when the platform is turned "off". In addition, chipset 705 may include hardware and/or software support for 5.1 surround sound audio and/or high definition 7.1 surround sound audio, for example. Drivers may include a graphics driver for integrated graphics platforms. In embodiments, the graphics driver may include a peripheral component interconnect (PCI) Express graphics card.
In various embodiments, any one or more of the components shown in system 700 may be integrated. For example, platform 702 and content services device 730 may be integrated, or platform 702 and content delivery device 740 may be integrated, or platform 702, content services device 730 and content delivery device 740 may be integrated. In various embodiments, platform 702 and display 720 may be an integrated unit. Display 720 and content services device 730 may be integrated, or display 720 and content delivery device 740 may be integrated, for example. These examples are not meant to limit the invention.
In various embodiments, system 700 may be implemented as a wireless system, a wired system, or a combination of both. When implemented as a wireless system, system 700 may include components and interfaces suitable for communicating over a wireless shared medium, such as one or more antennas, transmitters, receivers, transceivers, amplifiers, filters, control logic, and so forth. An example of a wireless shared medium may include portions of a wireless spectrum, such as the RF spectrum. When implemented as a wired system, system 700 may include components and interfaces suitable for communicating over wired communications media, such as input/output (I/O) adapters, physical connectors to connect an I/O adapter with a corresponding wired communications medium, a network interface card (NIC), disc controller, video controller, audio controller, and so forth. Examples of wired communications media may include a wire, cable, metal leads, printed circuit board (PCB), backplane, switch fabric, semiconductor material, twisted-pair wire, coaxial cable, fiber optics, and so forth.
Platform 702 may establish one or more logical or physical channels to communicate information. The information may include media information and control information. Media information may refer to any data representing content meant for a user. Examples of content may include data from a voice conversation, videoconference, streaming video, electronic mail ("email") message, voice mail message, alphanumeric symbols, graphics, image, video, text and so forth. Data from a voice conversation may be, for example, speech information, silence periods, background noise, comfort noise, tones and so forth. Control information may refer to any data representing commands, instructions or control words meant for an automated system. For example, control information may be used to route media information through a system, or to instruct a node to process the media information in a predetermined manner. The embodiments, however, are not limited to the elements or the context shown or described in Fig. 5.
As described above, system 700 may be embodied in varying physical styles or form factors. Fig. 6 illustrates an embodiment of a small form factor device 800 in which system 700 may be embodied. In embodiments, for example, device 800 may be implemented as a mobile computing device having wireless capabilities. A mobile computing device may refer to any device having a processing system and a mobile power source or supply, such as one or more batteries.
As described above, examples of a mobile computing device may include a personal computer (PC), laptop computer, ultra-laptop computer, tablet, touch pad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone/PDA, television, smart device (e.g., smart phone, smart tablet or smart television), mobile internet device (MID), messaging device, data communication device, and so forth.
Examples of a mobile computing device also may include computers that are arranged to be worn by a person, such as a wrist computer, finger computer, ring computer, eyeglass computer, belt-clip computer, arm-band computer, shoe computer, clothing computer, and other wearable computers. In embodiments, for example, a mobile computing device may be implemented as a smart phone capable of executing computer applications as well as voice communications and/or data communications. Although some embodiments may be described with a mobile computing device implemented as a smart phone by way of example, it may be appreciated that other embodiments may be implemented using other wireless mobile computing devices as well. The embodiments are not limited in this context.
As shown in Fig. 6, device 800 may include a housing 802, a display 804, an input/output (I/O) device 806, and an antenna 808. Device 800 also may include navigation features 812. Display 804 may include any suitable display unit for displaying information appropriate for a mobile computing device. I/O device 806 may include any suitable I/O device for entering information into a mobile computing device. Examples of I/O device 806 may include an alphanumeric keyboard, a numeric keypad, a touch pad, input keys, buttons, switches, rocker switches, microphones, speakers, a voice recognition device and software, and so forth. Information also may be entered into device 800 by way of a microphone. Such information may be digitized by a voice recognition device. The embodiments are not limited in this context.
Various embodiments may be implemented using hardware elements, software elements, or a combination of both. Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), logic gates, registers, semiconductor devices, chips, microchips, chipsets, and so forth. Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints.
One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represents various logic within the processor and which, when read by a machine, causes the machine to fabricate logic to perform the techniques described herein. Such representations, known as "IP cores", may be stored on a tangible, machine-readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
Additional Notes and Examples
Example 1 may provide a method that includes determining a boundary value data unit in a large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set, by successively determining boundary value data units of smaller data sets within the large data set during multiple processing stages performed in a hierarchical manner until a single data unit is produced, wherein each data set includes a plurality of data entries.
Example 2 may include the method of Example 1, further including combining a data value and the associated index of the data value into a single data unit, and storing the single data unit as a data entry in the large data set.
Example 3 may include the method of Example 1, wherein each processing stage determines boundary value data units among sets of the smaller data sets in parallel with determining the associated index of the determined value into the large data set.
Example 4 may include the method of Example 3, wherein determining the boundary value data units among sets of the smaller data sets includes using single-instruction multiple-data (SIMD) instructions to perform operations in parallel on the data entries in each data set.
Example 5 may include the method of Example 4, wherein the output of each processing stage produces a data set that is received as the new data set input to the next processing stage.
Example 6 may include the method of Example 1, wherein a first processing stage receives the large data set as its data set input.
Example 7 may include the method of Example 1, wherein the large data set is stored in a database as a structured array.
Example 8 may include the method of Example 1, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
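The combining step of Example 2 can be illustrated with a short sketch. It is not taken from the patent text; it assumes non-negative integer values, a hypothetical 16-bit index field (INDEX_BITS), and hypothetical helper names pack, unpack and min_with_index. Placing the value in the high-order bits and the index in the low-order bits means that a single comparison of the combined data units orders them by value first, so the minimum data unit carries its own index with it.

```python
# Illustrative sketch only: names and the 16-bit index width are assumptions.
INDEX_BITS = 16  # hypothetical width of the index field


def pack(value, index, index_bits=INDEX_BITS):
    """Combine a non-negative value and its index into one integer data unit."""
    if value < 0 or not 0 <= index < (1 << index_bits):
        raise ValueError("value must be >= 0 and index must fit in index_bits")
    # Value occupies the high bits, index the low bits.
    return (value << index_bits) | index


def unpack(unit, index_bits=INDEX_BITS):
    """Split a combined data unit back into (value, index)."""
    return unit >> index_bits, unit & ((1 << index_bits) - 1)


def min_with_index(values):
    """Find the minimum value and its index with one comparison per data unit."""
    units = [pack(v, i) for i, v in enumerate(values)]
    return unpack(min(units))


if __name__ == "__main__":
    print(min_with_index([7, 3, 9, 3]))
```

With this layout, ties on the value resolve to the lowest index for a minimum search; a maximum search over the same data units would resolve ties to the highest index.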
Example 9 may include a system that includes a determination module to determine a boundary value data unit in a large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set.
Example 10 may include the system of Example 9, further including a combination module to combine a data value and the associated index of the data value into a single data unit, and to store the single data unit as a data entry in the large data set.
Example 11 may include the system of Example 10, wherein the determination module is to determine the boundary value data unit in the large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set, including successively determining boundary value data units of smaller data sets within the large data set during multiple processing stages performed in a hierarchical manner until a single data unit is produced.
Example 12 may include the system of Example 11, wherein each processing stage receives a data set and divides the data set into a plurality of smaller data sets, wherein each data set includes a plurality of data entries.
Example 13 may include the system of Example 12, wherein each processing stage determines boundary value data units among sets of the smaller data sets in parallel with determining the associated index of the determined value into the large data set.
Example 14 may include the system of Example 13, wherein determining the boundary value data units among sets of the smaller data sets includes using single-instruction multiple-data (SIMD) instructions to perform operations in parallel on the data entries in each data set.
Example 15 may include the system of Example 14, wherein the output of each processing stage produces a data set that is received as the new data set input to the next processing stage.
Example 16 may include the system of Example 9, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
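The staged processing of Examples 11 to 15 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: plain Python lists stand in for SIMD registers, the hypothetical function lanewise_min stands in for a packed-minimum instruction, and the lane count of 4 is arbitrary. Each stage divides its input into smaller sets, reduces pairs of sets lane by lane, and passes its output to the next stage until a single data unit remains.

```python
# Illustrative sketch only: lists emulate SIMD registers; names are assumptions.

def lanewise_min(a, b):
    """Emulate a packed-minimum instruction: element-wise min of two vectors."""
    return [min(x, y) for x, y in zip(a, b)]


def staged_min(units, lanes=4):
    """Reduce a data set to one minimum data unit through hierarchical stages."""
    if not units:
        raise ValueError("data set must not be empty")
    # Pad with a copy of an existing entry so the length is a multiple of
    # `lanes`; duplicating an entry cannot change the minimum.
    pad = (-len(units)) % lanes
    current = list(units) + [units[-1]] * pad
    # Divide the data set into smaller sets, one vector of `lanes` entries each.
    vectors = [current[i:i + lanes] for i in range(0, len(current), lanes)]
    # Each stage halves the number of vectors; its output feeds the next stage.
    while len(vectors) > 1:
        nxt = [lanewise_min(vectors[i], vectors[i + 1])
               for i in range(0, len(vectors) - 1, 2)]
        if len(vectors) % 2:
            nxt.append(vectors[-1])  # odd vector carries over unchanged
        vectors = nxt
    # Final stages: fold the remaining vector in half until one unit is left.
    vec = vectors[0]
    while len(vec) > 1:
        half = (len(vec) + 1) // 2
        vec = [min(vec[i], vec[i + half]) if i + half < len(vec) else vec[i]
               for i in range(half)]
    return vec[0]


if __name__ == "__main__":
    print(staged_min([7, 3, 9, 3, 8, 1, 6]))
```

If the entries are combined data units that hold a value in the high-order bits and an index in the low-order bits, the single unit returned by the last stage yields both the boundary value and its index.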
Example 17 may include at least one computer-readable medium including instructions which, if executed by a processor, cause a computer to determine a boundary value data unit in a large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set.
Example 18 may include the at least one computer-readable medium of Example 17, further including instructions which, if executed by a processor, cause a computer to combine a data value and the associated index of the data value into a single data unit, and to store the single data unit as a data entry in the large data set.
Example 19 may include the at least one computer-readable medium of Example 18, wherein the instructions, if executed by a processor, cause a computer to determine the boundary value data unit in the large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set, including successively determining boundary value data units of smaller data sets within the large data set during multiple processing stages performed in a hierarchical manner until a single data unit is produced.
Example 20 may include the at least one computer-readable medium of Example 19, wherein each processing stage receives a data set and divides the data set into a plurality of smaller data sets, wherein each data set includes a plurality of data entries.
Example 21 may include the at least one computer-readable medium of Example 20, wherein each processing stage determines boundary value data units among sets of the smaller data sets in parallel with determining the associated index of the determined value into the large data set.
Example 22 may include the at least one computer-readable medium of Example 21, wherein determining the boundary value data units among sets of the smaller data sets includes using single-instruction multiple-data (SIMD) instructions to perform operations in parallel on the data entries in each data set.
Example 23 may include the at least one computer-readable medium of Example 22, wherein the output of each processing stage produces a data set that is received as the new data set input to the next processing stage, and a first processing stage receives the large data set as its data set input.
Example 24 may include the at least one computer-readable medium of Example 17, wherein the large data set is stored in a database as a structured array.
Example 25 may include the at least one computer-readable medium of Example 17, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
Examples may also include an apparatus including means for performing the method of any one of Examples 1 to 18.
Embodiments of the present invention are applicable for use with all types of semiconductor integrated circuit ("IC") chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLA), memory chips, network chips, and the like. In addition, in some of the drawings, signal conductor lines are represented with lines. Some may be drawn differently to indicate more constituent signal paths, may have a number label to indicate a number of constituent signal paths, and/or may have arrows at one or more ends to indicate the primary information flow direction. This, however, should not be construed in a limiting manner. Rather, such added detail may be used in connection with one or more exemplary embodiments to facilitate easier understanding of a circuit. Any represented signal lines, whether or not having additional information, may actually include one or more signals that may travel in multiple directions and may be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
Example sizes/models/values/ranges may have been given, although embodiments of the present invention are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well-known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments of the invention. Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments of the invention, and also in view of the fact that specifics with respect to the implementation of such block diagram arrangements are highly dependent upon the platform within which the embodiment is to be implemented, i.e., such specifics should be well within the purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that embodiments of the invention can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.
Some embodiments may be implemented, for example, using a machine or tangible computer-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the embodiments. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software. The machine-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, Compact Disk Read Only Memory (CD-ROM), Compact Disk Recordable (CD-R), Compact Disk Rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of Digital Versatile Disk (DVD), a tape, a cassette, or the like. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
Unless specifically stated otherwise, it may be appreciated that terms such as "processing", "computing", "calculating", "determining", or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices. The embodiments are not limited in this context.
The term "coupled" may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections. In addition, the terms "first", "second", etc. are used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments of the present invention can be implemented in a variety of forms. Therefore, while the embodiments of this invention have been described in connection with particular examples thereof, the true scope of the embodiments of the invention should not be so limited, since other modifications will become apparent to the skilled practitioner upon a study of the drawings, the specification, and the following claims.

Claims (26)

1. A method, comprising:
determining a boundary value data unit in a large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set, by successively determining boundary value data units of smaller data sets within the large data set during multiple processing stages performed in a hierarchical manner until a single data unit is produced, wherein each data set includes a plurality of data entries.
2. The method of claim 1, further comprising:
combining a data value and the associated index of the data value into a single data unit, and storing the single data unit as a data entry in the large data set.
3. The method of claim 1, wherein each processing stage determines boundary value data units among sets of the smaller data sets in parallel with determining the associated index of the determined value into the large data set.
4. The method of claim 3, wherein determining the boundary value data units among sets of the smaller data sets includes using single-instruction multiple-data (SIMD) instructions to perform operations in parallel on the data entries in each data set.
5. The method of claim 4, wherein the output of each processing stage produces a data set that is received as the new data set input to the next processing stage.
6. The method of claim 1, wherein a first processing stage receives the large data set as its data set input.
7. The method of claim 1, wherein the large data set is stored in a database as a structured array.
8. The method of any one of claims 1-7, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
9. A system, comprising:
a determination module to determine a boundary value data unit in a large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set.
10. The system of claim 9, further comprising:
a combination module to combine a data value and the associated index of the data value into a single data unit, and to store the single data unit as a data entry in the large data set.
11. The system of claim 10, wherein the determination module is to determine the boundary value data unit in the large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set, including successively determining boundary value data units of smaller data sets within the large data set during multiple processing stages performed in a hierarchical manner until a single data unit is produced.
12. The system of claim 11, wherein each processing stage is to receive a data set and divide the data set into a plurality of smaller data sets, wherein each data set includes a plurality of data entries.
13. The system of claim 12, wherein each processing stage is to determine boundary value data units among sets of the smaller data sets in parallel with determining the associated index of the determined value into the large data set.
14. The system of claim 13, wherein determining the boundary value data units among sets of the smaller data sets includes using single-instruction multiple-data (SIMD) instructions to perform operations in parallel on the data entries in each data set.
15. The system of claim 14, wherein the output of each processing stage produces a data set that is received as the new data set input to the next processing stage.
16. The system of any one of claims 9-15, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
17. An apparatus, comprising:
means for determining a boundary value data unit in a large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set.
18. The apparatus of claim 17, further comprising:
means for combining a data value and the associated index of the data value into a single data unit and storing the single data unit as a data entry in the large data set.
19. The apparatus of claim 18, further comprising means for determining the boundary value data unit in the large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set, including successively determining boundary value data units of smaller data sets within the large data set during multiple processing stages performed in a hierarchical manner until a single data unit is produced.
20. The apparatus of claim 19, wherein each processing stage is to receive a data set and divide the data set into a plurality of smaller data sets, wherein each data set includes a plurality of data entries.
21. The apparatus of claim 20, wherein each processing stage is to determine boundary value data units among sets of the smaller data sets in parallel with determining the associated index of the determined value into the large data set.
22. The apparatus of claim 21, wherein determining the boundary value data units among sets of the smaller data sets includes using single-instruction multiple-data (SIMD) instructions to perform operations in parallel on the data entries in each data set.
23. The apparatus of claim 22, wherein the output of each processing stage produces a data set that is received as the new data set input to the next processing stage, and a first processing stage receives the large data set as its data set input.
24. The apparatus of claim 17, wherein the large data set is stored in a database as a structured array.
25. The apparatus of any one of claims 17-24, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
26. At least one computer-readable medium comprising instructions which, when executed by a processor, cause a computer to perform the method of any one of claims 1-8.
CN201410096786.1A 2013-03-15 2014-03-17 Fast approach to finding minimum and maximum values in a large data set using SIMD instruction set architecture Expired - Fee Related CN104050230B (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201361798288P 2013-03-15 2013-03-15
US61/798288 2013-03-15
US61/798,288 2013-03-15
US13/853,589 2013-03-29
US13/853589 2013-03-29
US13/853,589 US9152663B2 (en) 2013-03-15 2013-03-29 Fast approach to finding minimum and maximum values in a large data set using SIMD instruction set architecture

Publications (2)

Publication Number Publication Date
CN104050230A CN104050230A (en) 2014-09-17
CN104050230B true CN104050230B (en) 2018-04-10

Family

ID=51503065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410096786.1A Expired - Fee Related CN104050230B (en) 2013-03-15 2014-03-17 Fast approach to finding minimum and maximum values in a large data set using SIMD instruction set architecture

Country Status (1)

Country Link
CN (1) CN104050230B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920410A (en) * 2018-06-22 2018-11-30 华北理工大学 A kind of big data processing unit and method
CN114840255B (en) * 2022-07-04 2022-09-27 飞腾信息技术有限公司 Method, apparatus and device readable storage medium for processing data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6341296B1 (en) * 1998-04-28 2002-01-22 Pmc-Sierra, Inc. Method and apparatus for efficient selection of a boundary value
CN101676864A (en) * 2008-09-16 2010-03-24 国际商业机器公司 Method and device for acquiring Euclidean norm of vector in processing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Integer Minimum or Maximum Element Search Using Streaming SIMD Extensions;Intel;《Integer Minimum or Maximum Element Search Using》;19990127;第1页第1节,第2页-第3页第2.2节,第4页第3.1节,第4页第3.2节,第7页-第8页第6节 *

Also Published As

Publication number Publication date
CN104050230A (en) 2014-09-17

Similar Documents

Publication Publication Date Title
EP3496008A1 (en) Method and apparatus for processing convolution operation in neural network
CN104025031B (en) Reduce the quantity operated in application to the order that shared memory unit performs
US20120143361A1 (en) Augmented reality system
CN106575379A (en) Improved fixed point integer implementations for neural networks
CN105321142B (en) Sampling, mistake manages and/or the context switching carried out via assembly line is calculated
CN104781845B (en) Handle video content
US9811334B2 (en) Block operation based acceleration
CN104737198B (en) The result of visibility test is recorded in input geometric object granularity
CN105074772A (en) Improved multi-sampling anti-aliasing compression by use of unreachable bit combinations
KR20110106903A (en) Audio-visual search and browse interface (avsbi)
CN104050230B (en) The fast method of minimum and maximum value in large data sets is searched using SIMD instruction collection framework
KR101597623B1 (en) Fast approach to finding minimum and maximum values in a large data set using simd instruction set architecture
CN105103512A (en) Distributed graphics processing
US10242038B2 (en) Techniques for block-based indexing
CN104782112B (en) Device, method, system and equipment for adjusting video camera array
CN104054049A (en) Reducing number of read/write operations performed by CPU to duplicate source data to enable parallel processing on source data
US10380106B2 (en) Efficient method and hardware implementation for nearest neighbor search
CN104952100B (en) The streaming of delay coloring compresses antialiasing method
CN104036827B (en) Fuse reparation based on position
EP2798459B1 (en) Reducing the number of io requests to memory when executing a program that iteratively processes contiguous data
Noguera et al. Interaction and visualization of 3D virtual environments on mobile devices
CN104025152B (en) By using simplification of the look-up table of weighting to local contrast compensation
CN104011789A (en) Reducing the number of scaling engines used in a display controller to display a plurality of images on a screen
WO2014153690A1 (en) Simd algorithm for image dilation and erosion processing
TWI543107B (en) System, apparatus and method for connected component labeling in graphics processors

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180410

Termination date: 20210317