CN104050230A - Fast approach to finding minimum and maximum values in a large data set using SIMD instruction set architecture

Publication number
CN104050230A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410096786.1A
Other languages
Chinese (zh)
Other versions
CN104050230B (en)
Inventor
L-A.唐
S-H.许
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority claimed from US13/853,589 (US9152663B2)
Application filed by Intel Corp
Publication of CN104050230A
Publication of CN104050230B
Legal status: Expired - Fee Related

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/22: Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc

Abstract

The invention relates to a fast approach to finding minimum and maximum values in a large data set using an SIMD instruction set architecture. Systems and methods may determine a boundary value data unit in a large data set in parallel with determining an associated index of the determined boundary value data unit into the large data set using a single instruction multiple data (SIMD) instruction set architecture and a specialized data layout of array entries. In one example, the specialized data layout of array entries combines a data value and its associated index to an array into a single array entry.

Description

Fast approach to finding minimum and maximum values in a large data set using SIMD instruction set architecture
Technical field
Embodiments described herein relate generally to data processing for large data sets, and more particularly to processing large data sets using a single instruction multiple data (SIMD) processor.
Background
Single instruction multiple data (SIMD) processors are generally used in parallel applications that operate on large amounts of data without complex control flow or excessive inter-processor communication. Typical applications for SIMD processors include low-level vision and image processing, such as pattern recognition, database searching, and statistical analysis. A common operation in image processing is searching a large data array for the minimum or maximum value together with its associated index. Most SIMD processors provide instructions that perform minimum and maximum operations quickly. However, if the SIMD processor must also keep track of the indices that produce these values, the data parallelism of the SIMD instructions can be broken.
Brief description of the drawings
The various advantages of the embodiments of the present invention will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
Fig. 1 is a block diagram of an example of a computing system according to an embodiment;
Figs. 2A-2B are diagrams of an example of a large data set and a data layout of the large data set according to an embodiment;
Fig. 3 is a flowchart of an example of a method of determining a boundary value data unit and an associated index according to an embodiment;
Figs. 4A-4B are a listing of SIMD instructions and a block diagram of an example according to an embodiment;
Fig. 5 is a block diagram of a system according to an embodiment; and
Fig. 6 is a diagram of a device according to an embodiment.
Detailed description
Turning now to Fig. 1, a computing system 100 is shown that includes a central processing unit (CPU) 120, system memory 130, storage 140 (including a database 150), a graphics processing unit (GPU) 160, and graphics memory 170. The illustrated system 100 may be part of a mobile platform such as a laptop, personal digital assistant (PDA), wireless smart phone, media player, imaging device, mobile Internet device (MID), smart tablet, and so forth, or any combination thereof. The system 100 may also be part of a fixed platform such as a personal computer (PC), server, or workstation.
The CPU 120 may include a memory controller (not shown) that provides access to the system memory 130, which may include random access memory such as double data rate (DDR) synchronous dynamic random access memory (SDRAM) modules. The modules of the system memory 130 may be incorporated into a single inline memory module (SIMM), dual inline memory module (DIMM), small outline DIMM (SODIMM), and so forth. The CPU 120 may also have one or more drivers and/or processor cores (not shown), where each core may be fully functional with an instruction fetch unit, instruction decoder, level one (L1) cache, execution unit, and so forth. The CPU may include one or more single instruction multiple data (SIMD) processor cores. The CPU 120 may also execute an operating system (OS) such as Microsoft Windows, Linux, or Mac (Macintosh) OS.
The storage 140 may be implemented with a variety of components or subsystems including, for example, a magnetic disk drive, an optical drive, flash memory, or other devices capable of persistently storing information. As illustrated in Fig. 1, the storage 140 includes the database 150, which stores the large data set.
The illustrated system 100 also includes the graphics processing unit (GPU) 160, which is coupled to the graphics memory 170. The dedicated graphics memory 170 may include GDDR (graphics DDR) or DDR SDRAM modules, or any other memory technology suitable for supporting graphics rendering. The GPU 160 and graphics memory 170 may be installed on a graphics/video card, where the GPU 160 may communicate with the CPU 120 via a graphics bus such as a PCI Express Graphics (PEG, e.g., Peripheral Components Interconnect/PCI Express x16 Graphics 150W-ATX Specification 1.0, PCI Special Interest Group) bus or an Accelerated Graphics Port (e.g., AGP V3.0 Interface Specification, September 2002) bus. The graphics card may be integrated onto the system motherboard, integrated into the main CPU 120 die, configured as a discrete card on the motherboard, and so forth.
The illustrated GPU 160 executes a software module as part of a graphics application. The graphics application may determine the minimum or maximum value in a large data set and its associated index into the large data array. In one example, the software module includes code to determine the minimum or maximum value of the large data set in parallel with determining the associated index of the determined value in the large data array.
The software module may also include code to combine a data value and the associated index of that data value into a single data unit for storage as a data entry of the large data set. The software module may be written in any programming language, such as an object-oriented programming language (e.g., C++).
The GPU 160 may also include one or more single instruction multiple data (SIMD) processor cores to enhance and/or support graphics performance. Accordingly, the illustrated approach may be particularly beneficial in graphics environments that involve a high level of data parallelism and processing complexity.
Turning now to Fig. 2A, a large data set is illustrated, where the large data set includes an array data structure. Each entry in the illustrated array has a specialized data layout that includes a data value and its associated index into the large data set, as shown in Fig. 2B. The data value is stored in the most significant bits of the data entry, and the index is stored in the least significant bits of the data entry.
The software module may construct and assemble the specialized data layout of the large data set by executing, for example, a code listing with N=16 [code listing not reproduced in this text].
The specialized data layout of a data entry may be constructed by combining the data value and its index into a single data entry. The software module may construct and assemble the specialized data layout of the large data set by executing two SIMD16 instructions for every 16 input data units. For example, the software module may execute one pair of instructions for the first 16 data units and a second pair of instructions for the second 16 data units [instruction listings not reproduced in this text].
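The layout described above can be sketched in a few lines of Python. This is a minimal illustration only, not the patent's own listing: the names `INDEX_BITS`, `pack`, `unpack`, and `build_layout` are hypothetical, the 16-bit index width is an assumed choice, and the data values are assumed to be non-negative integers so that ordinary integer comparison of the combined entries orders them by value first.

```python
INDEX_BITS = 16  # assumed width of the index field (hypothetical choice)


def pack(value, index):
    """Put the value in the most significant bits, the index in the least."""
    if value < 0:
        raise ValueError("this sketch assumes non-negative values")
    if not 0 <= index < (1 << INDEX_BITS):
        raise ValueError("index does not fit in the index field")
    return (value << INDEX_BITS) | index


def unpack(entry):
    """Split a combined entry back into (value, index)."""
    return entry >> INDEX_BITS, entry & ((1 << INDEX_BITS) - 1)


def build_layout(values):
    """Combine every value with its own position in the array."""
    return [pack(v, i) for i, v in enumerate(values)]


data = [7, 3, 9, 3]
entries = build_layout(data)

# Comparing combined entries compares values first, so a plain minimum
# over the entries also yields the index of the minimum value.
min_value, min_index = unpack(min(entries))
```

Because the index occupies the low bits, ties between equal values resolve to the smallest index when taking a minimum and to the largest index when taking a maximum.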
A SIMD16 instruction performs the same operation on 16 data channels in parallel. SIMD instruction processing may be more efficient than approaches that process each channel sequentially. Although SIMD16 instructions are described, any SIMD instruction may be used.
In another exemplary embodiment, the specialized data layout is constructed offline and the database 150 is pre-populated with the large data set.
Fig. 3 illustrates a method of determining a boundary value data unit (e.g., a minimum or maximum data value) in a large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set. The method may generally include successively determining, over a plurality of processing stages, boundary value data units of smaller data units in the large data set until a single data unit results.
The method may be implemented in executable software as a set of logic instructions stored in a machine- or computer-readable storage medium such as random access memory (RAM), read only memory (ROM), programmable ROM (PROM), firmware, or flash memory; in configurable logic such as programmable logic arrays (PLAs), field programmable gate arrays (FPGAs), or complex programmable logic devices (CPLDs); in fixed-functionality hardware using assembly language programming and circuit technology such as application specific integrated circuit (ASIC), complementary metal oxide semiconductor (CMOS), or transistor-transistor logic (TTL) technology; or in any combination thereof.
At processing block 310, a processing stage (e.g., the first processing stage) receives a data set. The data set received by the first processing stage may include the large data set. At processing block 320, the data set is divided into a plurality of smaller data sets. For example, in a SIMD environment, a large data set that includes 32 array elements is divided into two subarrays of 16 array elements each, where each array element has the specialized data layout illustrated in Fig. 2B.
In this example, dividing the large data set into such units ensures that SIMD instructions (e.g., SIMD16 instructions) can process as many data units in parallel as possible, which improves system performance. Any SIMD configuration may be used.
At processing block 330, the boundary value data units between the smaller data sets are determined, and their associated indices are determined at the same time. For example, a minimum data value may be determined for each data channel (i.e., array element) between the first and second subarrays.
Each subarray includes 16 array elements. Using a SIMD16 instruction, for example, the minimum data values between the 16 pairs of corresponding data units may be determined in parallel.
The index value of each minimum data value determined between corresponding data units of subarray 1 and subarray 2 is included in the resulting data set. Because the index value of each data value is appended to the data value, consistent with the specialized data layout of each entry, the index of the value is determined at the same time as the minimum. The index value resides in the least significant bits of the minimum data entry.
The minimum data values between the pairs are stored as a new data set that includes 16 array elements. Each array element includes the minimum data value of the corresponding pair and the associated index of that data value. The new data set is output at processing block 340.
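One processing stage of this kind can be emulated in plain Python as a sketch. A list comprehension stands in for the SIMD16 minimum instruction, so nothing here runs on real SIMD hardware; `lanewise_min`, `pack`, `unpack`, the 8-bit index width, and the sample values are all assumptions made for illustration.

```python
INDEX_BITS = 8  # assumed index width; enough for 32 positions


def pack(value, index):
    """Value in the high bits, index in the low bits (non-negative values)."""
    return (value << INDEX_BITS) | index


def unpack(entry):
    """Split a combined entry back into (value, index)."""
    return entry >> INDEX_BITS, entry & ((1 << INDEX_BITS) - 1)


def lanewise_min(a, b):
    """Emulate one SIMD minimum: lane k of the result is min(a[k], b[k])."""
    if len(a) != len(b):
        raise ValueError("both subarrays must have the same number of lanes")
    return [min(x, y) for x, y in zip(a, b)]


# A 32-element data set in the combined layout (sample values are arbitrary).
values = [(i * 7 + 5) % 32 for i in range(32)]
entries = [pack(v, i) for i, v in enumerate(values)]

# Blocks 320-340: split into two 16-element subarrays, take the per-lane
# minimum, and emit a new 16-element data set that still carries indices.
stage1 = lanewise_min(entries[:16], entries[16:])
```

Each element of `stage1` unpacks to the smaller of the two values that shared a lane, along with the original position of that value, without any separate index bookkeeping.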
At processing block 350, the method determines whether the data set contains a single entry (i.e., one array element). In this example, n=16. Because n is not equal to 1, the method continues to the next hierarchical processing stage, which repeats the processing of blocks 310-340.
For example, the second processing stage receives the data set of 16 array elements. The data set is divided into two subarrays of eight array elements each. Because the data set is now divided into subarrays of eight data channels, a SIMD8 instruction may be used to determine the boundary data values of the new data set.
Using a SIMD8 instruction, for example, the minimum data values between the eight pairs of corresponding data units may be determined in parallel.
The minimum data values between the pairs are stored as a new data set that includes eight array elements. Each array element includes the minimum data value of the corresponding pair and the associated index of that data value. The new data set is output at processing block 340. At processing block 350, n=8.
The third processing stage receives the data set of eight array elements. The data set is divided into two subarrays of four array elements each. Because the data set is now divided into subarrays of four data channels, a SIMD4 instruction may be used to determine the boundary data values of the new data set.
Using a SIMD4 instruction, for example, the minimum data values between the four pairs of corresponding data units may be determined in parallel.
The minimum data values between the pairs are stored as a new data set that includes four array elements. Each array element includes the minimum data value of the corresponding pair and the associated index of that data value. The new data set is output at processing block 340. At processing block 350, n=4.
The fourth processing stage receives the data set of four array elements. The data set is divided into two subarrays of two array elements each. Because the data set is now divided into subarrays of two data channels, a SIMD2 instruction may be used to determine the boundary data values of the new data set.
Using a SIMD2 instruction, for example, the minimum data values between the two pairs of corresponding data units may be determined in parallel.
The minimum data values between the pairs are stored as a new data set that includes two array elements. Each array element includes the minimum data value of the corresponding pair and the associated index of that data value. The new data set is output at processing block 340. At processing block 350, n=2.
The fifth processing stage receives the data set of two array elements. The data set is divided into two subarrays of one array element each. Because the data set is now divided into subarrays of one data channel, a SIMD1 instruction may be used to determine the boundary data value of the new data set.
Using a SIMD1 instruction, for example, the minimum data value between the single pair of data units may be determined.
When the fifth processing stage completes, the resulting data set consists of a single array element. Therefore, at processing block 350, n=1 and the single entry is output at processing block 360. The most significant bits of the entry contain the boundary data value, and the least significant bits of the entry contain the index of that value. The boundary data value of the single entry represents the boundary data value of the entire large data set.
Because the specialized data layout of a data entry combines a data value and its associated index in a single entry, the method can determine the associated index of the boundary value data unit in parallel with determining the boundary value data unit for the entire large data set. Once the boundary data value has been determined for the entire large data set, its index is available in the least significant bits of the entry.
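The five stages walked through above (32, 16, 8, 4, 2, then 1 element) can be sketched as a loop that halves the data set until one entry remains. This is an illustrative emulation under assumptions: `reduce_min`, `pack`, and `unpack` are hypothetical names, the values are non-negative integers, the input length is a power of two, and Python's element-wise `min` stands in for the SIMD16 through SIMD1 instructions.

```python
INDEX_BITS = 8  # assumed index width; enough for 32 positions


def pack(value, index):
    """Value in the high bits, index in the low bits (non-negative values)."""
    return (value << INDEX_BITS) | index


def unpack(entry):
    """Split a combined entry back into (value, index)."""
    return entry >> INDEX_BITS, entry & ((1 << INDEX_BITS) - 1)


def reduce_min(entries):
    """Repeat blocks 310-350 until the data set holds a single entry.

    Returns the surviving entry and the data set size seen at each stage.
    """
    n = len(entries)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two length")
    sizes = [n]
    while len(entries) > 1:
        half = len(entries) // 2
        # One stage: per-lane minimum of the two half-size subarrays.
        entries = [min(x, y) for x, y in zip(entries[:half], entries[half:])]
        sizes.append(len(entries))
    return entries[0], sizes


values = [(i * 7 + 5) % 32 for i in range(32)]
entries = [pack(v, i) for i, v in enumerate(values)]

result, sizes = reduce_min(entries)
min_value, min_index = unpack(result)  # value from high bits, index from low
```

The only per-element work is the comparison itself; the index travels with the value through every stage and is read out once at the end.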
In an exemplary embodiment, when there are insufficient bits to hold all indices in the new data layout, the data set may be divided into several small groups such that all data in the same group can be represented by the specialized data layout. First, the boundary data value and index for each group are computed according to Fig. 3. The indices in the resulting data are then replaced by group indices to form a new data set. This new data set is processed according to Fig. 3 to obtain the overall boundary data value and the corresponding group index. From the group index, the boundary data value and index of that group can be retrieved to obtain the global data index.
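The two-pass grouping scheme can be sketched as follows, using a deliberately narrow 4-bit index field so that a 64-element data set no longer fits in one pass. All names (`packed_min`, `grouped_min`, `GROUP_SIZE`) are hypothetical, values are assumed to be non-negative integers, and Python's built-in `min` stands in for the staged reduction of Fig. 3.

```python
INDEX_BITS = 4                 # deliberately small: only 16 indices fit
GROUP_SIZE = 1 << INDEX_BITS   # largest group one entry layout can address


def pack(value, index):
    """Value in the high bits, index in the low bits (non-negative values)."""
    return (value << INDEX_BITS) | index


def unpack(entry):
    """Split a combined entry back into (value, index)."""
    return entry >> INDEX_BITS, entry & ((1 << INDEX_BITS) - 1)


def packed_min(values):
    """Minimum and its index for a list small enough for the index field."""
    if len(values) > GROUP_SIZE:
        raise ValueError("too many elements for the index field")
    return unpack(min(pack(v, i) for i, v in enumerate(values)))


def grouped_min(values):
    """Two passes: per-group minima first, then a minimum over the groups."""
    groups = [values[s:s + GROUP_SIZE] for s in range(0, len(values), GROUP_SIZE)]
    # Pass 1: boundary value and local index inside each group.
    per_group = [packed_min(g) for g in groups]
    # Pass 2: the local indices are replaced by group indices.
    value, group = packed_min([v for v, _ in per_group])
    # Look up the winning group's local index to rebuild the global index.
    local = per_group[group][1]
    return value, group * GROUP_SIZE + local


values = [(i * 11 + 3) % 64 for i in range(64)]
min_value, min_index = grouped_min(values)
```

In this sketch the number of groups must itself fit in the index field (at most 16 groups of 16 elements); larger inputs would need a wider field or a further level of grouping.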
Fig. 4A illustrates a listing of SIMD instructions that operate on the specialized data layout, for example to implement, in a SIMD environment, a method of determining a boundary value data unit (e.g., a minimum or maximum data value) in a large data set in parallel with determining the associated index of the determined boundary value data unit into the large data set, including determining boundary value data units of smaller data units in the large data set over a plurality of processing stages until a single data unit results. Fig. 4B is an exemplary block diagram of how the corresponding instructions and operations of Fig. 4A are executed.
Generally, initialization section 401a may initialize the arrays dataIndexArray[N], minArray[16], and maxArray[16], where N=16. Listing 402a illustrates two SIMD16 instructions that find the minimum and maximum data values for each data channel between two arrays and store the results in minArray[0:15] and maxArray[0:15], respectively.
Listing 403a illustrates pseudocode for determining the minimum and maximum data values of a large data set whose array size is greater than 32 array elements and a multiple of 16. For example, reference 403b in Fig. 4B illustrates the minimum and maximum operations performed on smaller portions of a large data set that includes 64 array elements. Initially, the minimum and maximum data value arrays between array elements [0:15] and [16:31] are determined. In the first iteration of the pseudocode of 403a (i.e., i=2, N=64), the result is compared with array elements [32:47] to determine the minimum and maximum data value arrays across array elements [0:47].
In the second iteration of the pseudocode (i.e., i=3), the result of the first iteration is compared with array elements [48:63] to determine the minimum and maximum data value arrays across array elements [0:63]. The resulting data arrays are minArray[0:15] and maxArray[0:15]. Each array includes 16 array elements.
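The accumulation described for listing 403a can be sketched as a loop over 16-element chunks. The pseudocode of Fig. 4A is not reproduced in this text, so the following is a reconstruction from the prose only: `accumulate_min_max`, `min_array`, and `max_array` are hypothetical Python names echoing minArray and maxArray, and list comprehensions emulate the two SIMD16 instructions.

```python
LANES = 16       # SIMD16: sixteen data channels per instruction
INDEX_BITS = 8   # assumed index width; enough for 64 positions


def pack(value, index):
    """Value in the high bits, index in the low bits (non-negative values)."""
    return (value << INDEX_BITS) | index


def accumulate_min_max(entries):
    """Fold a long data set into one 16-lane minimum and maximum array."""
    if len(entries) < 2 * LANES or len(entries) % LANES:
        raise ValueError("need at least 32 entries, in multiples of 16")
    first, second = entries[:LANES], entries[LANES:2 * LANES]
    # Initial step: elements [0:15] against elements [16:31].
    min_array = [min(x, y) for x, y in zip(first, second)]
    max_array = [max(x, y) for x, y in zip(first, second)]
    # Iterations i = 2, 3, ...: fold in each remaining 16-element chunk.
    for i in range(2, len(entries) // LANES):
        chunk = entries[i * LANES:(i + 1) * LANES]
        min_array = [min(x, y) for x, y in zip(min_array, chunk)]
        max_array = [max(x, y) for x, y in zip(max_array, chunk)]
    return min_array, max_array


values = [(i * 11 + 3) % 64 for i in range(64)]
entries = [pack(v, i) for i, v in enumerate(values)]
min_array, max_array = accumulate_min_max(entries)
```

For N=64 the loop body runs for i=2 and i=3, matching the two iterations in the text; the 16-element results can then be narrowed with 8, 4, 2, and 1 lane steps.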
Listing 404a illustrates the SIMD instructions (i.e., SIMD8 instructions) for determining the minimum and maximum data value arrays of the data set when it is divided into two subarrays of eight array elements each. Reference 404b illustrates this configuration.
Listing 405a illustrates the SIMD instructions (i.e., SIMD4 instructions) for determining the minimum and maximum data value arrays of the data set when it is divided into two subarrays of four array elements each. Reference 405b illustrates this configuration.
Listing 406a illustrates the SIMD instructions (i.e., SIMD2 instructions) for determining the minimum and maximum data value arrays of the data set when it is divided into two subarrays of two array elements each. Reference 406b illustrates this configuration.
Listing 407a illustrates the SIMD instructions (i.e., SIMD1 instructions) for determining the minimum and maximum data values of the data set when it is divided into two subarrays of one array element each. Reference 407b illustrates this configuration.
Listing 408a and reference 408b illustrate the resulting single array element, which includes the overall minimum or maximum data value for the large data set and its associated index into the large data set.
Fig. 5 illustrates an embodiment of a system 700. In embodiments, the system 700 may be a media system, although the system 700 is not limited to this context. For example, the system 700 may be incorporated into a personal computer (PC), laptop computer, ultra-laptop computer, tablet, touch pad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone/PDA, television, smart device (e.g., smart phone, smart tablet, or smart television), mobile Internet device (MID), messaging device, data communication device, and so forth.
In embodiments, the system 700 comprises a platform 702 coupled to a display 720. The platform 702 may receive content from a content device such as a content services device 730 or content delivery device 740 or other similar content sources. A navigation controller 750 comprising one or more navigation features may be used to interact with, for example, the platform 702 and/or the display 720. Each of these components is described in more detail below.
In embodiments, the platform 702 may comprise any combination of a chipset 705, processor 710, memory 712, storage 714, graphics subsystem 715, applications 716, and/or radio 718. The chipset 705 may provide intercommunication among the processor 710, memory 712, storage 714, graphics subsystem 715, applications 716, and/or radio 718. For example, the chipset 705 may include a storage adapter (not depicted) capable of providing intercommunication with the storage 714.
The processor 710 may be implemented as a complex instruction set computer (CISC) or reduced instruction set computer (RISC) processor, an x86 instruction set compatible processor, a multi-core processor, or any other microprocessor or central processing unit (CPU). In embodiments, the processor 710 may comprise dual-core processors, dual-core mobile processors, and so forth.
The memory 712 may be implemented as a volatile memory device such as, but not limited to, a random access memory (RAM), dynamic random access memory (DRAM), or static RAM (SRAM).
The storage 714 may be implemented as a non-volatile storage device such as, but not limited to, a magnetic disk drive, optical disk drive, tape drive, internal storage device, attached storage device, flash memory, battery backed-up SDRAM (synchronous DRAM), and/or network accessible storage device. In embodiments, the storage 714 may comprise technology to increase storage performance or enhance protection for valuable digital media when multiple hard drives are included, for example.
The graphics subsystem 715 may perform processing of images such as still images or video for display. The graphics subsystem 715 may be a graphics processing unit (GPU) or a visual processing unit (VPU), for example. An analog or digital interface may be used to communicatively couple the graphics subsystem 715 and the display 720. For example, the interface may be any of a High-Definition Multimedia Interface (HDMI), DisplayPort, wireless HDMI, and/or wireless HD compliant techniques. The graphics subsystem 715 could be integrated into the processor 710 or the chipset 705. Alternatively, the graphics subsystem 715 could be a stand-alone card communicatively coupled to the chipset 705.
The graphics and/or video processing techniques described herein may be implemented in various hardware architectures. For example, graphics and/or video functionality may be integrated within a chipset. Alternatively, a discrete graphics and/or video processor may be used. As still another embodiment, the graphics and/or video functions may be implemented by a general purpose processor, including a multi-core processor. In a further embodiment, the functions may be implemented in a consumer electronics device.
The radio 718 may include one or more radios capable of transmitting and receiving signals using various suitable wireless communications techniques. Such techniques may involve communications across one or more wireless networks. Exemplary wireless networks include (but are not limited to) wireless local area networks (WLANs), wireless personal area networks (WPANs), wireless metropolitan area networks (WMANs), cellular networks, and satellite networks. In communicating across such networks, the radio 718 may operate in accordance with one or more applicable standards in any version.
In embodiments, the display 720 may comprise any television type monitor or display. The display 720 may comprise, for example, a computer display screen, touch screen display, video monitor, television-like device, and/or a television. The display 720 may be digital and/or analog. In embodiments, the display 720 may be a holographic display. Also, the display 720 may be a transparent surface that may receive a visual projection. Such projections may convey various forms of information, images, and/or objects. For example, such projections may be a visual overlay for a mobile augmented reality (MAR) application. Under the control of one or more software applications 716, the platform 702 may display a user interface 722 on the display 720.
In embodiments, the content services device 730 may be hosted by any national, international, and/or independent service and thus be accessible to the platform 702 via the Internet, for example. The content services device 730 may be coupled to the platform 702 and/or to the display 720. The platform 702 and/or the content services device 730 may be coupled to a network 760 to communicate (e.g., send and/or receive) media information to and from the network 760. The content delivery device 740 also may be coupled to the platform 702 and/or to the display 720.
In embodiments, the content services device 730 may comprise a cable television box, personal computer, network, telephone, Internet-enabled device or appliance capable of delivering digital information and/or content, and any other similar device capable of unidirectionally or bidirectionally communicating content between content providers and the platform 702 and/or display 720, via the network 760 or directly. It will be appreciated that the content may be communicated unidirectionally and/or bidirectionally to and from any one of the components in the system 700 and a content provider via the network 760. Examples of content may include any media information including, for example, video, music, medical and gaming information, and so forth.
The content services device 730 receives content such as cable television programming including media information, digital information, and/or other content. Examples of content providers may include any cable or satellite television or radio or Internet content providers. The provided examples are not meant to limit embodiments of the invention.
In embodiments, the platform 702 may receive control signals from the navigation controller 750 having one or more navigation features. The navigation features of the controller 750 may be used to interact with the user interface 722, for example. In embodiments, the navigation controller 750 may be a pointing device, that is, a computer hardware component (specifically a human interface device) that allows a user to input spatial (e.g., continuous and multi-dimensional) data into a computer. Many systems, such as graphical user interfaces (GUIs), televisions, and monitors, allow the user to control and provide data to the computer or television using physical gestures.
Movements of the navigation features of the controller 750 may be echoed on a display (e.g., the display 720) by movements of a pointer, cursor, focus ring, or other visual indicators displayed on the display. For example, under the control of the software applications 716, the navigation features located on the navigation controller 750 may be mapped to virtual navigation features displayed on the user interface 722. In embodiments, the controller 750 may not be a separate component but may be integrated into the platform 702 and/or the display 720. Embodiments, however, are not limited to the elements or the context shown or described herein.
In embodiments, drivers (not shown) may comprise technology that, when enabled, allows users to instantly turn the platform 702 on and off, like a television, with the touch of a button after initial boot-up. Program logic may allow the platform 702 to stream content to media adaptors or other content services devices 730 or content delivery devices 740 when the platform is turned "off." In addition, the chipset 705 may comprise hardware and/or software support for 5.1 surround sound audio and/or high definition 7.1 surround sound audio, for example. Drivers may include a graphics driver for integrated graphics platforms. In embodiments, the graphics driver may comprise a peripheral component interconnect (PCI) Express graphics card.
In various embodiments, any one or more of the components shown in the system 700 may be integrated. For example, the platform 702 and the content services device 730 may be integrated, or the platform 702 and the content delivery device 740 may be integrated, or the platform 702, the content services device 730, and the content delivery device 740 may be integrated. In various embodiments, the platform 702 and the display 720 may be an integrated unit. The display 720 and the content services device 730 may be integrated, or the display 720 and the content delivery device 740 may be integrated, for example. These examples are not meant to limit the invention.
In various embodiments, the system 700 may be implemented as a wireless system, a wired system, or a combination of both. When implemented as a wireless system, the system 700 may include components and interfaces suitable for communicating over wireless shared media, such as one or more antennas, transmitters, receivers, transceivers, amplifiers, filters, control logic, and so forth. An example of wireless shared media may include portions of a wireless spectrum, such as the RF spectrum. When implemented as a wired system, the system 700 may include components and interfaces suitable for communicating over wired communications media, such as input/output (I/O) adapters, physical connectors to connect the I/O adapter with a corresponding wired communications medium, a network interface card (NIC), disc controller, video controller, audio controller, and so forth. Examples of wired communications media may include a wire, cable, metal leads, printed circuit board (PCB), backplane, switch fabric, semiconductor material, twisted-pair wire, coaxial cable, fiber optics, and so forth.
The platform 702 may establish one or more logical or physical channels to communicate information. The information may include media information and control information. Media information may refer to any data representing content meant for a user. Examples of content may include data from a voice conversation, videoconference, streaming video, electronic mail ("email") message, voice mail message, alphanumeric symbols, graphics, image, video, text, and so forth. Data from a voice conversation may be, for example, speech information, silence periods, background noise, comfort noise, tones, and so forth. Control information may refer to any data representing commands, instructions, or control words meant for an automated system. For example, control information may be used to route media information through a system, or to instruct a node to process the media information in a predetermined manner. The embodiments, however, are not limited to the elements or the context shown or described in Fig. 5.
As described above, system 700 may be embodied in varying physical styles or form factors. Fig. 6 illustrates an embodiment of a small form factor device 800 in which system 700 may be embodied. In an embodiment, for example, device 800 may be implemented as a mobile computing device having wireless capabilities. A mobile computing device may refer to any device having a processing system and a mobile power source or supply, such as one or more batteries.
As described above, examples of a mobile computing device may include a personal computer (PC), laptop computer, ultra-laptop computer, tablet, touch pad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone/PDA, television, smart device (e.g., smart phone, smart tablet, or smart television), mobile internet device (MID), messaging device, data communication device, and so forth.
Examples of a mobile computing device also may include computers arranged to be worn by a person, such as a wrist computer, finger computer, ring computer, eyeglass computer, belt-clip computer, arm-band computer, shoe computer, clothing computer, and other wearable computers. In an embodiment, for example, a mobile computing device may be implemented as a smart phone capable of executing computer applications as well as voice communications and/or data communications. Although some embodiments may be described with a mobile computing device implemented as a smart phone by way of example, it may be appreciated that other embodiments may be implemented using other wireless mobile computing devices as well. The embodiments are not limited in this context.
As shown in Fig. 6, device 800 may include a housing 802, a display 804, an input/output (I/O) device 806, and an antenna 808. Device 800 also may include navigation features 812. Display 804 may include any suitable display unit for displaying information appropriate for a mobile computing device. I/O device 806 may include any suitable I/O device for entering information into a mobile computing device. Examples of I/O device 806 include an alphanumeric keyboard, a numeric keypad, a touch pad, input keys, buttons, switches, rocker switches, microphones, speakers, voice recognition devices and software, and so forth. Information also may be entered into device 800 by way of a microphone. Such information may be digitized by a voice recognition device. The embodiments are not limited in this context.
Various embodiments may be implemented using hardware elements, software elements, or a combination of both. Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), logic gates, registers, semiconductor devices, chips, microchips, chipsets, and so forth. Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds, and other design or performance constraints.
One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium that represents various logic within the processor and that, when read by a machine, causes the machine to fabricate logic to perform the techniques described herein. Such representations, known as "IP cores", may be stored on a tangible, machine-readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
Additional Notes and Examples:
Example 1 may include a method comprising determining a boundary value data unit in a large data set in parallel with determining an index associated with the determined boundary value data unit in the large data set, by successively determining boundary value data units of smaller data sets within the large data set during a plurality of processing stages performed in a hierarchical manner until a single data unit is produced, wherein each data set includes a plurality of data entries.
Example 2 may include the method of Example 1, further comprising combining a data value and an index associated with the data value into a single data unit, and storing the single data unit as a data entry in the large data set.
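The combination described in Example 2 can be sketched as follows. This is a minimal illustration under stated assumptions: values and indices are taken to be unsigned 32-bit integers, and the value is placed in the upper half of a 64-bit unit so that comparing units orders them by value first and by index second. The field widths, the layout, and the names `pack` and `unpack` are hypothetical and are not specified in this text.

```python
MASK32 = 0xFFFFFFFF  # assumed field width: 32 bits each for value and index

def pack(value, index):
    """Combine a value and its index into a single integer data unit."""
    if not (0 <= value <= MASK32 and 0 <= index <= MASK32):
        raise ValueError("value and index must fit in 32 unsigned bits")
    # Value in the high bits, so ordinary integer comparison ranks by value.
    return (value << 32) | index

def unpack(unit):
    """Split a data unit back into (value, index)."""
    return unit >> 32, unit & MASK32

values = [40, 7, 19, 7]
units = [pack(v, i) for i, v in enumerate(values)]
smallest = unpack(min(units))  # value and index are recovered together
```

Because the index travels inside the unit, a single comparison finds the value and its position at once; with this layout, equal values resolve to the lower index when searching for a minimum.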
Example 3 may include the method of Example 1, wherein each processing stage determines boundary value data units between sets of the smaller data sets in parallel with determining an index associated with the determined value in the large data set.
Example 4 may include the method of Example 3, wherein determining the boundary value data units between sets of the smaller data sets includes performing operations on the data entries in each data set in parallel using single instruction multiple data (SIMD) instructions.
Example 5 may include the method of Example 4, wherein the output of each processing stage produces a data set that is received as a new data set input to the next processing stage.
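The staged processing in Examples 3 to 5 can be sketched in plain Python. This is an illustration only, assuming each data unit is a `(value, index)` tuple and that a stage splits its input into two halves and keeps the element-wise minimum; the list comprehension stands in for a SIMD minimum instruction that would compare many lanes at once. The function names `reduce_stage` and `staged_min` are hypothetical.

```python
def reduce_stage(data):
    """One processing stage: split the input into two smaller sets and
    keep the element-wise minimum (a stand-in for a SIMD min operation)."""
    half = (len(data) + 1) // 2
    left, right = data[:half], data[half:]
    out = [min(a, b) for a, b in zip(left, right)]
    out.extend(left[len(right):])  # an unpaired entry passes through unchanged
    return out

def staged_min(units):
    """Feed each stage's output to the next until one data unit remains."""
    data = list(units)
    if not data:
        raise ValueError("data set must not be empty")
    stages = 0
    while len(data) > 1:
        data = reduce_stage(data)
        stages += 1
    return data[0], stages

units = [(7, 0), (3, 1), (9, 2), (3, 3), (5, 4)]  # (value, index) pairs
result, stages = staged_min(units)
```

Each stage roughly halves the data set, so the number of stages grows with the logarithm of the input size, and the index is never looked up in a separate pass.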
Example 6 may include the method of Example 1, wherein the large data set is received as a data set input at a first processing stage.
Example 7 may include the method of Example 1, wherein the large data set is stored in a database as a structured array.
Example 8 may include the method of Example 1, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
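As a hedged illustration of Example 8, the same combined data units can serve either boundary by switching the comparison. The sketch assumes unsigned 32-bit values and indices packed with the value in the high bits; the function name `boundary` and its `kind` parameter are hypothetical.

```python
def boundary(values, kind="min"):
    """Return (value, index) of the minimum or maximum entry.

    Assumes every value and index satisfies 0 <= x < 2**32.
    """
    if not values:
        raise ValueError("data set must not be empty")
    units = [(v << 32) | i for i, v in enumerate(values)]
    unit = min(units) if kind == "min" else max(units)
    return unit >> 32, unit & 0xFFFFFFFF

data = [4, 9, 1, 9, 1]
lowest = boundary(data, "min")
highest = boundary(data, "max")
```

With this particular layout, ties resolve to the lowest index for a minimum and to the highest index for a maximum; a different tie rule would need a different encoding of the index field.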
Example 9 may include a system comprising a determination module to determine a boundary value data unit in a large data set in parallel with determining an index associated with the determined boundary value data unit in the large data set.
Example 10 may include the system of Example 9, further comprising a combination module to combine a data value and an index associated with the data value into a single data unit, and to store the single data unit as a data entry in the large data set.
Example 11 may include the system of Example 10, wherein the determination module is to determine the boundary value data unit in the large data set in parallel with determining the index associated with the determined boundary value data unit in the large data set, including successively determining boundary value data units of smaller data sets within the large data set during a plurality of processing stages performed in a hierarchical manner until a single data unit is produced.
Example 12 may include the system of Example 11, wherein each processing stage receives a data set and divides the data set into a plurality of smaller data sets, wherein each data set includes a plurality of data entries.
Example 13 may include the system of Example 12, wherein each processing stage determines boundary value data units between sets of the smaller data sets in parallel with determining an index associated with the determined value in the large data set.
Example 14 may include the system of Example 13, wherein determining the boundary value data units between sets of the smaller data sets includes performing operations on the data entries in each data set in parallel using single instruction multiple data (SIMD) instructions.
Example 15 may include the system of Example 14, wherein the output of each processing stage produces a data set that is received as a new data set input to the next processing stage.
Example 16 may include the system of Example 9, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
Example 17 may include at least one computer-readable medium comprising instructions that, if executed by a processor, cause a computer to determine a boundary value data unit in a large data set in parallel with determining an index associated with the determined boundary value data unit in the large data set.
Example 18 may include the at least one computer-readable medium of Example 17, further comprising instructions that, if executed by a processor, cause the computer to combine a data value and an index associated with the data value into a single data unit, and to store the single data unit as a data entry in the large data set.
Example 19 may include the at least one computer-readable medium of Example 18, wherein the instructions, if executed by a processor, cause the computer to determine the boundary value data unit in the large data set in parallel with determining the index associated with the determined boundary value data unit in the large data set, including successively determining boundary value data units of smaller data sets within the large data set during a plurality of processing stages performed in a hierarchical manner until a single data unit is produced.
Example 20 may include the at least one computer-readable medium of Example 19, wherein each processing stage receives a data set and divides the data set into a plurality of smaller data sets, wherein each data set includes a plurality of data entries.
Example 21 may include the at least one computer-readable medium of Example 20, wherein each processing stage determines boundary value data units between sets of the smaller data sets in parallel with determining an index associated with the determined value in the large data set.
Example 22 may include the at least one computer-readable medium of Example 21, wherein determining the boundary value data units between sets of the smaller data sets includes performing operations on the data entries in each data set in parallel using single instruction multiple data (SIMD) instructions.
Example 23 may include the at least one computer-readable medium of Example 22, wherein the output of each processing stage produces a data set that is received as a new data set input to the next processing stage, and the large data set is received as a data set input at a first processing stage.
Example 24 may include the at least one computer-readable medium of Example 17, wherein the large data set is stored in a database as a structured array.
Example 25 may include the at least one computer-readable medium of Example 17, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
Examples may also include an apparatus comprising means for performing the method of any one of Examples 1 to 18.
Embodiments of the present invention are applicable for use with all types of semiconductor integrated circuit ("IC") chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, and the like. In addition, in some of the drawings, signal conductor lines are represented with lines. Some may be drawn differently to indicate more constituent signal paths, may have a number label to indicate a number of constituent signal paths, and/or may have arrows at one or more ends to indicate primary information flow direction. This, however, should not be construed in a limiting manner. Rather, such added detail may be used in connection with one or more exemplary embodiments to facilitate easier understanding of a circuit. Any represented signal lines, whether or not having additional information, may actually comprise one or more signals that may travel in multiple directions and may be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
Example sizes/models/values/ranges may have been given, although embodiments of the present invention are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well-known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments of the invention. Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments of the invention, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are highly dependent upon the platform within which the embodiment is to be implemented, i.e., such specifics should be well within the purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that embodiments of the invention can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.
Some embodiments may be implemented, for example, using a machine or tangible computer-readable medium or article that may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the embodiments. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software. The machine-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, Compact Disk Read Only Memory (CD-ROM), Compact Disk Recordable (CD-R), Compact Disk Rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of Digital Versatile Disk (DVD), a tape, a cassette, or the like. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
Unless specifically stated otherwise, it may be appreciated that terms such as "processing," "computing," "calculating," "determining," or the like refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices. The embodiments are not limited in this context.
The term "coupled" may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections. In addition, the terms "first", "second", etc. are used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments of the present invention can be implemented in a variety of forms. Therefore, while the embodiments of this invention have been described in connection with particular examples thereof, the true scope of the embodiments of the invention should not be so limited, since other modifications will become apparent to the skilled practitioner upon a study of the drawings, the specification, and the following claims.

Claims (25)

1. A method, comprising:
determining a boundary value data unit in a large data set in parallel with determining an index associated with the determined boundary value data unit in the large data set, by successively determining boundary value data units of smaller data sets within the large data set during a plurality of processing stages performed in a hierarchical manner until a single data unit is produced, wherein each data set includes a plurality of data entries.
2. The method of claim 1, further comprising:
combining a data value and an index associated with the data value into a single data unit, and storing the single data unit as a data entry in the large data set.
3. The method of claim 1, wherein each processing stage determines boundary value data units between sets of the smaller data sets in parallel with determining an index associated with the determined value in the large data set.
4. The method of claim 3, wherein determining the boundary value data units between sets of the smaller data sets includes performing operations on the data entries in each data set in parallel using single instruction multiple data (SIMD) instructions.
5. The method of claim 4, wherein the output of each processing stage produces a data set that is received as a new data set input to the next processing stage.
6. The method of claim 1, wherein the large data set is received as a data set input at a first processing stage.
7. The method of claim 1, wherein the large data set is stored in a database as a structured array.
8. The method of any one of claims 1-7, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
9. A system, comprising:
a determination module to determine a boundary value data unit in a large data set in parallel with determining an index associated with the determined boundary value data unit in the large data set.
10. The system of claim 9, further comprising:
a combination module to combine a data value and an index associated with the data value into a single data unit, and to store the single data unit as a data entry in the large data set.
11. The system of claim 10, wherein the determination module is to determine the boundary value data unit in the large data set in parallel with determining the index associated with the determined boundary value data unit in the large data set, including successively determining boundary value data units of smaller data sets within the large data set during a plurality of processing stages performed in a hierarchical manner until a single data unit is produced.
12. The system of claim 11, wherein each processing stage receives a data set and divides the data set into a plurality of smaller data sets, wherein each data set includes a plurality of data entries.
13. The system of claim 12, wherein each processing stage determines boundary value data units between sets of the smaller data sets in parallel with determining an index associated with the determined value in the large data set.
14. The system of claim 13, wherein determining the boundary value data units between sets of the smaller data sets includes performing operations on the data entries in each data set in parallel using single instruction multiple data (SIMD) instructions.
15. The system of claim 14, wherein the output of each processing stage produces a data set that is received as a new data set input to the next processing stage.
16. The system of any one of claims 9-15, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
17. An apparatus, comprising:
means for determining a boundary value data unit in a large data set in parallel with determining an index associated with the determined boundary value data unit in the large data set.
18. The apparatus of claim 17, further comprising:
means for combining a data value and an index associated with the data value into a single data unit and storing the single data unit as a data entry in the large data set.
19. The apparatus of claim 18, further comprising means for determining the boundary value data unit in the large data set in parallel with determining the index associated with the determined boundary value data unit in the large data set, including successively determining boundary value data units of smaller data sets within the large data set during a plurality of processing stages performed in a hierarchical manner until a single data unit is produced.
20. The apparatus of claim 19, wherein each processing stage receives a data set and divides the data set into a plurality of smaller data sets, wherein each data set includes a plurality of data entries.
21. The apparatus of claim 20, wherein each processing stage determines boundary value data units between sets of the smaller data sets in parallel with determining an index associated with the determined value in the large data set.
22. The apparatus of claim 21, wherein determining the boundary value data units between sets of the smaller data sets includes performing operations on the data entries in each data set in parallel using single instruction multiple data (SIMD) instructions.
23. The apparatus of claim 22, wherein the output of each processing stage produces a data set that is received as a new data set input to the next processing stage, and the large data set is received as a data set input at a first processing stage.
24. The apparatus of claim 17, wherein the large data set is stored in a database as a structured array.
25. The apparatus of any one of claims 17-24, wherein the boundary value data unit is one of a minimum value data unit and a maximum value data unit.
CN201410096786.1A 2013-03-15 2014-03-17 Fast approach to finding minimum and maximum values in a large data set using SIMD instruction set architecture Expired - Fee Related CN104050230B (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201361798288P 2013-03-15 2013-03-15
US61/798288 2013-03-15
US61/798,288 2013-03-15
US13/853,589 US9152663B2 (en) 2013-03-15 2013-03-29 Fast approach to finding minimum and maximum values in a large data set using SIMD instruction set architecture
US13/853589 2013-03-29
US13/853,589 2013-03-29

Publications (2)

Publication Number Publication Date
CN104050230A true CN104050230A (en) 2014-09-17
CN104050230B CN104050230B (en) 2018-04-10

Family

ID=51503065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410096786.1A Expired - Fee Related CN104050230B (en) 2013-03-15 2014-03-17 Fast approach to finding minimum and maximum values in a large data set using SIMD instruction set architecture

Country Status (1)

Country Link
CN (1) CN104050230B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920410A (en) * 2018-06-22 2018-11-30 华北理工大学 A kind of big data processing unit and method
CN114840255A (en) * 2022-07-04 2022-08-02 飞腾信息技术有限公司 Method, apparatus and device readable storage medium for processing data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6341296B1 (en) * 1998-04-28 2002-01-22 Pmc-Sierra, Inc. Method and apparatus for efficient selection of a boundary value
CN101676864A (en) * 2008-09-16 2010-03-24 国际商业机器公司 Method and device for acquiring Euclidean norm of vector in processing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
INTEL: "Integer Minimum or Maximum Element Search Using Streaming SIMD Extensions", 《INTEGER MINIMUM OR MAXIMUM ELEMENT SEARCH USING》 *

Also Published As

Publication number Publication date
CN104050230B (en) 2018-04-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180410

Termination date: 20210317