CN100524252C - Embedded system chip and data read-write processing method - Google Patents

Embedded system chip and data read-write processing method

Info

Publication number
CN100524252C
Authority
CN
China
Prior art keywords
address
data
cache device
read
victim cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB200710077251XA
Other languages
Chinese (zh)
Other versions
CN101135993A (en
Inventor
夏晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CNB200710077251XA priority Critical patent/CN100524252C/en
Publication of CN101135993A publication Critical patent/CN101135993A/en
Application granted granted Critical
Publication of CN100524252C publication Critical patent/CN100524252C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The embedded system chip comprises a victim cache and a judge unit. The victim cache uses dummy addresses that are otherwise unused in the system, is connected to one master interface and one slave interface of the bus cross module, stores the data most recently written out by the master devices, and judges the read/write operations of master devices that fall within its address segment. The judge unit is located on the address channel between the bus cross module and the master interface; it pre-judges the address of each master read/write operation against the address segment of the victim cache, forwards operations that fall within that segment to the victim cache, rewrites the address of the read/write operation to the dummy address, and sends the dummy address to the bus cross module. The invention reduces the data traffic through the memory interface.

Description

Embedded system chip and data read-write processing method
Technical field
The invention belongs to the field of data processing and relates in particular to an embedded system chip and a data read-write processing method.
Background
As embedded system chip (System on Chip, SoC) designs grow increasingly complex, new requirements such as multiprocessor and multi-service cooperation, data-flow optimization, and higher frequencies are shifting SoCs from being data-processing centric (Data Processing Centric) to data-flow centric (Data Flow Centric). Existing interconnect protocols such as the Advanced High-performance Bus (AHB) have many limitations: the heavy load caused by multiple master devices (Master) contending for the bus reduces the effective data rate; they cannot support the more advanced instruction execution mechanisms of embedded processors, such as a cache (Cache) that can continue to hit while a miss is outstanding; the delay caused by very long logic paths in the original bus architecture limits increases in bus frequency; and the interfaces of intellectual property (IP) cores are complex to design.
Against this background, a new generation of packet-based interconnect protocols has emerged, such as the Advanced eXtensible Interface (AXI), the Open Core Protocol (OCP), and Magenta (a bus protocol). The main characteristics of these new protocols are: transmission over multiple channels with little dependence between channels; burst-based transfers, where each burst is initiated by a single address; and support for outstanding requests (Outstanding) and out-of-order transfers (Out-of-Order).
Taking the AXI protocol as an example, as shown in Fig. 1, an AXI transfer uses five physical channels. Transfers are divided by direction into five kinds of packets (Packet), and each specific transfer includes some of these packets (that is, uses some of the channels). The packets depend on one another very little and have no fixed phase relationship, so the channel that carries each packet can be configured freely, and several pipeline (Pipelining) stages are usually inserted as the application requires. A typical AXI burst consists of one address-and-control packet and several data packets, which is very convenient for cache address judgment. AXI also supports outstanding requests and out-of-order transfers: a master device or slave device (Slave) can have several active (Active) but uncompleted operations, whose ordering is controlled by tag identifiers (Tag ID).
The bus cross module (Switch) structure is the bus architecture that these interconnect protocols require. It is typically used where there are two or more master devices and each master device needs to access two or more slave devices, as shown in Fig. 2. Unlike a shared bus (Shared Bus), the bus cross module structure allows different master devices to access different slave devices at the same time. In this structure every transfer is switched per channel; each channel is equivalent to a private connection over which the master device and slave device transfer data without interference. In Fig. 2, while master device A accesses slave device B, master device B accesses slave device A; if each of the two channels supports a transfer rate of 100 MByte/s, the bus cross module delivers an effective rate of 200 MByte/s.
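As an illustrative sketch only, the bandwidth arithmetic of the Fig. 2 scenario can be reproduced in a few lines. The function name, the representation of transfers as (master, slave) pairs, and the simplification that each distinct slave device contributes one full-rate channel are assumptions made for illustration and are not part of the patent.

```python
def aggregate_bandwidth(transfers, channel_rate_mb_s, shared_bus=False):
    """Return the effective aggregate rate in MB/s for concurrent transfers.

    transfers: list of (master, slave) pairs that are active at the same time.
    On a shared bus only one transfer proceeds at a time; on a bus cross
    module each transfer to a distinct slave device gets its own channel.
    """
    if not transfers:
        return 0
    if shared_bus:
        return channel_rate_mb_s
    distinct_slaves = {slave for _master, slave in transfers}
    return channel_rate_mb_s * len(distinct_slaves)


# Scenario of Fig. 2: master A -> slave B while master B -> slave A.
fig2 = [("master_A", "slave_B"), ("master_B", "slave_A")]
print(aggregate_bandwidth(fig2, 100))                   # 200
print(aggregate_bandwidth(fig2, 100, shared_bus=True))  # 100
```

With the same two transfers, the shared-bus case stays at the single-channel rate, which is the contrast the paragraph above draws.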
The cache and main memory (Main Memory) exchange data in units of data blocks. When the central processing unit (Central Processing Unit, CPU) reads data or an instruction, it also saves what it read in the cache. When the CPU later needs the same or nearby data, it can obtain it from the cache. Because the cache is much faster than main memory, overall system performance improves greatly. A victim cache (Victim Cache) is equivalent to a small fully associative cache that holds the data blocks replaced in the cache because of misses; when a miss occurs, the data in the victim cache is checked as well.
At present, many IP cores with high requirements for connectivity, timing, and bandwidth use new-generation interconnect protocols such as OCP and AXI and interact through a bus cross module structure to form a high-speed core subsystem. As shown in Fig. 3, the master devices in this subsystem include a direct memory access controller (Direct Memory Access Controller, DMAC), a CPU, a digital signal processor (Digital Signal Processor, DSP), and several service modules; the slave devices include a memory interface and one or two bus bridges (which bridge the slow devices of the system). Multiple master devices can access different slave devices simultaneously, and a master device reads and writes data by interacting with the memory through the bus cross module and the memory interface.
New bus protocols such as OCP and AXI have significantly increased bus transmission bandwidth and bus utilization, but overall system performance is still severely limited by the bandwidth bottleneck at the memory interface. Whether for the DMAC, the CPU, or the service modules, accesses to slow devices are few, and the main data flows concentrate on the memory interface. Because of cost sensitivity it is hard to add more memories, so master devices often contend for the bandwidth of the memory interface. Since the total bandwidth of the memory interface is limited and most dynamic memories, such as double data rate (Double Data Rate, DDR) memory, have long access latency, lower-priority master devices often wait many cycles to obtain data. When services are complex, the memory interface seriously affects system performance; even with optimization through outstanding requests and out-of-order transfers, or with a level-2 cache (L2 Cache) to reduce the external access frequency of the CPU, congestion at the memory interface remains hard to resolve.
Summary of the invention
The purpose of the embodiments of the invention is to provide an embedded system chip that solves the prior-art problem that, because the transmission bandwidth of the memory interface is limited, reading and writing data between master devices and the memory easily congests the memory interface and degrades system performance.
The embodiments of the invention are implemented as follows: an embedded system chip comprising a bus cross module and a plurality of master devices, the chip further comprising:
A victim cache, which uses dummy addresses that are unused in the system, is connected to one master interface and one slave interface of the bus cross module, stores the data most recently written out by the master devices, and judges the read/write operations of master devices that fall within the address segment of the victim cache; and
A judge unit, located on the address channel between the bus cross module and the master device, which pre-judges the address of each master read/write operation against the address segment of the victim cache and forwards operations that fall within that segment to the victim cache; when the victim cache determines that the read/write operation satisfies its judgment, the judge unit rewrites the address of the operation to the dummy address and sends it to the bus cross module. After the judge unit sends the dummy address to the bus cross module, the bus cross module uses the dummy address to set up a transmission channel, and the read/write operation is completed in the victim cache through that channel.
Another purpose of the embodiments of the invention is to provide a data read-write processing method comprising the following steps:
Receiving a read/write operation request from a master device;
According to the address segment of the victim cache configured in the system, judging whether the address of the read/write operation request falls within the address segment of the victim cache, and forwarding operations that fall within that segment to the victim cache for judgment by the victim cache;
When the address of the read/write operation satisfies the judgment of the victim cache, rewriting the address of the master read/write operation to the dummy address, sending it to the bus cross module, and completing the read/write operation in the victim cache.
The embodiments add a victim cache to the bus cross module and perform address judgment on the read/write operations of the master devices; if the judgment is satisfied, the read or write operation of the master device is completed in the victim cache. This reduces the volume of data accessed through the memory interface, saves bandwidth, improves data-flow latency, speeds up data access, and improves overall system performance.
Description of drawings
Fig. 1 is an architecture diagram of the AXI protocol in the prior art;
Fig. 2 is a schematic diagram of existing data exchange through a bus cross module;
Fig. 3 is a structural diagram of an existing embedded system chip built on the bus cross module structure;
Fig. 4 is a structural diagram of the embedded system chip built on the bus cross module structure in an embodiment of the invention;
Fig. 5 is a schematic diagram of temporary-data handling in an embodiment of the invention;
Fig. 6 is a structural diagram of the victim cache provided by an embodiment of the invention;
Fig. 7 is a pipeline timing diagram of arbitration and judgment by the arbitration unit of the victim cache in an embodiment of the invention;
Fig. 8 is a structural diagram of the judge unit provided by an embodiment of the invention;
Fig. 9 is a timing diagram of the judgment interaction between the judge unit and the victim cache in an embodiment of the invention;
Fig. 10 is a schematic diagram of a typical existing network-type temporary buffering scenario;
Fig. 11 is a data flow diagram of an embodiment of the invention in the scenario of Fig. 10;
Fig. 12 is a data flow diagram of existing data interaction among multiple master devices;
Fig. 13 is a data flow diagram of an embodiment of the invention in the scenario of Fig. 12;
Fig. 14 is a data flow diagram of an existing conflict miss in an upper-level data cache;
Fig. 15 is a data flow diagram of an embodiment of the invention in the scenario of Fig. 14.
Embodiment
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain the invention and do not limit it.
In the embodiments of the invention, one master interface and one slave interface are added to the existing bus cross module structure, and a system-level victim cache is attached in loopback fashion to store the data most recently written out by each master device. The judge unit and the victim cache judge each master read/write operation; if the judgment is satisfied, the operation is completed in the victim cache, which reduces the volume of data accessed through the memory interface.
Fig. 4 shows the structure of the SoC chip provided by an embodiment of the invention; for ease of explanation, only the parts relevant to the embodiment are shown.
The embodiment adds one master interface and one slave interface to the bus cross module. The victim cache connects to the bus cross module through this master interface and this slave interface and stores the data most recently written out by the master devices. The victim cache is system-level and is shared by master devices such as the CPU, DMAC, DSP, and service modules. In the embodiment, the victim cache is addressed with idle addresses that are unused in the system, that is, dummy (Dummy) addresses, which map onto the memory space.
A judge unit (Judge Unit, JU) is located on the address channel between the bus cross module and each master interface and pre-judges the address of each master read/write operation. Operations that fall within the address segment of the victim cache are forwarded to the victim cache for further judgment. If the victim cache determines that the address of the operation satisfies its judgment, the judge unit rewrites the address of the operation to the dummy address and sends it to the bus cross module, so that the read/write operation of the master device is completed in the victim cache.
When a master device initiates a write request, the judge unit pre-judges the write address in the request against the address segment configured in the victim cache. If the write address falls within the address segment of the victim cache, the write request is forwarded to the victim cache for further judgment. If the victim cache determines that the address of the operation satisfies its judgment, the judge unit rewrites the write address in the request to the dummy address of the victim cache and sends this dummy address to the bus cross module. The bus cross module uses the dummy address to set up a transmission channel, through which the data of the master device is written into the victim cache.
When a master device initiates a read request, the judge unit pre-judges the read address in the request against the address segment configured in the victim cache, and the victim cache is checked for data corresponding to that read address. If the data is present, the read address in the request is rewritten to the dummy address of the victim cache and sent to the bus cross module. The bus cross module sets up a transmission channel according to this address, and the data is returned to the master device from the victim cache.
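The write and read flows above can be summarized in a minimal sketch. It is an illustration under stated assumptions, not the patented implementation: the segment bounds and the fixed offset are borrowed from the example segments given later in the description, `Request`, `VictimCacheStub`, and `judge_unit` are hypothetical names, and the stub simplifies the further judgment of the victim cache by accepting every write in the segment and accepting reads only when the data is present.

```python
from dataclasses import dataclass

# Hypothetical address map (assumption): sensitive segment and dummy offset.
SENSITIVE_START, SENSITIVE_END = 0x20000000, 0x38000000
DUMMY_OFFSET = 0x70000000  # assumed one-to-one mapping: dummy = memory address + offset


@dataclass
class Request:
    address: int
    is_write: bool


class VictimCacheStub:
    """Stand-in for the further judgment made by the victim cache (hypothetical)."""

    def __init__(self, cached_addresses=()):
        self.cached = set(cached_addresses)

    def judge(self, req):
        # Simplification: writes are always accepted, reads only if data is present.
        return req.is_write or req.address in self.cached


def judge_unit(req, victim_cache):
    """Return the address that is forwarded to the bus cross module."""
    in_segment = SENSITIVE_START <= req.address < SENSITIVE_END  # pre-judgment
    if in_segment and victim_cache.judge(req):                    # further judgment
        return req.address + DUMMY_OFFSET                         # routed to the victim cache
    return req.address                                            # routed to the memory interface


vc = VictimCacheStub(cached_addresses=[0x20000040])
print(hex(judge_unit(Request(0x20000040, is_write=False), vc)))  # 0x90000040
print(hex(judge_unit(Request(0x40000000, is_write=True), vc)))   # 0x40000000
```

Because the bus cross module routes purely by address, rewriting the address is enough to redirect the transfer to the slave interface of the victim cache in this model.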
In the embodiment, the data handled by the victim cache has temporal and spatial locality. It is mainly data that would otherwise pass through the memory interface: temporarily buffered data, data exchanged between master devices, and data that must be accessed when an upper-level data cache suffers conflict misses. In one embodiment, it is mainly bufferable data of the master devices whose addresses are aligned to 16/8/4 words (Words). Such data can be held temporarily in the victim cache; when a master device needs it, the data is obtained directly from the victim cache interface without accessing the memory interface, which reduces the data traffic of the memory interface.
In the embodiment, the address segment of this data is configured in the victim cache. The address segment is contained in the memory address segment; it can be the address segment covered by the dummy addresses, or a sensitive address segment within the dummy address segment. In the embodiment, dummy addresses and memory addresses are mapped one to one. The victim cache outputs the configured address segment to the judge unit, which uses it to pre-judge the addresses of master read/write operations.
The access target of this data is the memory interface, but all of it is bufferable (Buffer): writes may be delayed, writes to unrelated addresses may complete out of order, and the master device does not need the write response from the final destination of the write. The possible data flows are listed in the following table:
Master device | Data type | Data length
CPU | Dirty data evicted when the CPU cache replaces a block | Cache block length, 8 Words
CPU | CPU write operations to memory | 1-4 Words
DMAC | Data movement between peripheral interfaces and memory | Usually half of the peripheral FIFO, i.e. 4 Words
DMAC | Data movement between service modules or the DSP and memory | 1-4 KB
Service module | Data movement between service modules and memory | 1-4 KB
DSP | Data movement between the DSP and memory | 1-4 KB
CPU, DMAC, service module, DSP | Exclusive data transfers | 1-16 Words
CPU, DMAC, service module, DSP | Lock data transfers | 1 Word-4 KB
The data types in the table fall roughly into five classes: dirty data evicted on CPU cache replacement, direct CPU writes to memory, data movement between service modules and memory, exclusive (Exclusive) transfers, and locked (Lock) transfers:
Dirty data evicted on CPU cache replacement results from CPU operations on memory. It has the same size as a data block in the CPU cache, usually 8 Words, and is aligned. In the embodiment, the judge unit rewrites the address of this data to the dummy address and the bus cross module writes the data into the victim cache. By the principle of program locality, the CPU is likely to access this data again soon; it can then read and write the data quickly through the victim cache, which effectively reduces CPU cache conflict misses and memory interface traffic.
Direct CPU writes to memory come in several kinds: a write hit in a write-through cache, a write miss in a no-write-allocate cache, or a direct CPU access to memory. For the last kind, the probability that the CPU accesses the data again is low. The data length of these operations is usually 1 to 4 Words, whereas a data block in the victim cache of the embodiment is usually 16 Words. Therefore, if the data of such an operation is 4 Words long and address-aligned, it is written into the victim cache; other data, if it does not hit the victim cache, is written directly to memory after the relevant judgment in the victim cache. Data written into the victim cache can be accessed directly by the CPU, reducing memory interface traffic.
Service-related data includes the data exchanged among service modules, the DSP, and the DMAC. This data is long. Under a ping-pong operation structure, service data is usually processed in ping-pong buffers of 1 KB to 4 KB each, with 4 to 8 ping-pong buffers operating alternately at the same time, so the capacity of the victim cache should exceed the total data volume of the ping-pong structure to reduce conflict misses. This data has short-term locality: it is read soon after being written, but is unlikely to be read again. Given this property, the data can be held in the victim cache; if another master device accesses it, the data is obtained from the victim cache, reducing memory interface traffic.
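As a rough worked example of the capacity requirement, the total volume of the ping-pong structure can be computed from the figures quoted above. The function name is hypothetical, and reading the text as "4 to 8 buffers of 1 KB to 4 KB each" is an interpretation, so the result is only an indicative bound.

```python
KB = 1024


def pingpong_total_bytes(num_buffers, buffer_size_bytes):
    """Total bytes held by all ping-pong buffers; the victim cache should exceed this."""
    return num_buffers * buffer_size_bytes


smallest = pingpong_total_bytes(4, 1 * KB)  # 4 buffers of 1 KB each
largest = pingpong_total_bytes(8, 4 * KB)   # 8 buffers of 4 KB each
print(smallest // KB, largest // KB)        # 4 32
```

Under this reading, the victim cache would need more than 4 KB in the lightest configuration and more than 32 KB in the heaviest one.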
Exclusive operations can be issued by any master device. On receiving an exclusive operation request from a master device, the victim cache first checks whether its contents are address-correlated with the request; if so, it first writes out the data at the related addresses, and then the data of the exclusive operation is sent by the master device directly to memory.
Lock operations can also be issued by any master device. A lock operation locks the target slave device so that only the issuing master device can access it, and is mainly used to access data with strict immediacy requirements. In the embodiment, once a lock operation is issued, the victim cache performs address judgment only for the master device that issued it until the lock is released. If the operation hits data in the victim cache, it is completed in the victim cache; if it cannot be completed there, it is sent directly to memory. When the address hits the victim cache, the master device obtains the data quickly from the victim cache, reducing memory interface traffic.
Data of the above types can be handled by the victim cache mainly because the target of every access is dynamic memory. Dynamic memory is read and written only through the memory interface, so a copy of the data in the victim cache does not cause system coherence problems. Moreover, writes to dynamic memory are all bufferable: a delay does not affect system correctness, and master devices usually do not need the write response from the final memory either (a master device can usually obtain only the write response of the next-level buffer). In addition, a write to dynamic memory is correlated only with reads and writes to the same address, not with operations on adjacent or different addresses, so uncorrelated writes need not complete in the order in which they were issued.
In the embodiment, master devices cannot exchange data with one another directly. After one master device obtains data, it must write the data to memory and then notify another master device to fetch it. Such data is valid only once: after it has been fetched, it can be discarded.
For such temporary (Temp) data, the judge unit signals its judgment to the victim cache with a special flag (the master device can also set the flag signal directly), for example using the AR(W)CACHE signal on the AXI bus.
Although temporary data is dirty, it can be discarded directly after being read and need not be written to memory, as shown in Fig. 5. After temporary data is written into the victim cache, if a master device reads it soon afterwards, it is discarded once read. If no master device reads it for a long time, the victim cache may replace it; the victim cache then writes the temporary data to memory, and when a master device later reads it, the read operation is forwarded by the victim cache and served from memory.
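The lifecycle of temporary data can be illustrated with a toy model. This is a sketch under assumptions: memory is modeled as a plain dictionary, `TempAwareCache` and its methods are hypothetical names, and replacement is triggered explicitly through `evict` rather than by a real replacement policy.

```python
class TempAwareCache:
    """Toy model of temporary-data handling; memory is a plain dict (assumption)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> (data, is_temp)

    def write(self, address, data, is_temp=False):
        self.lines[address] = (data, is_temp)

    def read(self, address):
        if address in self.lines:
            data, is_temp = self.lines[address]
            if is_temp:
                # Temporary data is valid once: drop it without writing memory.
                del self.lines[address]
            return data
        # Miss: the read is forwarded to memory.
        return self.memory[address]

    def evict(self, address):
        # Replaced before being read: the dirty data must reach memory.
        data, _is_temp = self.lines.pop(address)
        self.memory[address] = data


memory = {}
cache = TempAwareCache(memory)
cache.write(0x100, b"pkt", is_temp=True)
print(cache.read(0x100), 0x100 in memory)  # b'pkt' False
```

In the fast path the temporary block never touches memory; only a block that is replaced before being read costs a memory write and a later memory read.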
In the embodiment, the victim cache handles data at memory addresses, the memory has only one access interface, and every request to that interface passes through the victim cache, so coherence can be guaranteed inside the victim cache. In addition, the embodiment does not need to handle AXI deadlock prevention.
Fig. 6 shows the structure of the victim cache provided by an embodiment of the invention; for ease of explanation, only the parts relevant to the embodiment are shown.
The address segment configuration unit 61 configures the enabled address segments of the data handled in the victim cache; in a specific implementation, the CPU can configure it through a configuration interface.
In the embodiment, three address segments can be preconfigured: the sensitive address segment handled by the victim cache, which may be smaller than the dummy address segment or equal to it; the temporary-data address segment, which is smaller than the sensitive address segment; and the dummy address segment corresponding to the sensitive address segment, which is an idle address segment unused in the system.
Because memory address segments generally have the form 0x?0000000-0x?0000000, the sensitive address segment and the dummy address segment can also be set in the form 0x?0000000-0x?0000000. For example, if the memory addresses are 0x20000000-0x50000000, the sensitive address segment can be set to 0x20000000-0x38000000 and the dummy address segment to the idle range 0x90000000-0xB0000000. For design convenience, dummy addresses should have the same granularity as memory addresses; as in the example above, mapping between a memory address and a dummy address only requires changing the upper 4 bits of the 32-bit address.
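One way to realize the one-to-one mapping in the example is a fixed offset between the two segments. The sketch below is an assumption for illustration: the patent states only that the upper 4 bits change, and the offset form, the exclusive upper bound, and the names `to_dummy` and `to_memory` are not taken from the source.

```python
SENSITIVE_BASE = 0x20000000  # start of the sensitive segment in the example
SENSITIVE_END = 0x38000000   # end of the sensitive segment (treated as exclusive, an assumption)
DUMMY_BASE = 0x90000000      # start of the idle dummy segment in the example
OFFSET = DUMMY_BASE - SENSITIVE_BASE  # 0x70000000: its low 28 bits are zero


def to_dummy(mem_addr):
    """Map a memory address in the sensitive segment to its dummy address."""
    if not SENSITIVE_BASE <= mem_addr < SENSITIVE_END:
        raise ValueError("address outside the sensitive segment")
    return mem_addr + OFFSET


def to_memory(dummy_addr):
    """Inverse mapping, used when data is written back to memory."""
    return dummy_addr - OFFSET


print(hex(to_dummy(0x20000000)))  # 0x90000000
print(hex(to_dummy(0x37FFFFFC)))  # 0xa7fffffc
```

Since the offset has no bits set below bit 28, only the top nibble of the address differs between the two views, which matches the "upper 4 bits" remark above.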
The dummy address segment is defined in the memory map (Memory Map) and fixed in the judge unit in advance. The sensitive address segment and the temporary-data address segment can be configured through address segment configuration unit 61, which passes the configured segment information to the judge unit for pre-judging the addresses of master read/write operations. The information can also be conveyed through the control information of the interconnect protocol, for example the AR(W)CACHE signal in AXI, or supplied directly to the judge unit by the master device when it initiates a read/write operation.
The data memory (Data RAM) 62 stores the data buffered in the victim cache. The data information record unit 63 records information about each data block stored in data memory 62, for example the address of the block (Tag ID), whether it is dirty (Dirty), whether it is valid (Valid), and least-recently-used information (LRU).
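The per-block record can be pictured as a small structure. The sketch below is hypothetical: the field names mirror the items listed above, the LRU state is reduced to a single counter, and the replacement rule in `pick_replacement` (prefer an invalid block, otherwise the least recently used one) is a common choice assumed here rather than something the patent specifies.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    tag: int             # address of the cached data block (Tag ID)
    valid: bool = False  # Valid: the block holds usable data
    dirty: bool = False  # Dirty: the block differs from memory
    last_used: int = 0   # stand-in for LRU state; larger means more recent


def pick_replacement(blocks):
    """Return the index of an invalid block if any, else the least recently used one."""
    for i, block in enumerate(blocks):
        if not block.valid:
            return i
    return min(range(len(blocks)), key=lambda i: blocks[i].last_used)


blocks = [
    BlockInfo(tag=0x100, valid=True, dirty=False, last_used=5),
    BlockInfo(tag=0x200, valid=True, dirty=True, last_used=2),
    BlockInfo(tag=0x300, valid=True, dirty=False, last_used=9),
]
print(pick_replacement(blocks))  # 1
```

A block selected this way would still need a write-back if its dirty flag is set, which is where the data write buffer and address write buffer described later come in.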
The arbitration unit (Arbiter) 64 arbitrates among the judgment requests sent by the judge units and passes the address information and control information of the request granted in the current cycle to judge logic unit 66. Arbitration can be pipelined to guarantee a throughput of one judgment (Judge) per cycle, as shown in Fig. 7. In particular, when lock information is valid, arbitration unit 64 grants only the judge unit that initiated the judgment request of the lock operation until the lock operation ends. In the embodiment, arbitration unit 64 can use a first-in-first-out (First In First Out, FIFO) scheme, which helps maintain read/write coherence across multiple master devices.
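The FIFO arbitration with lock handling can be sketched in software as follows. This is a behavioral illustration only: the class and method names are hypothetical, one call to `grant` stands for one cycle, and pipelining is not modeled.

```python
from collections import deque


class Arbiter:
    """FIFO arbiter sketch; one request is granted per call (one 'cycle')."""

    def __init__(self):
        self.queue = deque()   # (judge_unit_id, request) in arrival order
        self.locked_by = None  # judge unit currently holding the lock, if any

    def submit(self, ju_id, request):
        self.queue.append((ju_id, request))

    def lock(self, ju_id):
        self.locked_by = ju_id

    def unlock(self):
        self.locked_by = None

    def grant(self):
        """Return the oldest eligible (ju_id, request), or None if none is eligible."""
        for i, (ju_id, request) in enumerate(self.queue):
            if self.locked_by is None or ju_id == self.locked_by:
                del self.queue[i]
                return ju_id, request
        return None


arb = Arbiter()
arb.submit("cpu", "r1")
arb.submit("dsp", "r2")
arb.lock("dsp")
print(arb.grant())  # ('dsp', 'r2')
print(arb.grant())  # None
arb.unlock()
print(arb.grant())  # ('cpu', 'r1')
```

While the lock is held, requests from other judge units stay queued in arrival order, so FIFO ordering resumes unchanged once the lock is released.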
The hit record unit 65 records read/write operation requests that satisfied the judgment but have not yet been completed through the slave interface.
After the judge logic unit (Judge Logic, JU) 66 receives the address information and control information of a master read operation from arbitration unit 64, it reads from data information record unit 63 the addresses of the data held in data memory 62, compares them with the address of the read operation, and determines whether the data for the address in the read request is in data memory 62.
If data are in data-carrier store 62, then decision logic unit 66 writes down this read operation request in hitting record cell 65, and the result of determination information that read operation is hit is returned identifying unit.Identifying unit is corresponding address dummy with the address modification in the main equipment read operation request, sends this address dummy to the bus Cross module.The bus Cross module sends the address tunnel (ADDRS) of this address dummy by the slave unit interface that is connected with the victim cache device to Instruction Register (Command Buffer) 67.Instruction Register 67 transmits a read operation instruction to data-carrier store 62, data-carrier store 62 sends the data channel (RDATAS) of the slave unit interface that data are connected with the bus Cross module by the victim cache device to the bus Cross module, sends data to corresponding main equipment by the bus Cross module.After data read was finished, Instruction Register 67 emptied the record of this read operation request in hitting record cell 65.
If the data is not in the data memory 62, the decision logic unit 66 returns read-miss result information to the judging unit, and the judging unit sends the master's read request directly to the bus crossbar module, so that the read is completed in the memory.
If the comparison finds that only part of the data is in the data memory 62, the decision logic unit 66 uses the Dirty information recorded in the data information record unit 63 to determine whether that data has been updated. If it has been updated, the decision logic unit 66 issues a read command to the data memory 62 and writes the address of the data into the address write buffer 69. The data memory 62 sends the data to the data write buffer 68. The data write buffer 68 and the address write buffer 69 send the data and its address to the bus crossbar module over, respectively, the write data channel (WDATAM) and the address channel (ADDRM) of the master interface that connects the victim cache to the bus crossbar module, and the bus crossbar module writes the data to the memory through the memory interface. The decision logic unit 66 then returns read-miss result information to the judging unit. If the data has not been updated, the decision logic unit 66 modifies the corresponding information in the data information record unit 63 so that the data is discarded, and returns read-miss result information to the judging unit; the judging unit then sends the master read request directly to the bus crossbar module, so that the read is completed in the memory.
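The three read outcomes above (full hit, miss, partial overlap) can be sketched as a small function. This is an illustrative model under stated assumptions: `tags` stands in for data information record unit 63, `writeback` stands in for the address write buffer 69, and the function name `judge_read` is hypothetical. The sketch also assumes that a partially overlapping dirty entry is dropped after being written back, which the text does not state explicitly.

```python
def judge_read(tags, addrs, writeback):
    """Judge a read covering `addrs` against the recorded entries.

    tags:      dict mapping address -> {"dirty": bool} (models record unit 63)
    addrs:     list of addresses covered by the master read request
    writeback: list collecting addresses to be written back to memory
    Returns "hit" or "miss".
    """
    present = [a for a in addrs if a in tags]
    if len(present) == len(addrs):
        return "hit"  # all data is in the victim cache
    # Partial overlap (or none): the read will be served from memory.
    for a in present:
        if tags[a]["dirty"]:
            writeback.append(a)  # updated data must reach memory first
        del tags[a]              # assumption: the stale entry is then dropped
    return "miss"
```

A read that finds nothing in `tags` falls through the loop untouched and reports a plain miss, matching the direct path to memory.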
After the decision logic unit 66 receives the address information and control information of a master write request from the arbitration unit 64, it reads from the data information record unit 63 the addresses recorded for the data held in the data memory 62, compares them with the address in the write request, and determines whether the write hits.
If the decision logic unit 66 determines a write hit, it records the write request in the hit record unit 65 and returns write-hit result information to the judging unit. The judging unit changes the address in the master write request to the corresponding dummy address and sends the dummy address to the bus crossbar module, which sends it to the command buffer 67 over the address channel (ADDRS) of the slave interface connected to the victim cache. The command buffer 67 issues a write command to the data memory 62, and the bus crossbar module writes the data into the data memory 62 over the write data channel (WDATAS) of the slave interface connected to the victim cache, updating the original data in the data memory 62. After the write completes, the command buffer 67 clears the record of this write request from the hit record unit 65.
If the decision logic unit 66 determines a write miss but the data needs to be written into the victim cache, for example because the data is likely to be read or updated within a short time, the decision logic unit 66 uses the Dirty and Valid information recorded in the data information record unit 63 to determine whether the data memory 62 has free space. If it does, the decision logic unit 66 returns writable result information to the judging unit and records the write request in the hit record unit 65; the subsequent operations are the same as for a write hit above and are not repeated here. If the data memory 62 has no free space but existing data can be discarded, the decision logic unit 66 modifies the corresponding information in the data information record unit 63 so that the data is discarded, returns writable result information to the judging unit, and records the write request in the hit record unit 65; the subsequent operations are again the same as for a write hit.
If the data memory 62 has no free space and the data cannot be discarded, the decision logic unit 66 modifies the corresponding information in the data information record unit 63, writes the address of that data into the address write buffer 69, and issues a read request to the data memory 62. The data memory 62 writes the data into the data write buffer 68. The data write buffer 68 and the address write buffer 69 send the data and its address to the bus crossbar module over, respectively, the write data channel (WDATAM) and the address channel (ADDRM) of the master interface that connects the victim cache to the bus crossbar module, and the bus crossbar module writes the data to the memory. The decision logic unit 66 then returns writable result information to the judging unit and records the write request in the hit record unit 65; the subsequent operations are the same as for a write hit and are not repeated here.
If the decision logic unit 66 determines a write miss and the data does not need to be written into the victim cache either, it returns write-miss result information to the judging unit. The judging unit sends the write request to the bus crossbar module, which writes the data to the memory through the memory interface.
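The write cases above (hit, miss with allocation, miss with replacement, and plain miss) can be summarized in a short sketch. Several details here are assumptions made only so the example runs: the function name `judge_write`, the `should_cache` flag, choosing the oldest entry as the one to replace, treating a clean entry as discardable, and marking written entries dirty. The text itself does not specify a replacement policy.

```python
def judge_write(tags, addr, capacity, should_cache, writeback):
    """Judge a master write against the victim cache contents.

    tags:         dict mapping address -> {"dirty": bool} (models record unit 63)
    capacity:     number of entries the data memory can hold
    should_cache: True if the data needs to go into the victim cache
    writeback:    list collecting addresses written back to memory
    Returns "hit", "writable", or "miss".
    """
    if addr in tags:
        tags[addr]["dirty"] = True  # assumption: an updated entry becomes dirty
        return "hit"
    if not should_cache:
        return "miss"  # write goes straight to memory
    if len(tags) >= capacity:
        victim = next(iter(tags))    # assumption: replace the oldest entry
        if tags[victim]["dirty"]:
            writeback.append(victim)  # cannot be discarded: write it back
        del tags[victim]              # clean data is simply discarded
    tags[addr] = {"dirty": True}
    return "writable"
```

In every "writable" outcome the master's write then proceeds exactly like a write hit, which is why the sketch returns a single result for all three allocation cases.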
Fig. 8 shows the structure of the judging unit provided by the embodiment of the invention; for ease of description, only the parts relevant to the invention are shown.
The address determination module 81 receives the read-write operation request of a master device. Besides the address of the operation, the request carries control information such as the data bit width, length, and Tag ID. Based on the enabled address segments configured in the victim cache, for example the dummy address segment, sensitive address segment, or temporary data address segment, the address determination module 81 determines whether the address of the request falls within an address segment of the victim cache. If so, it passes the request to the slow-beat determination module (JU Slice) 82, which forwards it to the victim cache for further judgment. Otherwise, the address determination module 81 checks whether any read-write request is pending in the slow-beat determination module 82. If one is pending, it determines whether the master's request has the same Tag ID as the pending request; if so, the request waits, so that read-write requests are sent in order. If no request is pending in the slow-beat determination module 82, out-of-order issue is allowed, and the address determination module 81 sends the master's read-write request directly to the bus crossbar module through the multiplexer (MUX) 84.
The slow-beat determination module 82 sends a decision request to the victim cache based on the read-write request passed on by the address determination module 81; the decision request carries the address of the master read-write operation and the corresponding control information. On receiving the decision request, the victim cache judges the address of the master's read-write operation. If the judgment is satisfied, it returns valid result information to the slow-beat determination module 82; otherwise it returns invalid result information.
On receiving invalid result information from the victim cache, the slow-beat determination module 82 passes the address of the master read-write operation and the corresponding control information to the MUX 84, which forwards them to the bus crossbar module. On receiving valid result information from the victim cache, the slow-beat determination module 82 passes the address and the corresponding control information to the address modification module 83.
The address modification module 83 changes the address of the master read-write operation to the dummy address of the victim cache and uses additional information, for example the AW(R)CACHE signals, to attach hit information, including whether the address hit or is to be written, whether the data is temporary data, and which entry is hit or replaced (if the victim cache is 4-way, two bits are needed to identify the way). It passes the modified address to the MUX 84, which forwards it to the bus crossbar module.
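The routing performed by modules 81 to 84 can be sketched as a single function: a segment check, a query to the victim cache, and address substitution on a valid result. The function name `route_request`, the callback `cache_judges_ok`, and especially the remapping rule (dummy base plus the low 12 address bits) are hypothetical; the text does not say how the dummy address is derived from the original address, and Tag ID ordering is not modeled.

```python
def route_request(addr, segments, cache_judges_ok, dummy_base):
    """Return the address that is forwarded to the bus crossbar module.

    segments:        list of (low, high) half-open ranges enabled in the cache
    cache_judges_ok: callable(addr) -> bool, models the victim cache's verdict
    dummy_base:      base of the unused (dummy) address range
    """
    # Address determination module 81: pre-judge against the segments.
    in_segment = any(low <= addr < high for low, high in segments)
    if not in_segment:
        return addr  # bypass the victim cache entirely
    # Slow-beat determination module 82: ask the victim cache.
    if not cache_judges_ok(addr):
        return addr  # invalid result: forward the original address
    # Address modification module 83 (hypothetical remapping rule).
    return dummy_base + (addr & 0xFFF)
```

Because the dummy range is unused elsewhere in the system, the crossbar can decode it like any other slave address, which is what lets the request reach the victim cache's slave interface without changes to the interconnect.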
The slow-beat determination module 82 and the victim cache can interact using point-to-point two-way handshake signals, as shown in Figure 9. The slow-beat determination module 82 issues a decision request (Request) to the victim cache; if the victim cache's arbitration in the current cycle selects this judging unit, the judgment can be made in the next beat. On a hit, or when there is no dependency, the victim cache needs only two cycles to make the judgment. If there is a dependency or a replacement, the victim cache must first write out the data block and only then return the result information to the slow-beat determination module 82.
In this embodiment of the invention, the system writes the 4-, 8-, or 16-word data of each master device into the victim cache, so the first write can be completed efficiently. The victim cache writes data to the memory only when a replacement occurs, which is equivalent to a very large write buffer and greatly optimizes write operations. If a master device's read-write operations hit the victim cache, the reads and writes can be completed by the victim cache, which significantly reduces the load on the memory.
For data that is temporarily buffered in the memory, in the existing scheme shown in Figure 10, image or voice data is written into the DSP through an external interface and, after DSP encoding, is written into one region of the memory; meanwhile the CPU writes frame header information into another region of the memory. The DMAC then reads out the data and the frame header, assembles a complete data frame, and sends it out. This kind of service data is buffered in the memory only temporarily (the memory can apply back-pressure to adapt to the interface speed), and the memory is written and then read in quick succession. Assuming a data rate of 100 Mbps, the memory interface bandwidth demand is 200 Mbps. Because the read-write latency of external memory is relatively large, the data stream also incurs a certain delay.
After the system cache is added, the data flow is as shown in Figure 11. The data streams of the DSP and the CPU are written directly into the victim cache after being judged by the judging unit. In the ideal case, the data that the DMAC needs to move hits the victim cache and is obtained from it, and because the data is temporary data it does not need to be written back to the memory. In this case the memory interface saves 200 Mbps of bandwidth, and the delay of the data stream is very small. In the non-ideal case, if the data is not temporary data (for example with differential coding, where the previous frame is needed as a reference), the data must eventually be written to the memory, as shown by the two-dot chain line in the figure, which requires 100 Mbps of memory interface bandwidth. If the temporal locality of the data is poor (including a large amount of data being written in a short time, or written data not being read out soon afterwards), the victim cache will write data back to the memory because of conflicts, as shown by the dotted line in the figure, and part of the data must still be fetched from the memory. The bandwidth saved is then less than 200 Mbps, and the data delay is larger than in the ideal case.
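The bandwidth figures quoted for this scenario follow from counting how many times each bit crosses the memory interface. The helper below is a small worked calculation, with the function name `memory_traffic_mbps` introduced only for illustration; the 100 Mbps rate and the resulting 200, 0, and 100 Mbps figures come from the text.

```python
def memory_traffic_mbps(rate, writes_to_memory, reads_from_memory):
    """Memory interface traffic when each bit is written and read N times."""
    return rate * (writes_to_memory + reads_from_memory)


RATE = 100  # Mbps, the data rate assumed in the text

# Existing scheme: data is written to memory once and read back once.
baseline = memory_traffic_mbps(RATE, 1, 1)

# Ideal case with the victim cache: temporary data never reaches memory.
ideal = memory_traffic_mbps(RATE, 0, 0)

# Non-temporary data: still written to memory once, but reads hit the cache.
non_temporary = memory_traffic_mbps(RATE, 1, 0)

print(baseline, ideal, non_temporary)
```

The saving in the ideal case is the full baseline traffic, and the non-temporary case halves it; the poor-locality case lies somewhere between, depending on how much data is replaced before it is read.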
For data exchanged between master devices, in the existing scheme shown in Figure 12, to guarantee data consistency, master device A writes the data into the storage region shared by A and B; the DMAC then moves the data into the storage region exclusive to master device B, and master device B then fetches the data. The memory is read and written repeatedly within a short time. Assuming the exchanged data rate is 50 Mbps, the bandwidth required of the memory interface is 200 Mbps. Because the read-write latency of external memory is relatively large, the data stream also incurs a certain delay.
After the system cache is added, the data flow is as shown in Figure 13. Both the shared storage region and the exclusive storage region can be mapped onto addresses of the victim cache. The data written out by master device A can then be written directly into the victim cache, and both the DMAC transfer reads and master device B can obtain the data from the victim cache. In the ideal case, the data hits the victim cache and, being temporary data, does not need to be written back to the memory; the memory interface saves 200 Mbps of bandwidth, and the delay of the data stream is very small. In the non-ideal case, if the data is not temporary data, it must eventually be written to the memory, as shown by the two-dot chain line in the figure, which requires 100 Mbps of memory interface bandwidth. If the temporal locality of the data is poor, the victim cache will write data back to the memory because of conflicts, as shown by the dotted line in the figure, and part of the data must still be fetched from the memory. The bandwidth saved is then less than 200 Mbps, and the data delay is larger than in the ideal case.
For data affected by conflict misses in the upper-level data cache, in the existing scheme shown in Figure 14, when data correlation is poor or the upper-level data cache has too few ways, conflict misses occur in the upper-level data cache, and data that the CPU is about to read or write is evicted and written to the memory. If the CPU needs to read or write this data again shortly afterwards, the upper-level data cache must fetch it again from the memory. Such accesses to external memory can stall the CPU for dozens of cycles and significantly degrade CPU performance.
After the system cache is added, the data flow is as shown in Figure 15. When a conflict miss occurs in the upper-level data cache, the data that the CPU is about to read or write is written into the victim cache, and when the CPU reads or writes this data again shortly afterwards, it can obtain the data from the victim cache. Because the victim cache is fast to access, at about 3-5 cycles compared with dozens of cycles for external memory, the loss of CPU performance is reduced considerably.
In terms of specific read-write operations, with the embodiment of the invention, a continuous read or write transfer to the memory on the bus that hits the victim cache is completed with a delay of only two beats and without accessing the memory interface. A write to the memory that misses but can be written into the victim cache completes immediately without accessing the memory interface; the victim cache writes the data to the memory later, on replacement. A temporary write that misses but can be written into the victim cache likewise completes immediately without accessing the memory interface, and the data does not need to be written to the memory on replacement either; the space can be freed directly once the data has been read or when it is replaced. For dependent operations issued simultaneously by multiple master devices, the arbitration unit 64 of the victim cache can use the FIFO arbitration mechanism to ensure that the dependent operations complete in the order in which they were issued.
In summary, the embodiment of the invention reduces the volume of data accessed through the memory interface, saves bandwidth, reduces the latency of data streams, speeds up data access, and improves overall system performance.
The above are merely preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. An embedded system chip, comprising a bus crossbar module and a plurality of master devices, characterized in that the chip further comprises:
a victim cache, which uses a dummy address not otherwise used in the system, is connected to one master interface and one slave interface of the bus crossbar module, and is configured to store data recently written out by the master devices and to judge read-write operations of master devices that satisfy the address segment of the victim cache; and
a judging unit, located on the address channel between the bus crossbar module and a master device, configured to pre-judge the address of a master read-write operation according to the address segment of the victim cache, to send read-write operations judged to satisfy the address segment of the victim cache to the victim cache, and, when the victim cache judges that the read-write operation of the master device satisfies the judgment, to change the address of the master read-write operation to the dummy address and send it to the bus crossbar module,
wherein, after the judging unit sends the dummy address to the bus crossbar module, the bus crossbar module uses the dummy address to set up a transmission channel, and the read-write operation is completed in the victim cache through the transmission channel.
2. The embedded system chip of claim 1, characterized in that the data recently written out by the master devices comprises: data formerly buffered temporarily through the memory interface, data exchanged between master devices, or data that needs to be accessed when a conflict miss occurs in the upper-level data cache.
3. The embedded system chip of claim 1, characterized in that the address segment of the victim cache further comprises a temporary data address segment, and the victim cache discards temporary data after it has been read by a master device.
4. The embedded system chip of claim 1, characterized in that the victim cache comprises:
an address segment configuration unit, configured to configure the address segment information of the victim cache and send the address segment information to the judging unit;
an arbitration unit, configured to arbitrate among the decision requests sent by the judging units and output the decision request of the judging unit selected for the current cycle;
a decision logic unit, configured to judge, according to the decision request output by the arbitration unit, whether the master read-write operation satisfies the judgment, and to return result information to the judging unit; and
a hit record unit, configured to record read-write operation requests that the decision logic unit has judged to satisfy the judgment but that have not yet been completed on the slave interface.
5. The embedded system chip of claim 4, characterized in that the arbitration unit is pipelined when arbitrating among the decision requests sent by the judging units.
6. The embedded system chip of claim 1, characterized in that the judging unit comprises:
an address determination module, configured to receive the read-write operation request of a master device and determine whether the address of the read-write operation request satisfies the address segment of the victim cache;
a slow-beat determination module, configured to send a decision request to the victim cache according to a read-write operation request, passed on by the address determination module, that satisfies the address segment of the victim cache;
an address modification module, configured to change the address of the master read-write operation to the dummy address when the victim cache judges that the address of the master read-write operation satisfies the judgment; and
a multiplexer, configured to send the address of the master read-write operation or the dummy address to the bus crossbar module.
7. The embedded system chip of claim 1 or 4, characterized in that the address segment information of the victim cache is sent to the judging unit by the victim cache, or is sent to the judging unit through a control message of the interconnect protocol, or is provided directly to the judging unit by the master device when it initiates a read-write operation.
8. The embedded system chip of claim 1, characterized in that the victim cache judging that the read-write operation of a master device satisfies the judgment means: the read-write operation hits the victim cache, or a write operation misses the victim cache but the data needs to be written into the victim cache.
9. A data read-write processing method, characterized in that the method comprises the following steps:
receiving a read-write operation request of a master device;
determining, according to the address segment of the victim cache configured in the system, whether the address of the read-write operation request satisfies the address segment of the victim cache, and sending read-write operations judged to satisfy the address segment of the victim cache to the victim cache to be judged by the victim cache;
when the address of the read-write operation satisfies the judgment of the victim cache, changing the address of the master read-write operation to the dummy address, sending it to the bus crossbar module, and completing the read-write operation in the victim cache.
10. The method of claim 9, characterized in that the address of the read-write operation satisfying the judgment of the victim cache means: the read-write operation hits the victim cache, or a write operation misses the victim cache but the data needs to be written into the victim cache.
CNB200710077251XA 2007-09-20 2007-09-20 Embedded system chip and data read-write processing method Expired - Fee Related CN100524252C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB200710077251XA CN100524252C (en) 2007-09-20 2007-09-20 Embedded system chip and data read-write processing method

Publications (2)

Publication Number Publication Date
CN101135993A CN101135993A (en) 2008-03-05
CN100524252C true CN100524252C (en) 2009-08-05

Family

ID=39160100

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB200710077251XA Expired - Fee Related CN100524252C (en) 2007-09-20 2007-09-20 Embedded system chip and data read-write processing method

Country Status (1)

Country Link
CN (1) CN100524252C (en)

Also Published As

Publication number Publication date
CN101135993A (en) 2008-03-05

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090805
