Since the early days of computers and telephony, interconnection networks have been a critical part of electrical engineering. They have become even more critical in the era of very large-scale integration (VLSI) circuitry because of the drive characteristics of MOS transistors combined with the relatively high capacitance of on-chip interconnects.
The interconnection networks used to connect functional units within a chip can have a significant, even a dominating, effect on the performance of a device. Buses, although the simplest form of interconnect, are a poor choice from a density or power standpoint because the power and space required to drive them at maximum speed grow exponentially with the capacitance of the bus. Furthermore, multi-point connection networks are a poor choice because the entire length of the bus must be driven even when only a single "conversation" is under way, or when the communication is between direct neighbors. A crossbar is an optimal solution, up to a maximum size determined by the underlying device and wiring technology. In general, the optimal solution to multi-party communication is a network built out of crossbars.
Status Quo
On-chip buses today are a simple and straightforward outgrowth of the system-wide buses used in computer systems. Although there is obvious proof that such buses are functional, they were typically designed for the commercial and technical limitations of their day, when wires were cheap, circuits were expensive, and interconnect was faster than logic.
Today this is not the case. Modern large-scale integrated circuits (ICs) are routinely speed-limited by the interconnect, not by the logic. The evidence is everywhere in devices with multiple clock domains and/or exotic wave-propagation techniques. Logic gates are now plentiful thanks to "Moore's Law," which has historically delivered more gate density than most engineers knew what to do with. This predicament has turned circuit and system design on its head: logic and interconnects are inexpensive, and conserving wires is counter-productive.
Buses led to the development of bus standards, but bus standards do not solve the bigger problems of system architecture and data flow. Bus standards, like many laws, policies, and regulations, have a way of outliving their usefulness. They are intolerant of changes in signaling, protocol, bandwidth, or usage model. Buses are, by nature, slow to change and quick to stifle the unexpected. It's been said that they deter innovation and impede originality. Still, they remain steadfast beacons of standardization in a sea of ever-present change and progress.
Alternatives to consider
The alternatives to buses are numerous, and all have been used successfully in various computers, chips, boards, application specific integrated circuits (ASICs), and field programmable gate arrays (FPGAs). These alternatives are no panacea, just as buses aren't a cure-all for every interconnection ailment. Avoiding the fixed routing and timetable of a standard bus can open up new avenues for design and restore a bit of glamour and creativity to an otherwise mundane project.
Buses and networks
The alternative to current bus architectures is merely a different kind of bus. To be precise, it's a different interconnect topology, such as a network, switch fabric, or crossbar.
A bus topology, like that shown in Fig 1, generally favors one-to-one or one-to-many communications. These buses were, again, outgrowths of the need to standardize pin- or board-level interfaces to assure device interoperability. Buses may have multiple masters (participants that originate a transaction and source or sink data), but only one master can be active at a time. Inherent in the definition of a bus is its exclusive nature. Only one master can use the bus at a time; all other potential masters must wait. Bus arbitration (i.e., the sharing mechanisms) thus becomes a significant part of any bus specification.

Fig 1. A traditional bus topology.
To complete a transaction, a master competes for access to the bus, initiates the transaction, waits for the slave (or in the case of a "broadcall" transaction, multiple slaves) to respond, and then relinquishes the bus. The master may then initiate a second transaction or, through arbitration, lose control of the bus to another master. This arrangement can lead to system bottlenecks where waiting for access significantly slows system performance.
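The arbitration cycle described above can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the arbiter is round-robin (the article does not name an arbitration policy), every transaction takes a fixed number of cycles, all requests are pending at cycle 0, and the names run_bus, requests, and transfer_cycles are hypothetical.

```python
from collections import deque

def run_bus(requests, transfer_cycles=1):
    """Serialize transactions on a single shared bus.

    requests: list of (master, slave) pairs, all pending at cycle 0.
    Returns a list of (start_cycle, master, slave) in grant order.
    Assumption: round-robin arbitration, fixed-length transfers.
    """
    # One FIFO of pending target slaves per master.
    pending = {}
    for master, slave in requests:
        pending.setdefault(master, deque()).append(slave)

    order = deque(sorted(pending))    # round-robin order of masters
    schedule = []
    cycle = 0
    while order:
        master = order.popleft()      # the arbiter grants the bus to one master
        slave = pending[master].popleft()
        schedule.append((cycle, master, slave))
        cycle += transfer_cycles      # the bus is busy; every other master waits
        if pending[master]:
            order.append(master)      # relinquish the bus and re-arbitrate
    return schedule

# Three masters, four transactions: each one waits for the bus in turn.
for start, master, slave in run_bus(
    [("cpu", "ram"), ("dma", "ram"), ("cpu", "uart"), ("dsp", "sram")]
):
    print(f"cycle {start}: {master} -> {slave}")
```

Even though the dsp-to-sram transfer touches neither the cpu nor the ram, it still has to wait its turn, which is the bottleneck the text describes.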
More advanced buses support split transactions, in which the overhead of arbitration or transaction delays is mitigated by overlapping the beginning of the next transaction with the end of the previous one. This shaves a few precious cycles from overall transaction times, but it does nothing to alleviate the basic "one-master problem" that is inherent in all buses.
In an ASIC or FPGA, chip-level buses are easily implemented using on-chip wiring resources. Standard chip fabrication techniques provide relatively long, straight metal layers on the top of the chip that are convenient for implementing buses (as well as for distributing power and global clock signals).
Network topology is quite similar to that of a traditional bus: it is designed for one-to-one or one-to-many transactions over a shared medium, typically arbitrating control of that medium using collision-detection and retry algorithms. As with buses, slightly overlapping adjoining transactions can be used to save time.
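As a rough illustration of what a collision-detection and retry scheme can look like, the following Python sketch uses truncated binary exponential backoff, the approach familiar from classic shared-medium Ethernet. The article does not say which algorithm it has in mind, so the choice of scheme is an assumption, and the names backoff_slots, send_with_retry, and channel_busy are hypothetical.

```python
import random

def backoff_slots(collisions, rng=random, max_exponent=10):
    """Slot times to wait after the given number of collisions.

    Truncated binary exponential backoff: pick uniformly from
    0 .. 2**k - 1, where k = min(collisions, max_exponent).
    """
    k = min(collisions, max_exponent)
    return rng.randrange(2 ** k)

def send_with_retry(channel_busy, max_attempts=16, rng=random):
    """Try to send one frame over a shared medium.

    channel_busy: hypothetical callable that returns True when the
                  attempt collides with another sender.
    Returns (success, total_slots_waited).
    """
    waited = 0
    for attempt in range(1, max_attempts + 1):
        if not channel_busy():
            return True, waited                  # the medium was free
        waited += backoff_slots(attempt, rng)    # collision: back off, then retry
    return False, waited                         # give up after too many collisions
```

The waiting window doubles after each collision, so senders that keep colliding spread out in time. The cost is the same as on a bus: only one sender makes progress at any moment.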
Switch fabrics
Crossbar switches and the more generalized switch fabrics are at once both simpler and more complex than standard buses. Crossbar switches provide a many-to-many communications mechanism for chips or systems with multiple masters and multiple slaves. Unlike a bus or network, crossbars and switch fabrics support multiple simultaneous transactions. This offers obvious improvements in bandwidth, except in the case where only one master conducts a transaction at a time. In that case, a conventional bus or network would work equally well. In the more common case of multiple masters initiating transactions at unpredictable intervals (and often simultaneously), a switch fabric (as illustrated in Fig 2) yields better results.

Fig 2. A typical switch fabric topology.
The multi-master scenario is quite common, even in systems with a single microprocessor or processor core. In fact, more and more chips include more than one processor, where "processor" is defined as a RISC, CISC, video, or network processor/core that executes software.
Even standalone microprocessors often include more than one processor "core," examples being Intel's Core 2 Duo and Freescale's dual-processor QUICC and PowerQUICC communications chips. Fabless semiconductor companies that are targeting the networking and communications markets routinely produce devices with four, ten, or dozens of processors in each chip, all of which use switch fabrics internally.
Apart from increasing overall system bandwidth, switch fabrics also avoid many of the arbitration delays and overhead of buses. A bus is a single, exclusive resource, whereas a switch fabric can be shared by several transactions at once. Any number of transactions may proceed simultaneously as long as two masters are not addressing the same slave (or vice versa). When resource conflicts occur, switch fabrics arbitrate like any other shared resource; barring any such conflicts, arbitration is unnecessary.
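The conflict rule in the paragraph above can be sketched in a few lines of Python. This is a simplified model, not a description of any particular fabric: it assumes each master and each slave has a single port, it resolves conflicts by list order (a stand-in for whatever priority scheme a real design would use), and crossbar_grant is a hypothetical name.

```python
def crossbar_grant(requests):
    """Decide which requests can proceed in one cycle of a crossbar.

    requests: list of (master, slave) pairs in priority order.
    A request is granted when neither its master port nor its slave
    port is already in use this cycle; otherwise it is deferred.
    Returns (granted, deferred).
    """
    busy_masters, busy_slaves = set(), set()
    granted, deferred = [], []
    for master, slave in requests:
        if master in busy_masters or slave in busy_slaves:
            deferred.append((master, slave))   # conflict: arbitrate and wait
        else:
            busy_masters.add(master)
            busy_slaves.add(slave)
            granted.append((master, slave))    # no conflict: proceed at once
    return granted, deferred

granted, deferred = crossbar_grant(
    [("cpu0", "ram0"), ("cpu1", "ram1"), ("dma", "ram0")]
)
print("granted:", granted)     # cpu0 -> ram0 and cpu1 -> ram1 run in parallel
print("deferred:", deferred)   # dma must wait because ram0 is already in use
```

Only the request that collides on ram0 pays an arbitration penalty; the other two never see one.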
Switch fabrics thus provide both better bandwidth and lower latency. Memory latency is particularly important in many high-performance designs where processors are fetching code, storing data, or retrieving data and do not wish to incur the time penalties inherent in a shared bus. Total bandwidth (that is, the number of simultaneous master/slave transactions) also improves when multiple masters can address multiple slaves at the same time. Opening the bottleneck to memory would seem to be every chip designer's first order of business. Squeezing all that traffic over a shared bus runs counter to this goal.
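To put rough numbers on the bandwidth and latency argument, the following Python sketch schedules the same traffic on a shared bus and on a crossbar. It is a toy model: every transaction is assumed to take exactly one cycle, all requests are pending at cycle 0, conflicts are resolved in list order, and the function name schedule and the sample traffic are hypothetical.

```python
def schedule(requests, shared_bus):
    """Return the cycle in which each request starts (same order as input).

    requests:   list of (master, slave) pairs, all pending at cycle 0.
    shared_bus: True  -> at most one transaction per cycle (a bus);
                False -> any non-conflicting set per cycle (a crossbar).
    """
    start = [None] * len(requests)
    cycle = 0
    while None in start:
        busy_masters, busy_slaves = set(), set()
        for i, (master, slave) in enumerate(requests):
            if start[i] is not None:
                continue                  # already served in an earlier cycle
            if shared_bus and busy_masters:
                break                     # the bus is already taken this cycle
            if master in busy_masters or slave in busy_slaves:
                continue                  # port conflict: wait for a later cycle
            busy_masters.add(master)
            busy_slaves.add(slave)
            start[i] = cycle
        cycle += 1
    return start

traffic = [("cpu0", "ram0"), ("cpu1", "ram1"), ("dma", "ram2"), ("cpu0", "ram1")]
for name, is_bus in (("bus", True), ("crossbar", False)):
    starts = schedule(traffic, is_bus)
    print(name, "start cycles:", starts,
          "total cycles:", max(starts) + 1,
          "mean wait:", sum(starts) / len(starts))
```

With this traffic the bus starts the four transactions in cycles 0, 1, 2, and 3, while the crossbar starts three of them in cycle 0 and the fourth in cycle 1. If every request comes from the same master, both schedules come out identical, which matches the earlier point that a bus does just as well when only one master is active.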