6.6. MMIO Peripherals¶
The easiest way to create a MMIO peripheral is to use the TLRegisterRouter
or AXI4RegisterRouter
widgets, which abstracts away the details of handling the interconnect protocols and provides a convenient interface for specifying memory-mapped registers. Since Chipyard and Rocket Chip SoCs primarily use Tilelink as the on-chip interconnect protocol, this section will primarily focus on designing Tilelink-based peripherals. However, see generators/chipyard/src/main/scala/example/GCD.scala
for how an example AXI4 based peripheral is defined and connected to the Tilelink graph through converters.
To create a RegisterRouter-based peripheral, you will need to specify a parameter case class for the configuration settings, a bundle trait with the extra top-level ports, and a module implementation containing the actual RTL.
For this example, we will show how to connect a MMIO peripheral which computes the GCD.
The full code can be found in generators/chipyard/src/main/scala/example/GCD.scala
.
In this case we use a submodule GCDMMIOChiselModule
to actually perform the GCD. The GCDModule
class only creates the registers and hooks them up using regmap
.
class GCDMMIOChiselModule(val w: Int) extends Module
with HasGCDIO
{
val s_idle :: s_run :: s_done :: Nil = Enum(3)
val state = RegInit(s_idle)
val tmp = Reg(UInt(w.W))
val gcd = Reg(UInt(w.W))
io.input_ready := state === s_idle
io.output_valid := state === s_done
io.gcd := gcd
when (state === s_idle && io.input_valid) {
state := s_run
} .elsewhen (state === s_run && tmp === 0.U) {
state := s_done
} .elsewhen (state === s_done && io.output_ready) {
state := s_idle
}
when (state === s_idle && io.input_valid) {
gcd := io.x
tmp := io.y
} .elsewhen (state === s_run) {
when (gcd > tmp) {
gcd := gcd - tmp
} .otherwise {
tmp := tmp - gcd
}
}
io.busy := state =/= s_idle
}
trait GCDModule extends HasRegMap {
val io: GCDTopIO
implicit val p: Parameters
def params: GCDParams
val clock: Clock
val reset: Reset
// How many clock cycles in a PWM cycle?
val x = Reg(UInt(params.width.W))
val y = Wire(new DecoupledIO(UInt(params.width.W)))
val gcd = Wire(new DecoupledIO(UInt(params.width.W)))
val status = Wire(UInt(2.W))
val impl = if (params.useBlackBox) {
Module(new GCDMMIOBlackBox(params.width))
} else {
Module(new GCDMMIOChiselModule(params.width))
}
impl.io.clock := clock
impl.io.reset := reset.asBool
impl.io.x := x
impl.io.y := y.bits
impl.io.input_valid := y.valid
y.ready := impl.io.input_ready
gcd.bits := impl.io.gcd
gcd.valid := impl.io.output_valid
impl.io.output_ready := gcd.ready
status := Cat(impl.io.input_ready, impl.io.output_valid)
io.gcd_busy := impl.io.busy
regmap(
0x00 -> Seq(
RegField.r(2, status)), // a read-only register capturing current status
0x04 -> Seq(
RegField.w(params.width, x)), // a plain, write-only register
0x08 -> Seq(
RegField.w(params.width, y)), // write-only, y.valid is set on write
0x0C -> Seq(
RegField.r(params.width, gcd))) // read-only, gcd.ready is set on read
}
6.6.1. Advanced Features of RegField Entries¶
RegField
exposes polymorphic r
and w
methods
that allow read- and write-only memory-mapped registers to be
interfaced to hardware in multiple ways.
RegField.r(2, status)
is used to create a 2-bit, read-only register that captures the current value of thestatus
signal when read.RegField.r(params.width, gcd)
“connects” the decoupled handshaking interfacegcd
to a read-only memory-mapped register. When this register is read via MMIO, theready
signal is asserted. This is in turn connected tooutput_ready
on the GCD module through the glue logic.RegField.w(params.width, x)
exposes a plain register via MMIO, but makes it write-only.RegField.w(params.width, y)
associates the decoupled interface signaly
with a write-only memory-mapped register, causingy.valid
to be asserted when the register is written.
Since the ready/valid signals of y
are connected to the
input_ready
and input_valid
signals of the GCD module,
respectively, this register map and glue logic has the effect of
triggering the GCD algorithm when y
is written. Therefore, the
algorithm is set up by first writing x
and then performing a
triggering write to y
. Polling can be used for status checks.
6.6.2. Connecting by TileLink¶
Once you have these classes, you can construct the final peripheral by extending the TLRegisterRouter
and passing the proper arguments.
The first set of arguments determines where the register router will be placed in the global address map and what information will be put in its device tree entry.
The second set of arguments is the IO bundle constructor, which we create by extending TLRegBundle
with our bundle trait.
The final set of arguments is the module constructor, which we create by extends TLRegModule
with our module trait.
Notice how we can create an analogous AXI4 version of our peripheral.
class GCDTL(params: GCDParams, beatBytes: Int)(implicit p: Parameters)
extends TLRegisterRouter(
params.address, "gcd", Seq("ucbbar,gcd"),
beatBytes = beatBytes)(
new TLRegBundle(params, _) with GCDTopIO)(
new TLRegModule(params, _, _) with GCDModule)
class GCDAXI4(params: GCDParams, beatBytes: Int)(implicit p: Parameters)
extends AXI4RegisterRouter(
params.address,
beatBytes=beatBytes)(
new AXI4RegBundle(params, _) with GCDTopIO)(
new AXI4RegModule(params, _, _) with GCDModule)
6.6.3. Top-level Traits¶
After creating the module, we need to hook it up to our SoC.
Rocket Chip accomplishes this using the cake pattern.
This basically involves placing code inside traits.
In the Rocket Chip cake, there are two kinds of traits: a LazyModule
trait and a module implementation trait.
The LazyModule
trait runs setup code that must execute before all the hardware gets elaborated.
For a simple memory-mapped peripheral, this just involves connecting the peripheral’s TileLink node to the MMIO crossbar.
trait CanHavePeripheryGCD { this: BaseSubsystem =>
private val portName = "gcd"
// Only build if we are using the TL (nonAXI4) version
val gcd = p(GCDKey) match {
case Some(params) => {
if (params.useAXI4) {
val gcd = LazyModule(new GCDAXI4(params, pbus.beatBytes)(p))
pbus.toSlave(Some(portName)) {
gcd.node :=
AXI4Buffer () :=
TLToAXI4 () :=
// toVariableWidthSlave doesn't use holdFirstDeny, which TLToAXI4() needsx
TLFragmenter(pbus.beatBytes, pbus.blockBytes, holdFirstDeny = true)
}
Some(gcd)
} else {
val gcd = LazyModule(new GCDTL(params, pbus.beatBytes)(p))
pbus.toVariableWidthSlave(Some(portName)) { gcd.node }
Some(gcd)
}
}
case None => None
}
}
Note that the GCDTL
class we created from the register router is itself a LazyModule
.
Register routers have a TileLink node simply named “node”, which we can hook up to the Rocket Chip bus.
This will automatically add address map and device tree entries for the peripheral.
Also observe how we have to place additional AXI4 buffers and converters for the AXI4 version of this peripheral.
For peripherals which instantiate a concrete module, or which need to be connected to concrete IOs or wires, a matching concrete trait is necessary. We will make our GCD example output a gcd_busy
signal as a top-level port to demonstrate. In the concrete module implementation trait, we instantiate the top level IO (a concrete object) and wire it to the IO of our lazy module.
trait CanHavePeripheryGCDModuleImp extends LazyModuleImp {
val outer: CanHavePeripheryGCD
val gcd_busy = outer.gcd match {
case Some(gcd) => {
val busy = IO(Output(Bool()))
busy := gcd.module.io.gcd_busy
Some(busy)
}
case None => None
}
}
6.6.4. Constructing the DigitalTop and Config¶
Now we want to mix our traits into the system as a whole.
This code is from generators/chipyard/src/main/scala/DigitalTop.scala
.
class DigitalTop(implicit p: Parameters) extends ChipyardSystem
with testchipip.CanHavePeripheryCustomBootPin // Enables optional custom boot pin
with testchipip.HasPeripheryBootAddrReg // Use programmable boot address register
with testchipip.CanHaveTraceIO // Enables optionally adding trace IO
with testchipip.CanHaveBackingScratchpad // Enables optionally adding a backing scratchpad
with testchipip.CanHavePeripheryBlockDevice // Enables optionally adding the block device
with testchipip.CanHavePeripheryTLSerial // Enables optionally adding the backing memory and serial adapter
with sifive.blocks.devices.i2c.HasPeripheryI2C // Enables optionally adding the sifive I2C
with sifive.blocks.devices.pwm.HasPeripheryPWM // Enables optionally adding the sifive PWM
with sifive.blocks.devices.uart.HasPeripheryUART // Enables optionally adding the sifive UART
with sifive.blocks.devices.gpio.HasPeripheryGPIO // Enables optionally adding the sifive GPIOs
with sifive.blocks.devices.spi.HasPeripherySPIFlash // Enables optionally adding the sifive SPI flash controller
with sifive.blocks.devices.spi.HasPeripherySPI // Enables optionally adding the sifive SPI port
with icenet.CanHavePeripheryIceNIC // Enables optionally adding the IceNIC for FireSim
with chipyard.example.CanHavePeripheryInitZero // Enables optionally adding the initzero example widget
with chipyard.example.CanHavePeripheryGCD // Enables optionally adding the GCD example widget
with chipyard.example.CanHavePeripheryStreamingFIR // Enables optionally adding the DSPTools FIR example widget
with chipyard.example.CanHavePeripheryStreamingPassthrough // Enables optionally adding the DSPTools streaming-passthrough example widget
with nvidia.blocks.dla.CanHavePeripheryNVDLA // Enables optionally having an NVDLA
with chipyard.clocking.HasChipyardPRCI // Use Chipyard reset/clock distribution
with fftgenerator.CanHavePeripheryFFT // Enables optionally having an MMIO-based FFT block
{
override lazy val module = new DigitalTopModule(this)
}
class DigitalTopModule[+L <: DigitalTop](l: L) extends ChipyardSystemModule(l)
with testchipip.CanHaveTraceIOModuleImp
with sifive.blocks.devices.i2c.HasPeripheryI2CModuleImp
with sifive.blocks.devices.pwm.HasPeripheryPWMModuleImp
with sifive.blocks.devices.uart.HasPeripheryUARTModuleImp
with sifive.blocks.devices.gpio.HasPeripheryGPIOModuleImp
with sifive.blocks.devices.spi.HasPeripherySPIFlashModuleImp
with sifive.blocks.devices.spi.HasPeripherySPIModuleImp
with chipyard.example.CanHavePeripheryGCDModuleImp
with freechips.rocketchip.util.DontTouch
Just as we need separate traits for LazyModule
and module implementation, we need two classes to build the system.
The DigitalTop
class contains the set of traits which parameterize and define the DigitalTop
. Typically these traits will optionally add IOs or peripherals to the DigitalTop
.
The DigitalTop
class includes the pre-elaboration code and also a lazy val
to produce the module implementation (hence LazyModule
).
The DigitalTopModule
class is the actual RTL that gets synthesized.
And finally, we create a configuration class in generators/chipyard/src/main/scala/config/RocketConfigs.scala
that uses the WithGCD
config fragment defined earlier.
class WithGCD(useAXI4: Boolean, useBlackBox: Boolean) extends Config((site, here, up) => {
case GCDKey => Some(GCDParams(useAXI4 = useAXI4, useBlackBox = useBlackBox))
})
class GCDTLRocketConfig extends Config(
new chipyard.example.WithGCD(useAXI4=false, useBlackBox=false) ++ // Use GCD Chisel, connect Tilelink
new freechips.rocketchip.subsystem.WithNBigCores(1) ++
new chipyard.config.AbstractConfig)
6.6.5. Testing¶
Now we can test that the GCD is working. The test program is in tests/gcd.c
.
#include "mmio.h"
#define GCD_STATUS 0x2000
#define GCD_X 0x2004
#define GCD_Y 0x2008
#define GCD_GCD 0x200C
unsigned int gcd_ref(unsigned int x, unsigned int y) {
while (y != 0) {
if (x > y)
x = x - y;
else
y = y - x;
}
return x;
}
// DOC include start: GCD test
int main(void)
{
uint32_t result, ref, x = 20, y = 15;
// wait for peripheral to be ready
while ((reg_read8(GCD_STATUS) & 0x2) == 0) ;
reg_write32(GCD_X, x);
reg_write32(GCD_Y, y);
// wait for peripheral to complete
while ((reg_read8(GCD_STATUS) & 0x1) == 0) ;
result = reg_read32(GCD_GCD);
ref = gcd_ref(x, y);
if (result != ref) {
printf("Hardware result %d does not match reference value %d\n", result, ref);
return 1;
}
return 0;
}
// DOC include end: GCD test
This just writes out to the registers we defined earlier.
The base of the module’s MMIO region is at 0x2000 by default.
This will be printed out in the address map portion when you generate the Verilog code.
You can also see how this changes the emitted .json
addressmap files in generated-src
.
Compiling this program with make
produces a gcd.riscv
executable.
Now with all of that done, we can go ahead and run our simulation.
cd sims/verilator
make CONFIG=GCDTLRocketConfig BINARY=../../tests/gcd.riscv run-binary