Phoenix, Az. - At Microchip Technology's MASTERS Conference
here Wednesday, HI-TECH Software will take the wraps off an "omniscient"
ANSI C compiler for 32-bit MCU code that it claims boosts real-time
response by more than 25% while nearly doubling code density.
The new HI-TECH C PRO compiler for Microchip's PIC32 MCU uses a new
technique called omniscient code generation (OCG) to optimize stack and
register allocation across all code modules before generating the object
code. Smaller code generally executes more quickly and requires smaller,
less expensive flash memory for storage.
To do this, the compiler collects comprehensive data on every register,
stack, pointer, object and variable declaration across the entire
program. It uses this information to optimize register usage, stack
allocations and pointers across the whole program. It also ensures
consistent variable and object declarations between modules and deletes
unused variables and functions.
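The article does not say how OCG finds unused functions. One common way a whole-program pass can do it is call-graph reachability: start from every entry point (main and each interrupt handler), mark everything transitively called, and treat whatever stays unmarked as dead. The C sketch below assumes a hypothetical adjacency-matrix call graph; `CallGraph`, `find_unused_functions` and `MAX_FUNCS` are illustrative names, not HI-TECH APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical whole-program call graph: calls[i][j] is true when
   function i contains a direct call to function j. MAX_FUNCS is an
   illustrative limit, not a compiler parameter. */
#define MAX_FUNCS 16

typedef struct {
    size_t count;                      /* number of functions in the program */
    bool calls[MAX_FUNCS][MAX_FUNCS];  /* adjacency matrix of direct calls */
} CallGraph;

/* Depth-first marking of every function reachable from `fn`. */
static void mark_reachable(const CallGraph *g, size_t fn, bool reachable[])
{
    if (reachable[fn])
        return;                        /* already visited */
    reachable[fn] = true;
    for (size_t callee = 0; callee < g->count; callee++)
        if (g->calls[fn][callee])
            mark_reachable(g, callee, reachable);
}

/* Marks functions reachable from any entry point (e.g. main and each
   interrupt handler). Returns how many functions are unreachable and
   could therefore be deleted. */
size_t find_unused_functions(const CallGraph *g, const size_t entries[],
                             size_t n_entries, bool reachable[])
{
    assert(g->count <= MAX_FUNCS);
    for (size_t i = 0; i < g->count; i++)
        reachable[i] = false;
    for (size_t e = 0; e < n_entries; e++)
        mark_reachable(g, entries[e], reachable);

    size_t unused = 0;
    for (size_t i = 0; i < g->count; i++)
        if (!reachable[i])
            unused++;
    return unused;
}
```

Listing interrupt handlers as entry points matters: a function called only from an ISR is live even though main never calls it.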
According to CEO and company founder Clyde Stubbs, the compiler's
performance on the PIC32 bears out the company's belief that the OCG
technique should yield even greater performance and code density
improvements on 32-bit register-based MCUs than it has achieved on the
8- and 16-bit MCUs where the company previously focused its OCG efforts.
Because PIC32 is based on a MIPS Technologies 32-bit core, he
believes the performance improvements should be repeatable on most
other MIPS architectural derivatives, as well as many other RISC-based
designs. "Right now we are being somewhat conservative and are confining
ourselves to architectures that have a clear and large following in the
embedded systems market," he said.
Next on the company's agenda is the 32-bit RISC ARM architecture,
with a particular focus on the ARM Cortex-M3, which is targeted
specifically at embedded applications. There, as with most other 32-bit
RISC CPUs, said Stubbs, code is most often generated one module at a
time, using variations of GNU Compiler Collection (GCC) techniques.
Because GCC generates code one module at a time, he said, no
comprehensive cross-module data is available. "But without knowing how
objects are used across the whole program, it is impossible to achieve
the same level of optimization as an OCG compiler," said Stubbs.
In code density benchmarks, the company's OCG compiler produced code
as much as 40% smaller than that generated by industry-leading
GCC-based PIC32 compilers. "The smaller code size can cut device costs
by reducing the amount of on-chip flash required," he said.
Stubbs pointed out that GCC-based 32-bit compilers are constrained as
to which registers can be used to pass parameters to called functions.
"Whenever a function is called from another code module, the parameters
of that function are usually stored in the registers," said Stubbs. In
GCC-based compilers, four specific registers are reserved for this
purpose.
The problem is that if the function has more than four parameters,
the additional parameters must be passed to the called function on the
stack (in RAM), a cycle-intensive process that degrades performance and
increases RAM usage.
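To make the four-register limit concrete, here is a minimal C model of a fixed argument-register convention. It assumes a convention like MIPS O32, where the first four word-sized arguments travel in `$a0` to `$a3`, and treats every argument as one 32-bit word; `arg_location` and `stack_args` are hypothetical helpers, not compiler APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of a fixed argument-register convention such as
   MIPS O32: the first four word-sized arguments travel in $a0-$a3 and
   any further arguments go to the stack. Assumes every argument is a
   single 32-bit word. */
#define ARG_REGS 4

static const char *const arg_regs[ARG_REGS] = {"$a0", "$a1", "$a2", "$a3"};

/* Register holding argument `index` (0-based), or NULL if that
   argument is passed on the stack. */
const char *arg_location(int index)
{
    assert(index >= 0);
    return index < ARG_REGS ? arg_regs[index] : NULL;
}

/* Arguments the caller must store to RAM for an n-argument call; the
   callee then typically has to load each of them back. */
int stack_args(int nargs)
{
    assert(nargs >= 0);
    return nargs > ARG_REGS ? nargs - ARG_REGS : 0;
}
```

For a six-argument call, `stack_args(6)` is 2: two values that cost a store in the caller and usually a load in the callee, which is the extra memory traffic Stubbs describes.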
Faster Interrupt Handling.
By comparison, he said, interrupt-intensive code compiled with OCG
typically requires 26% fewer cycles to execute on the PIC32 than code
compiled using a non-OCG compiler.
By reducing the number of CPU cycles spent moving data between the
registers and the stack, HI-TECH's OCG compiler effectively gives the
CPU a 26% performance boost. More important, called functions
frequently call other functions, which may, in turn, call other
functions.
"This is particularly true for interrupt-intensive applications,"
said Stubbs. "For example, if the code calls a function, which then
calls a second function, the parameters for the first function will
have to be saved to the stack to make room for the parameters for the
second function."
If this second function calls a third function, the parameters for
the second function will also have to be saved to the stack to make
room for the parameters of the third function.
"Data will have to be shifted continuously between the stack and the
registers," he said. "The penalty for this is at least a cycle every
time data is moved to or from the stack, or 8 cycles to move the data
for a single four-parameter function to the stack and back to the
registers."
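Stubbs's arithmetic can be written out as a small cost model: at one cycle per move, saving four register parameters and restoring them costs 4 × 2 = 8 cycles, and in a call chain every level except the innermost pays that price. The sketch assumes, as in his example, that each caller still needs all of its register parameters after the call; `spill_cycles` and `chain_spill_cycles` are illustrative names, and the result is a lower bound rather than a measured figure.

```c
#include <assert.h>

/* Lower-bound cost model for saving and restoring register parameters.
   CYCLES_PER_MOVE = 1 reflects "at least a cycle" per stack move.
   ARG_REGS = 4 models the fixed set of argument registers. */
#define CYCLES_PER_MOVE 1
#define ARG_REGS 4

/* Cycles to spill one function's register parameters to the stack and
   reload them: one store plus one load per register-held parameter.
   Parameters beyond ARG_REGS already live on the stack. */
int spill_cycles(int nparams)
{
    assert(nparams >= 0);
    int in_regs = nparams < ARG_REGS ? nparams : ARG_REGS;
    return 2 * in_regs * CYCLES_PER_MOVE;
}

/* Call chain f1 -> f2 -> ... -> fn: every function except the innermost
   must save its parameters before the argument registers are reused. */
int chain_spill_cycles(const int nparams[], int depth)
{
    assert(depth >= 0);
    int total = 0;
    for (int level = 0; level + 1 < depth; level++)
        total += spill_cycles(nparams[level]);
    return total;
}
```

Three nested four-parameter functions therefore cost at least 16 cycles of pure parameter shuffling per pass through the chain, before any useful work is done.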
Even if other registers are available, a GCC-based compiler passes any
extra parameters on the stack once its fixed set of four registers is
full. This process wastes both cycles and RAM. It also results in
code bloat due to the extra instructions required to save function
parameters to the stack.
In contrast, with OCG compilation, said Stubbs, the compiler has
perfect knowledge of the register usage of each function. At any point
in the program, it knows which registers are available and which are
not, and can optimize register usage without any arbitrary constraints.
"When there are two or three deep function calls, it allocates
parameters for different functions into non-overlapping register sets,
often eliminating the need to store parameters into memory completely,"
he said.
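A minimal sketch of non-overlapping register sets, assuming a hypothetical pool of `PARAM_POOL` registers free for parameters and a program without recursion: each function's parameter block is placed just above the blocks of all its possible callers, so a caller's parameters never have to be saved, while functions that are never active at the same time may share registers. This illustrates the general technique only; it is not HI-TECH's actual allocator, and `assign_param_regs` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FUNCS 8
#define PARAM_POOL 12   /* hypothetical number of registers free for parameters */

/* Assigns each function a base index into a shared pool of parameter
   registers. A function's block starts above the block of every direct
   caller, and therefore above every ancestor on any call path, so
   caller and callee never overlap. Siblings that are never on the call
   stack together may reuse the same registers.
   Assumes no recursion and that functions are numbered so every caller
   has a lower index than its callees. Returns false if some function's
   parameters do not fit in the pool. */
bool assign_param_regs(int count, bool calls[][MAX_FUNCS],
                       const int nparams[], int base[])
{
    assert(count <= MAX_FUNCS);
    bool fits = true;
    for (int f = 0; f < count; f++) {
        base[f] = 0;
        for (int caller = 0; caller < f; caller++)
            if (calls[caller][f] && base[caller] + nparams[caller] > base[f])
                base[f] = base[caller] + nparams[caller];
        if (base[f] + nparams[f] > PARAM_POOL)
            fits = false;   /* would have to fall back to memory */
    }
    return fits;
}
```

If `main` calls `a` (four parameters) and `b` (two), and `a` calls `c` (three), then `a` gets pool registers 0 to 3, `c` gets 4 to 6, and `b` reuses 0 and 1, because `a` and `b` are never on the call stack at the same time.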
"This results in better utilization of the available registers,
fewer cycles wasted moving parameters between the stacks and the
registers, and less RAM usage. It also contributes to smaller code size
by reducing or eliminating the need for code to save registers to the
stack."
With the use of OCG, the HI-TECH C PRO compiler knows the register
usage of every function in the entire program, including interrupts and
any functions that are called by the interrupt code.
"It also knows exactly which registers need to be saved and restored
for each interrupt routine. The OCG compiler saves only those registers
that are necessary, reducing the size of the interrupt context
switching code, and decreasing the number of cycles required to execute
the interrupt routine."
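The interrupt-context idea can be sketched in C as a bitmask union: if the compiler records which general registers each function modifies, the set an interrupt handler must save is the union over the handler and everything it can call. `ProgramInfo`, `isr_save_mask`, `regs_to_save` and the per-function `uses` masks are hypothetical; a real compiler would also have to account for special registers beyond the general-purpose set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_FUNCS 8

/* Hypothetical per-function data a whole-program compiler could gather:
   bit r of uses[f] is set when function f writes general register r. */
typedef struct {
    int count;
    uint32_t uses[MAX_FUNCS];
    bool calls[MAX_FUNCS][MAX_FUNCS];
} ProgramInfo;

/* Union of registers modified by `fn` and everything it can call. */
static uint32_t clobbered(const ProgramInfo *p, int fn, bool visited[])
{
    if (visited[fn])
        return 0;   /* already counted; also guards against recursion */
    visited[fn] = true;
    uint32_t mask = p->uses[fn];
    for (int callee = 0; callee < p->count; callee++)
        if (p->calls[fn][callee])
            mask |= clobbered(p, callee, visited);
    return mask;
}

/* Registers an interrupt handler's prologue must save: only those the
   handler and its callees actually modify. */
uint32_t isr_save_mask(const ProgramInfo *p, int isr)
{
    assert(p->count <= MAX_FUNCS);
    bool visited[MAX_FUNCS] = {false};
    return clobbered(p, isr, visited);
}

/* Number of registers in a mask, i.e. save/restore pairs required. */
int regs_to_save(uint32_t mask)
{
    int n = 0;
    for (; mask != 0; mask &= mask - 1)   /* clear the lowest set bit */
        n++;
    return n;
}
```

A handler that, together with its callees, touches only three registers needs just three save/restore pairs, rather than the larger conservative set a compiler without whole-program information would typically have to preserve.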
Improving Memory Optimization.
Since the HI-TECH C PRO compiler knows every use of every variable in
the program, it can decide, variable by variable, whether to allocate
it on the stack or in a register. The decision is based on how
frequently each variable is used.
Variables that are used intensively can be allocated permanently to
registers, which can be accessed with no cycle penalty. All register and
stack allocations are optimized for the best overall performance of the
entire program. Keeping frequently used data in the locations with the
shortest access time both boosts performance and reduces power
consumption.
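The frequency-based policy can be illustrated with a deliberately simple greedy sketch: sort variables by estimated use count and give the `free_regs` busiest ones a register. Real allocators also consider variable lifetimes and interference, so this is not presented as HI-TECH's algorithm; `Var` and `allocate_by_frequency` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical description of a variable as seen by a whole-program pass. */
typedef struct {
    const char *name;
    int uses;          /* estimated use count */
    int in_register;   /* output: 1 if allocated to a register, 0 for stack */
} Var;

/* qsort comparator: higher use counts sort first. */
static int by_uses_desc(const void *a, const void *b)
{
    const Var *x = a, *y = b;
    return (y->uses > x->uses) - (y->uses < x->uses);
}

/* Greedy frequency-based allocation: the `free_regs` most heavily used
   variables get registers; the remainder live on the stack. Reorders
   `vars`. Returns the number of stack accesses left, a rough proxy for
   the cycles still spent on memory traffic. */
long allocate_by_frequency(Var vars[], size_t n, size_t free_regs)
{
    qsort(vars, n, sizeof vars[0], by_uses_desc);
    long stack_accesses = 0;
    for (size_t i = 0; i < n; i++) {
        assert(vars[i].uses >= 0);
        vars[i].in_register = i < free_regs;
        if (!vars[i].in_register)
            stack_accesses += vars[i].uses;
    }
    return stack_accesses;
}
```

With two free registers and use counts of 50, 20, 5 and 1, the two busiest variables are kept in registers and only 6 stack accesses remain.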
HI-TECH C PRO for the PIC32 MCU Family is available now, through
September 30, 2008, at an introductory price of US$1595, after which
it will sell for $1995. A fully functional 45-day trial version can be
downloaded free of charge from HI-TECH's website.