RISC-V assembly language is like any other assembly and especially resembles MIPS. Just like any assembly, we have a list of instructions that incrementally get us closer to our solution.
We will be using the riscv-g++ compiler and linking C++ files with assembly files. You will write the assembly files, and the C++ files help make the lab a little bit easier.
Objectives
- Understand the use of integer assembly instructions.
- Understand conditional statements using branches.
- Grasp the different elements of assembly source code.
- Get an idea of which assembly sections store which information.
- Get an idea of the load and store instructions and data sizes.
- Understand how RISC-V uses the stack for local storage.
Assembly Files
Assembly files end in a .S (capital S). The compiler driver can run every stage: preprocessing, compiling, assembling, and linking. When we pass a file ending in a capital .S, it runs the C preprocessor and then skips straight to the assembling stage.
We could use a lowercase .s instead, but that also skips the preprocessor stage, so directives such as #include and #define would not work. So, in this class, use capital S.
RISC-V Register File
RISC-V contains 32 integer registers (x0 through x31) and, with the floating-point extensions, 32 floating point registers (f0 through f31). Their ABI names reserve some of these registers for certain purposes. For example, registers whose names start with t (temporary) can be used for any purpose, but any function call may overwrite them. Registers whose names start with a (argument) pass arguments to a function, and a0 and a1 also carry return values. Registers whose names start with s (saved), that is s0 through s11, must be preserved across function calls, and so must sp.
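To make the naming concrete, here is a small C++ lookup table from register number (x0 through x31) to ABI name, plus a helper that reports which registers a function must preserve. The names follow the standard RISC-V calling convention. `kAbiNames` and `is_callee_saved` are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <string>

// ABI names for integer registers x0-x31, indexed by register number.
const std::array<std::string, 32> kAbiNames = {
    "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"};

// Hypothetical helper: true for registers a function must restore before
// returning (sp and s0-s11). ra, t*, and a* may be destroyed by any call.
bool is_callee_saved(int x) {
    return x == 2 || x == 8 || x == 9 || (x >= 18 && x <= 27);
}
```

Note that t0-t2 and t3-t6 are not contiguous, and neither are s0-s1 and s2-s11. That is why a table beats a formula here.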
Integer Instructions
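RISC-V contains integer and logic instructions as well as a few memory instructions. RISC-V is a load/store architecture: only load and store instructions access memory. The operands of integer and logic instructions must therefore be registers, or a small immediate in forms such as addi.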
Instruction Example | Description |
---|---|
lb t0, 8(sp) | Loads (dereferences) the value at memory address (sp + 8) into register t0. lb = load byte, lh = load halfword, lw = load word, ld = load doubleword. lb, lh, and lw sign-extend the value to fill t0; lbu, lhu, and lwu zero-extend it. |
sb t0, 8(sp) | Stores the value in register t0 to memory address (sp + 8). sb = store byte, sh = store halfword, sw = store word, sd = store doubleword. The narrower stores write only the low bits of t0. |
add a0, t0, t1 | Adds value of t0 to the value of t1 and stores the sum into a0. |
addi a0, t0, -10 | Adds value of t0 to the value -10 and stores the sum into a0. |
sub a0, t0, t1 | Subtracts value of t1 from value of t0 and stores the difference in a0. |
mul a0, t0, t1 | Multiplies the value of t0 by the value of t1 and stores the product in a0. |
div a1, s3, t3 | Divides the value of s3 (numerator) by the value of t3 (denominator) and stores the quotient in register a1. |
rem a1, s3, t3 | Divides the value of s3 (numerator) by the value of t3 (denominator) and stores the remainder in register a1. |
and a3, t3, s3 | Performs a bitwise AND on operands t3 and s3 and stores the result in register a3. |
or a3, t3, s3 | Performs a bitwise OR on operands t3 and s3 and stores the result in register a3. |
xor a3, t3, s3 | Performs a bitwise XOR on operands t3 and s3 and stores the result in register a3. |
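Like C++, div truncates toward zero, and the sign of rem follows the numerator. RISC-V also defines results for two cases that C++ leaves undefined, and it does not trap in either case. The C++ sketch below models 64-bit div and rem, including those edge cases. `riscv_div` and `riscv_rem` are hypothetical names for illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical model of RV64 "div": signed 64-bit division that never traps.
int64_t riscv_div(int64_t n, int64_t d) {
    if (d == 0) return -1;  // division by zero: quotient has all bits set
    if (n == std::numeric_limits<int64_t>::min() && d == -1) {
        return n;           // overflow: quotient equals the dividend
    }
    return n / d;           // otherwise truncate toward zero, as C++ does
}

// Hypothetical model of RV64 "rem": remainder matching riscv_div.
int64_t riscv_rem(int64_t n, int64_t d) {
    if (d == 0) return n;   // division by zero: remainder equals the dividend
    if (n == std::numeric_limits<int64_t>::min() && d == -1) {
        return 0;           // overflow case: remainder is zero
    }
    return n % d;           // sign follows the numerator
}
```

For example, -7 / 2 gives -3 with remainder -1, not -4 with remainder 1. If your program must treat division by zero as an error, test the denominator yourself before the div.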
Since RISC-V is a reduced instruction set, many instructions whose work can be done by another instruction are left out. For example, there is no real neg a0, a1 (two’s complement negation) instruction. However, it is equivalent to sub a0, zero, a1. In other words, 0 - a1 is the same as -a1.
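The identity 0 - a1 == -a1 holds for every 64-bit value, because registers wrap around modulo 2^64. This C++ sketch checks it against the textbook two’s complement recipe (invert the bits, add 1). It uses unsigned integers so that wraparound is well defined, and the function names are hypothetical.

```cpp
#include <cstdint>

// What "sub a0, zero, a1" computes: subtract a1 from the zero register.
// Unsigned arithmetic wraps modulo 2^64, just like a 64-bit register.
uint64_t neg_via_sub(uint64_t a1) {
    const uint64_t zero = 0;  // stands in for x0, which always reads as 0
    return zero - a1;
}

// Two's complement negation the textbook way: flip every bit, then add 1.
uint64_t neg_via_invert(uint64_t a1) {
    return ~a1 + 1;
}
```

One corner case is worth knowing. The most negative value, 0x8000000000000000, negates to itself with either method.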
Pseudo Instructions
The assembler provides several pseudoinstructions, which expand into real instructions. For example, neg above is a pseudoinstruction: whenever the assembler reads it, it automatically expands it into the sub instruction. Other common pseudoinstructions include li (load immediate), la (load address), mv (copy a register), j (jump), call, and ret.
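li (load immediate) is a pseudoinstruction whose expansion takes a little reasoning. A 32-bit constant that does not fit in addi’s 12-bit immediate can be built with lui (which sets bits 31..12) followed by addi. Because addi sign-extends its immediate, the upper part must be rounded so that the low part lands in [-2048, 2047]. The C++ sketch below computes such a split. It assumes RV32 arithmetic, and `split_li`/`run_lui_addi` are hypothetical names. A real assembler may pick a shorter sequence (a lone addi when the value fits), and RV64 needs addiw or longer sequences.

```cpp
#include <cstdint>

// Hypothetical pair of immediates for "lui rd, hi20" + "addi rd, rd, lo12".
struct LuiAddi {
    uint32_t hi20;  // 20-bit immediate for lui (ends up in bits 31..12)
    int32_t lo12;   // 12-bit signed immediate for addi, in [-2048, 2047]
};

LuiAddi split_li(uint32_t value) {
    // Adding 0x800 before masking rounds to the nearest 4 KiB boundary,
    // which keeps value - upper within addi's signed 12-bit range.
    uint32_t upper = (value + 0x800u) & 0xFFFFF000u;
    int32_t lo12 = static_cast<int32_t>(value - upper);
    return {upper >> 12, lo12};
}

// What the two real instructions compute, with 32-bit wraparound.
uint32_t run_lui_addi(LuiAddi p) {
    uint32_t rd = p.hi20 << 12;           // lui  rd, hi20
    rd += static_cast<uint32_t>(p.lo12);  // addi rd, rd, lo12
    return rd;
}
```

For 0x12345FFF, the naive split (upper bits 0x12345, low bits 0xFFF) fails, because addi would read 0xFFF as -1. The rounded split uses 0x12346 and -1 instead.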
Floating Point Instructions
The floating point instructions are prefixed with an f, such as fld and fsw for floating-point load doubleword and floating-point store word, respectively. The floating point instructions come in two flavors: (1) single-precision and (2) double-precision. For arithmetic instructions, you select the data size with a suffix: .s (for single-precision) or .d (for double-precision). Loads and stores carry the size in their names instead (flw/fsw for single, fld/fsd for double).
For example, fadd.s ft2, ft0, ft1 tells the RISC-V processor to add two single-precision values (ft0 and ft1) and store the result as a single-precision value in ft2.
We can convert between double and single precision using the instructions fcvt.d.s (convert from single to double) or fcvt.s.d (convert from double to single).
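The precision choice changes results, and only one direction of conversion can lose information. C++ float/double arithmetic and casts behave like these instructions under the default round-to-nearest mode, so a short sketch is enough to see the difference. The helper names are hypothetical, and each is commented with the RISC-V instruction it mirrors.

```cpp
// Each hypothetical helper mirrors one RISC-V floating-point instruction.
float add_single(float a, float b) { float sum = a + b; return sum; }     // fadd.s
double add_double(double a, double b) { double sum = a + b; return sum; } // fadd.d
float to_single(double d) { return static_cast<float>(d); }   // fcvt.s.d: may round
double to_double(float f) { return static_cast<double>(f); }  // fcvt.d.s: always exact
```

16777216 is 2^24, and a float carries only 24 significant bits. So add_single(16777216.0f, 1.0f) rounds back to 16777216, while add_double gives 16777217. Likewise, narrowing 0.1 to single and widening it back does not return the original double, but widening a float and narrowing it again always does.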
Branching Instructions
Branching instructions are a way to jump to different parts of your code. If we didn’t have branching instructions, the CPU would just be able to execute one instruction after another. With jumps and branches, we can go to any instruction, even out of order!
Branches and jumps are how conditionals, loops, and function calls are implemented in assembly. Branching refers to the “conditional jump” instructions, such as beq, bne, bgt, bge, blt, ble for branch-if equal, not equal, greater than, greater than or equal, less than, and less than or equal, respectively. Of these, beq, bne, blt, and bge are real instructions (along with the unsigned bltu and bgeu). bgt and ble are pseudoinstructions that the assembler implements by swapping the operands of blt and bge.
The branching instructions take three parameters: the two operands (registers) to compare, and the label of the instruction to execute if the comparison holds true. If the comparison is false, the branch does nothing and the CPU continues with the next instruction below.
The assembly code above implements the following C++ loop.
Taking the contrary view, that is, branching on the opposite condition to skip past code instead of branching into it, can save us some instructions.
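To illustrate the contrary view, here is a simple C++ loop rewritten with goto so that each statement lines up with one RISC-V instruction. The assembly is shown in comments. The function name and the register choices (n in a0, i in t0, sum in t1) are assumptions for illustration, not the lab’s required layout.

```cpp
// Original loop: long sum = 0; for (long i = 0; i < n; i++) sum += i;
// Lowered form: test the opposite condition (i >= n) and branch OUT of the
// loop, so the body just falls through and one jump closes the loop.
long sum_below(long n) {     // n arrives in a0
    long sum = 0;            // li   t1, 0
    long i = 0;              // li   t0, 0
loop:                        // 1:
    if (i >= n) goto done;   // bge  t0, a0, 2f
    sum += i;                // add  t1, t1, t0
    i += 1;                  // addi t0, t0, 1
    goto loop;               // j    1b
done:                        // 2:
    return sum;              // mv   a0, t1
                             // ret
}
```

Branching on the literal condition (blt t0, a0 into the body) would need an extra unconditional jump to leave the loop when the test fails. The inverted test avoids that jump.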
Using the Stack
The stack is used for local memory storage. The stack grows from bottom (high memory) to top (low memory), and a dedicated register called sp (stack pointer) holds the address of the current top of the stack, its lowest address in use.
Whenever we use the saved registers or if we want to preserve a temporary register across a function call, we must save it on the stack. To allocate from the stack, we subtract. To deallocate, we add. Notice we don’t “clean” the stack. This is why uninitialized variables in C++ are considered “garbage”, since anything left on the stack is still there.
The standard RISC-V ABI requires the stack pointer to be aligned to 16 bytes, meaning we must always subtract and add a multiple of 16 from/to sp.
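The frame size arithmetic is easy to get wrong when the number of saved registers is odd. The sketch below assumes RV64, where each saved register takes an 8-byte sd/ld slot. It rounds up to a multiple of 16, which is what the standard RISC-V calling convention requires and which is also always a multiple of 8. `frame_size` is a hypothetical helper name.

```cpp
#include <cstdint>

// Hypothetical helper: bytes to subtract from sp in order to save `num_regs`
// 64-bit registers, rounded up so sp stays 16-byte aligned.
int64_t frame_size(int64_t num_regs) {
    int64_t bytes = num_regs * 8;          // one 8-byte slot per register
    return (bytes + 15) & ~int64_t{15};    // round up to a multiple of 16
}
```

For example, saving three registers needs 24 bytes, which rounds up to 32. The prologue would use addi sp, sp, -32 and leave 24(sp) unused as padding.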
C++ to Assembly Conversion
A compiler’s job is to convert .cpp files into assembly files. An assembler then assembles each assembly file into machine code as an object file, and a linker links all object files together into an executable or a library.
We know that our C++ code boils down into assembly, so whatever we can do in C++, we can also do in assembly. I’ve shown some examples above on how to write a for loop, but let’s take a look at the other C++ constructs.
Functions
A function is just a label marking its very first instruction. The application binary interface (ABI) specifies which registers receive which parameters and how values are returned. Beyond that, functions generally have a prologue, which sets up a stack frame for local storage, and an epilogue, which usually reloads saved registers and the return address and moves the stack pointer back before returning.
This code shows that we first allocate 32 bytes from the stack, which is the size of four 8-byte registers. You can see that I subtract all of the necessary space from the stack first, store the values, run my code, and then execute the epilogue. This is the main reason the store and load instructions take an offset.
Another thing to note is that I’m storing all caller-saved registers. Once again, we must assume that any function call destroys every caller-saved register: all temporary, argument, and return address registers. I did save some saved registers above, but recall that if we use the saved registers, we are required to put their original values back before we return.
We want one prologue and one epilogue. Even when we call additional functions, we want our function’s stack space to form a single frame. In programming languages courses, you will hear about stack frames. So, we allocate ALL of the space the function needs up front, then store into it.
The assembly code above mocks the following C++ code.
If you don’t remember, the label 1f means to go to the nearest numeric label 1 FORWARD of (after) the current position. This is the opposite of 1b, which looks for the nearest numeric label 1 BACKWARD of (before) the current position.
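To see why allocating once and using fixed offsets works, here is a toy C++ model of a few RV64 registers and a small block of memory. The function allocates 32 bytes, saves four registers at offsets 0, 8, 16, and 24, clobbers them, and then restores and deallocates. `Machine` and `framed_function` are hypothetical names, and the model covers only what this example needs.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical toy model: 256 bytes of memory plus a few 64-bit registers.
struct Machine {
    std::array<uint8_t, 256> mem{};
    uint64_t sp = 256;  // stack starts at the high end and grows down
    uint64_t ra = 0, s0 = 0, s1 = 0, s2 = 0;

    void sd(uint64_t value, uint64_t offset) {  // sd value, offset(sp)
        std::memcpy(&mem[sp + offset], &value, 8);
    }
    uint64_t ld(uint64_t offset) const {        // ld rd, offset(sp)
        uint64_t value;
        std::memcpy(&value, &mem[sp + offset], 8);
        return value;
    }
};

void framed_function(Machine& m) {
    // Prologue: allocate the whole frame once, then store at fixed offsets.
    m.sp -= 32;        // addi sp, sp, -32
    m.sd(m.ra, 0);     // sd   ra, 0(sp)
    m.sd(m.s0, 8);     // sd   s0, 8(sp)
    m.sd(m.s1, 16);    // sd   s1, 16(sp)
    m.sd(m.s2, 24);    // sd   s2, 24(sp)

    // Body: clobber the saved registers, and ra as a nested call would.
    m.s0 = 1; m.s1 = 2; m.s2 = 3; m.ra = 0xBAD;

    // Epilogue: reload from the same offsets, then deallocate.
    m.ra = m.ld(0);    // ld   ra, 0(sp)
    m.s0 = m.ld(8);    // ld   s0, 8(sp)
    m.s1 = m.ld(16);   // ld   s1, 16(sp)
    m.s2 = m.ld(24);   // ld   s2, 24(sp)
    m.sp += 32;        // addi sp, sp, 32   (then: ret)
}
```

After the function returns, the saved values are still sitting in memory below sp, because deallocating only moves sp. That leftover data is the “garbage” an uninitialized local would see.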
Using Printf
Printf requires that the first parameter be a C-style, null-terminated string, which we can create using the .asciz assembler directive. The following code gives an example of how to use printf.
The code above shows that we put the first parameter to printf in a0, which is the string we want to output. Then we want to output the values of t0 and t1, so those need to be moved into the other parameter registers a1 and a2, respectively.
Anytime you see a function call, you should be thinking about saving the return address register, like I did above. I might not start off by using the stack, but every time I type “call”, my fingers automatically start typing something to save the RA (return address) register. Also, remember to always deallocate before you return!
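For comparison, here is the C++ call that produces the same register setup as the printf pattern above. The format string goes in a0, and the values from t0 and t1 go in a1 and a2. The string literal, like .asciz, gets a terminating '\0'. The function name `print_two` and the format text are hypothetical.

```cpp
#include <cstdio>

// Like a string declared with .asciz, this literal ends with a '\0' byte.
static const char fmt[] = "t0 = %ld, t1 = %ld\n";

// Hypothetical wrapper; returns what printf returns (characters written).
int print_two(long t0, long t1) {
    // la   a0, fmt    first argument: address of the format string
    // mv   a1, t0     second argument
    // mv   a2, t1     third argument
    // call printf     overwrites ra, so ra must already be saved
    return std::printf(fmt, t0, t1);
}
```

Calling print_two(5, 7) prints "t0 = 5, t1 = 7" followed by a newline, which is 15 characters.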
Application Binary Interface (ABI)
We have 8 argument registers, a0 through a7. These hold the first 8 NON-FLOAT parameters passed to a function. This includes pointers, in which case aX contains a memory address, and pass-by-value parameters, in which case aX contains the actual value. For floating point values only, you use fa0 through fa7.
The ABI further states that we have to return an integer value via a0 or a floating point value via fa0.
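If a function mixes integer and floating point parameters, each parameter takes the lowest-numbered register of its own kind that hasn’t been taken yet. Integer and pointer parameters count through a0-a7, and floating point parameters count through fa0-fa7, independently. For example, consider a function that takes int a, int *b, and float c, and returns a float.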
This function requires that int a be in the register a0, that int *b have the memory address that b points to in a1, and that the value of float c be in fa0. Since we return a float, the result must be put into fa0 before executing the ret instruction.
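The numbering rule can be written as a short algorithm. This C++ sketch assigns argument registers for scalar, non-variadic parameters under the standard hard-float convention. Integers and pointers take the next aN. Floats take the next faN, or the next aN once fa0-fa7 are used up. Anything left over goes on the stack. `ArgKind` and `assign_registers` are hypothetical names, and the sketch deliberately ignores structs, variadic calls such as printf (whose variable arguments use the integer registers), and values wider than a register.

```cpp
#include <string>
#include <vector>

enum class ArgKind { Integer, Float };  // Integer also covers char, long, pointers

// Hypothetical sketch: which register each parameter lands in, in order.
std::vector<std::string> assign_registers(const std::vector<ArgKind>& args) {
    int next_a = 0, next_fa = 0;  // the two register classes are counted separately
    std::vector<std::string> out;
    for (ArgKind kind : args) {
        if (kind == ArgKind::Float && next_fa < 8) {
            out.push_back("fa" + std::to_string(next_fa++));
        } else if (next_a < 8) {
            // Integers, and floats once fa0-fa7 are taken, use the next aN.
            out.push_back("a" + std::to_string(next_a++));
        } else {
            out.push_back("stack");
        }
    }
    return out;
}
```

For the int a, int *b, float c example, the sketch returns a0, a1, fa0. A float in the first position would still get fa0, and the following int would still get a0.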
Sizing
Take note that we use a0, a1, …, a7 for every integer size: byte, word, doubleword, etc. Remember that the instruction, not the register, determines the data size. For float versus double, we choose instruction.s versus instruction.d. For example, fadd.s fa0, ft0, ft1 adds single-precision values and fadd.d fa0, ft0, ft1 adds double-precision values.
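The same principle applies to integer loads: the instruction picks both the size and the extension. lb, lh, and lw sign-extend, while lbu, lhu, and lwu zero-extend. The C++ sketch below models this for a little-endian byte buffer (RISC-V is little-endian). `read_le`, `load_signed`, and `load_unsigned` are hypothetical names for illustration.

```cpp
#include <cstdint>

// Assemble `size` bytes starting at p, least significant byte first.
uint64_t read_le(const uint8_t* p, int size) {
    uint64_t v = 0;
    for (int i = size - 1; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

// Like lb/lh/lw/ld: sign-extend the loaded value to 64 bits.
int64_t load_signed(const uint8_t* p, int size) {
    int shift = 64 - 8 * size;
    // Move the value's sign bit to bit 63, then shift back arithmetically.
    return static_cast<int64_t>(read_le(p, size) << shift) >> shift;
}

// Like lbu/lhu/lwu: zero-extend the loaded value to 64 bits.
uint64_t load_unsigned(const uint8_t* p, int size) {
    return read_le(p, size);
}
```

With the bytes FF FF FF 7F in memory, lb reads -1, lbu reads 255, lh reads -1, and lw reads 0x7FFFFFFF, all from the same address.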