Assembler and Object Code

Important for GATE Exam

Assembler

An assembler is a system program that translates assembly language into machine language and prepares object code for execution.

Functions of Assembler

Translate mnemonics into opcodes
Assign memory addresses
Resolve symbols and labels
Process assembler directives
Generate object code and error diagnostics

Input and Output

Input: Assembly language program
Output:
- Object code
- Symbol table
- Literal table
- Error list

Basic Elements of Assembly Language

Mnemonic: symbolic instruction (ADD, MOV)
Operand: data or address
Label: symbolic name for address
Directive: instructions to assembler (START, END, ORIGIN, EQU)

Types of Assembler

Single Pass Assembler
- Scans program once
- Difficult to handle forward references ⭐
- Faster but complex
Two Pass Assembler
- Scans program twice
- Handles forward references easily ⭐
- Most commonly used
Multi Pass Assembler
- More than two scans
- Used in complex architectures
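
Why forward references are awkward in a single pass can be seen in a small sketch. The Python below is a hypothetical one-pass assembler for a toy format: the function name `assemble_one_pass`, the `(label, mnemonic, operand)` tuple layout, and the one-word instruction length are all assumptions made for illustration. When a jump target is not yet defined, it leaves a hole in the code and patches it once the label appears (backpatching).

```python
def assemble_one_pass(lines, start=0):
    """Toy one-pass assembler: each statement is (label, mnemonic, operand).

    Every instruction occupies one word (an assumption for brevity).
    Returns (code, symtab); code is a list of [mnemonic, resolved_operand].
    """
    symtab = {}    # label -> address
    pending = {}   # undefined label -> indices in code waiting for its address
    code = []
    lc = start     # location counter
    for label, mnemonic, operand in lines:
        if label:
            symtab[label] = lc
            # Backpatch every earlier instruction that referenced this label.
            for index in pending.pop(label, []):
                code[index][1] = lc
        if operand is None or operand in symtab:
            code.append([mnemonic, symtab.get(operand)])
        else:
            # Forward reference: emit a hole and remember where it is.
            pending.setdefault(operand, []).append(len(code))
            code.append([mnemonic, None])
        lc += 1
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return code, symtab


program = [
    (None, "JMP", "NEXT"),   # NEXT is not defined yet (forward reference)
    (None, "NOP", None),
    ("NEXT", "HLT", None),
]
code, symtab = assemble_one_pass(program, start=100)
print(code)     # [['JMP', 102], ['NOP', None], ['HLT', None]]
print(symtab)   # {'NEXT': 102}
```

A two-pass assembler avoids the `pending` bookkeeping entirely, because every label is already in the Symbol Table before any operand is resolved.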

Two Pass Assembler

Assembly is completed in two logical passes.

Pass 1

Initialize Location Counter (LC) using operand of START
Scan source program sequentially, one statement at a time
Assign addresses to each instruction using LC and instruction length ⭐
Enter labels into Symbol Table with their computed addresses
Build Symbol Table for all local symbols and their values
Process assembler directives
- START: sets initial LC
- ORIGIN / ORG: updates LC based on expression
- EQU: assigns value to symbol without allocating memory
- LTORG: allocates space for literals encountered so far
Literals encountered (=5, ='A') are added to Literal Table
Allocate space for literals when LTORG or END is seen
Compute the total length of the program using final LC
Generate Intermediate Code (opcode class + operands, no absolute addresses)
No object code generation in this pass
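
The Pass 1 steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real assembler: the `MOT_LENGTH` table, the `(label, opcode, operand)` statement format, and the one-word instruction lengths are assumptions, and literal handling (LTORG, Literal Table) is left out for brevity.

```python
# Hypothetical opcode table: mnemonic -> instruction length in words.
MOT_LENGTH = {"LOAD": 1, "ADD": 1, "STORE": 1, "JMP": 1, "HLT": 1}


def pass_one(lines):
    """Return (symtab, intermediate, program_length) for a toy source program.

    Each statement is (label, opcode, operand). No object code is produced.
    """
    symtab = {}
    intermediate = []   # (address, opcode, operand) with symbolic operands
    lc = 0
    start = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            start = lc = int(operand)
            continue
        if opcode == "END":
            break
        if opcode == "EQU":
            # EQU gives the label a value but allocates no memory.
            symtab[label] = int(operand)
            continue
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = lc
        if opcode == "DS":
            lc += int(operand)           # reserve that many words
        elif opcode == "DC":
            intermediate.append((lc, opcode, operand))
            lc += 1                      # one initialized word
        else:
            intermediate.append((lc, opcode, operand))
            lc += MOT_LENGTH[opcode]     # instruction length from the MOT
    return symtab, intermediate, lc - start


source = [
    (None, "START", "100"),
    ("MAX", "EQU", "50"),
    ("LOOP", "LOAD", "A"),    # A and B are forward references
    (None, "ADD", "B"),
    (None, "STORE", "A"),
    (None, "JMP", "LOOP"),
    ("A", "DS", "2"),
    ("B", "DC", "10"),
    (None, "END", None),
]
symtab, intermediate, length = pass_one(source)
print(symtab)   # {'MAX': 50, 'LOOP': 100, 'A': 104, 'B': 106}
print(length)   # 7
```

The forward references to A and B cause no trouble here because the intermediate code keeps operands symbolic; nothing needs their addresses until the second pass.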

Pass 2

Read Intermediate Code generated by Pass 1
Use Symbol Table and Literal Table for address resolution ⭐
Resolve addresses of all local symbols occurring in instructions
Insert the literal addresses recorded in the Literal Table (allocated at LTORG/END in Pass 1) into instructions
Translate mnemonics into machine opcodes ⭐
Generate code for all load and store register instructions
Perform complete object code generation
Produce final object code with resolved addresses
Generate program listing (LC, source statement, object code)
Final output includes object code, symbol table, literal table, listing
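
As a rough illustration of Pass 2, the Python sketch below takes intermediate code and a Symbol Table of the kind Pass 1 produces and emits resolved object code. The opcode values in `MOT_OPCODE`, the record layout, and the function name `pass_two` are assumptions chosen for the example; listing generation and literals are omitted.

```python
# Hypothetical machine opcode table: mnemonic -> numeric opcode.
MOT_OPCODE = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JMP": 0x04, "HLT": 0x05}


def pass_two(intermediate, symtab):
    """Translate (address, mnemonic, operand) records into object code.

    Object code here is a list of (address, opcode, operand_address) tuples;
    a real assembler would pack these into machine words.
    """
    object_code = []
    for address, mnemonic, operand in intermediate:
        if mnemonic == "DC":
            # Data word: the opcode slot is unused, the value is stored as-is.
            object_code.append((address, None, int(operand)))
            continue
        if operand is None:
            target = None
        elif operand in symtab:
            target = symtab[operand]     # address resolution via the Symbol Table
        else:
            raise ValueError(f"undefined symbol: {operand}")
        object_code.append((address, MOT_OPCODE[mnemonic], target))
    return object_code


# Tables as Pass 1 might have left them for a small program.
symtab = {"LOOP": 100, "A": 104, "B": 106}
intermediate = [
    (100, "LOAD", "A"),
    (101, "ADD", "B"),
    (102, "STORE", "A"),
    (103, "JMP", "LOOP"),
    (106, "DC", "10"),
]
object_code = pass_two(intermediate, symtab)
for address, opcode, target in object_code:
    print(address, opcode, target)
```

An undefined symbol is detected here, in Pass 2, because only at this point is every definition guaranteed to have been seen.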

Key View (Exam Oriented)

Pass 1 = address calculation + tables + intermediate code
Pass 2 = address resolution + machine code + listing

This separation is the core idea tested in GATE and PSU exams.

Note: The MOT (Machine Opcode Table) is used in both passes.

In Pass 1, it gives the length of each instruction so that the location counter can be advanced.
In Pass 2, it gives the machine opcode for each mnemonic.

Tables Used by Assembler ⭐

Symbol Table
- Symbol name
- Address
- Length
Literal Table
- Literal
- Address
Opcode Table (OPTAB)
- Mnemonic
- Opcode
- Instruction length
Pool Table
- Literal pool information

Assembler Directives

START: starting address of program
END: end of program
ORIGIN: change LC value
EQU: assign constant value to symbol
LTORG: allocate literals

Error Handling

Undefined symbols
Duplicate symbols
Invalid mnemonics
Syntax errors

Advantages of Assembler

Efficient and fast execution
Full hardware control
Useful for system-level programming

Limitations

Machine dependent
Difficult to write and debug
Poor portability

Role in Compiler Design

Works as backend for compilers
Helps understand symbol management and code generation

View: Assembler is the foundation of compiler design; understanding its passes and tables is critical for mastering compilers and low-level system software.

Mnemonic vs Opcode

Mnemonic

Human-readable symbolic instruction
Used in assembly language
Easy to remember and write
Example

ADD R1, R2
MOV A, B

Opcode

Binary or hexadecimal machine instruction
Used by CPU hardware
Not human-readable
Example

0001
8B

Key Differences

Mnemonic → symbolic name, meant for the programmer
Opcode → machine code value, meant for the processor
Assembler maps mnemonic to opcode using OPTAB ⭐

Relation

Mnemonic  --(Assembler)-->  Opcode

Opinion: Mnemonics improve human productivity, opcodes optimize machine execution; assembler is the critical bridge between both.

Assembler is part of which Phase of Compiler?

Assembler is part of the Back End of Compiler Design

Exact Position

Comes after Code Generation of assembly code
Converts assembly code → machine code

Compiler Structure

Source Program
 → Front End (Lexical, Syntax, Semantic)
 → Intermediate Code
 → Code Generation (Assembly)
 → Assembler
 → Object Code

Summary

Not a front-end phase
Works as a system software supporting the compiler

View: Assembler is not a compiler phase itself, but a mandatory backend component enabling actual execution.

Mnemonic & Assembler Directives

Important mnemonics and directives for GATE and competitive exams are marked with "⭐"

Mnemonic

Mnemonic: a symbolic instruction representing a machine operation

First operand → source R1
Second operand → destination R2
Result is stored in second operand ⭐

Example

ADD R1, R2 -> R2 = R2 + R1
SUB R1, R2 -> R2 = R2 - R1
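
To make the source/destination convention concrete, here is a small Python sketch that interprets `OP src, dst` on a dictionary of registers. The `execute` helper and the register values are hypothetical; only the rule that the result lands in the second operand comes from the notes above.

```python
def execute(instruction, registers):
    """Apply 'OP src, dst' to a register dict, storing the result in dst."""
    mnemonic, operands = instruction.split(maxsplit=1)
    src, dst = (name.strip() for name in operands.split(","))
    if mnemonic == "ADD":
        registers[dst] = registers[dst] + registers[src]
    elif mnemonic == "SUB":
        registers[dst] = registers[dst] - registers[src]
    else:
        raise ValueError(f"unsupported mnemonic: {mnemonic}")
    return registers


regs = {"R1": 3, "R2": 10}
execute("ADD R1, R2", regs)   # R2 = R2 + R1 = 13
after_add = dict(regs)
execute("SUB R1, R2", regs)   # R2 = R2 - R1 = 10
print(after_add)              # {'R1': 3, 'R2': 13}
print(regs)                   # {'R1': 3, 'R2': 10}
```

Note that R1, the source, is never modified by either instruction.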

Data Transfer Mnemonics

MOV: copy data from source to destination. Example: MOV R1, R2 ⭐
LOAD / LD: load data from memory to register. Example: LD R1, A ⭐
STORE / ST: store data from register to memory. Example: ST R1, A ⭐
XCHG: exchange contents of two operands. Example: XCHG R1, R2
PUSH: push data onto stack. Example: PUSH R1
POP: pop data from stack. Example: POP R1

Arithmetic Mnemonics

ADD: addition. Example: ADD R1, R2 ⭐
SUB: subtraction. Example: SUB R1, R2 ⭐
MUL: multiplication. Example: MUL R1, R2
DIV: division. Example: DIV R1, R2
INC: increment by 1. Example: INC R1
DEC: decrement by 1. Example: DEC R1

Logical Mnemonics

AND: bitwise AND. Example: AND R1, R2 ⭐
OR: bitwise OR. Example: OR R1, R2 ⭐
XOR: bitwise XOR. Example: XOR R1, R2 ⭐
NOT: bitwise complement. Example: NOT R1

Shift / Rotate Mnemonics

SHL / SAL: shift left. Example: SHL R1, 1 ⭐
SHR: shift right. Example: SHR R1, 1 ⭐
ROL: rotate left. Example: ROL R1, 1
ROR: rotate right. Example: ROR R1, 1

Control Transfer Mnemonics

JMP: unconditional jump. Example: JMP LOOP ⭐
JZ / JE: jump if zero / equal. Example: JZ NEXT ⭐
JNZ / JNE: jump if not zero / not equal. Example: JNZ LOOP ⭐
JC: jump if carry. Example: JC ERROR
CALL: call procedure. Example: CALL FUNC ⭐
RET: return from procedure. Example: RET ⭐

Comparison Mnemonics

CMP: compare two operands. Example: CMP R1, R2 ⭐
TEST: logical comparison. Example: TEST R1, R2

Input / Output Mnemonics

IN: input from port. Example: IN R1, PORT1
OUT: output to port. Example: OUT PORT1, R1

Processor Control Mnemonics

NOP: no operation. Example: NOP ⭐
HLT: halt processor. Example: HLT ⭐
INT: interrupt call. Example: INT 21H

String Mnemonics

MOVS: move string. Example: MOVS
CMPS: compare string. Example: CMPS
SCAS: scan string. Example: SCAS

Assembler Directives

START: specifies starting address of program, initializes LC. Example: START 100 ⭐
END: marks end of source program, triggers literal allocation. Example: END ⭐
ORIGIN: changes value of LC to a given address or expression. Example: ORIGIN LOOP+2 ⭐
EQU: assigns a constant value or address to a symbol. Example: MAX EQU 50 ⭐
LTORG: creates a literal pool and assigns addresses to literals. Example: LTORG ⭐

More Directives

DS (Define Storage): reserves memory locations (no initialization). Example: A DS 5 ⭐
DC (Define Constant): allocates memory and initializes with constant value. Example: B DC 10 ⭐
USING: tells assembler which register to use as base register. Example: USING *,15
DROP: removes register from base register list. Example: DROP 15
ENTRY: declares symbol as entry point for linker. Example: ENTRY MAIN
EXTRN / EXTERNAL: declares symbol defined in another module. Example: EXTRN SUM ⭐
CSECT: defines control section (separate relocatable unit). Example: MAIN CSECT
ORG: alternative form of ORIGIN (assembler dependent). Example: ORG 200

Subroutine vs Coroutine

Subroutine

A subroutine is a callable block of code that executes and returns control to the calling point
Follows call–return discipline (stack based)

Example flow:

CALL SUB
...
SUB:
  ...
  RET

Key points:

One active subroutine at a time
Uses stack for return address
Common in procedural programming

Coroutine

A coroutine is a program unit that suspends and resumes execution, not strict call–return
Control is transferred cooperatively between routines

Example flow:

resume A → suspend A → resume B → suspend B

Key points:

Multiple active routines
No implicit return to caller
Used in concurrency, generators, schedulers

Core Difference (Exam Focus)

Subroutine: call → execute → return

Coroutine: resume ↔ suspend

Subroutine vs Coroutine (by Stack Overflow) ⭐

A subroutine is a special case of a co-routine. A co-routine is a generalized form of a subroutine that supports non-preemptive multitasking.
A subroutine always starts its execution from the beginning (first line), but a co-routine starts from where it left off last time.

This is why we say the co-routine has multiple entry points whereas the sub-routine has only one.

Yield ‘remembers’ where the co-routine is so when it is called again it will continue where it left off.

For example:

  coroutine foo {
    yield 1;
    yield 2;
    yield 3;
  }
  print foo();
  print foo();
  print foo();

Prints: 1 2 3

Note: Coroutines may use a return, and behave just like a subroutine

  coroutine foo {
    return 1;
    return 2; //Dead code
    return 3;
  }
  print foo();
  print foo();
  print foo();

Prints: 1 1 1
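
The same contrast can be run in Python, where a generator function plays the role of the coroutine. This is only an approximation of the pseudocode above: in Python the suspended state lives in the generator object returned by `foo()`, so it is resumed with `next(gen)` rather than by calling `foo()` again. The name `bar` for the plain subroutine is an assumption added for the comparison.

```python
def foo():
    """Generator: each next() resumes right after the previous yield."""
    yield 1
    yield 2
    yield 3


def bar():
    """Plain function: every call starts again from the first line."""
    return 1


gen = foo()
resumed = [next(gen), next(gen), next(gen)]
restarted = [bar(), bar(), bar()]
print(resumed)     # [1, 2, 3]
print(restarted)   # [1, 1, 1]
```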

Macro Processor ⭐

What is a Macro

A macro is a named block of assembly statements
It is expanded by the assembler before actual code generation
Expansion = textual substitution, not execution

Key idea

Macro works at compile/assembly time, not run time

Why Macros are Needed

Avoid repetitive code
Improve readability
No CALL/RET overhead (unlike subroutines)

Use case

Repeated instruction patterns
Parameterized instruction blocks

Macro Processor

A system software
Runs before or inside assembler
Replaces macro calls with macro body

Flow

Source Program
→ Macro Processor (expansion)
→ Pure Assembly Code
→ Assembler
→ Object Code

Basic Macro Structure ⭐

.MACRO MACRO_NAME parameter1, parameter2
  statements
.ENDM

Meaning of each part:

.MACRO → start macro definition
MACRO_NAME → identifier of macro
parameters → placeholders
.ENDM → end of macro definition

Macro Call

MACRO_NAME actual1, actual2

Replaced by macro body
Formal parameters substituted with actual values

Example Simple Macro

.MACRO INCR X
ADD X, =1
.ENDM

Call

INCR A

Expansion

ADD A, =1

Macro Variables (Parameters)

Formal parameters → used in macro definition
Actual parameters → passed during macro call

Example

.MACRO ADD2 A,B
ADD A,B
.ENDM

Call

ADD2 R1,R2

Substitution

ADD R1,R2
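
Parameter substitution is purely textual, which a short Python sketch can show. The `expand` function and the dictionary layout for macro definitions are hypothetical; the two macros are the INCR and ADD2 examples from these notes. Formal parameters are replaced only where they appear as whole words, so the `A` inside the mnemonic `ADD` is left alone.

```python
import re


def expand(source, macros):
    """Expand macro calls in a list of source lines by textual substitution.

    macros maps a macro name to (formal_parameters, body_lines).
    """
    output = []
    for line in source:
        name, _, rest = line.strip().partition(" ")
        if name not in macros:
            output.append(line)          # ordinary statement, copied unchanged
            continue
        formals, body = macros[name]
        actuals = [arg.strip() for arg in rest.split(",")]
        if len(actuals) != len(formals):
            raise ValueError(f"{name}: expected {len(formals)} arguments")
        binding = dict(zip(formals, actuals))
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, formals)) + r")\b")
        for body_line in body:
            # Replace whole-word formal parameters with the actual arguments.
            output.append(pattern.sub(lambda m: binding[m.group(1)], body_line))
    return output


macros = {
    "INCR": (["X"], ["ADD X, =1"]),
    "ADD2": (["A", "B"], ["ADD A,B"]),
}
expanded = expand(["INCR A", "ADD2 R1,R2", "HLT"], macros)
print(expanded)   # ['ADD A, =1', 'ADD R1,R2', 'HLT']
```

Nothing is evaluated during expansion: the output is plain assembly text that the assembler then processes as if the programmer had typed it.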

Macro Expansion vs Execution (⭐ GATE)

Macro expansion → textual replacement
Instruction execution → CPU at run time
Macro processor does not evaluate logic, only expands text

Conditional Assembly

Decisions taken during macro expansion
Controlled by assembler directives

Used when:

Macro behavior depends on parameter value

IF–ENDC Structure

.IF condition
  statements
.ENDC

.IF → start conditional expansion
.ENDC → end conditional block

Relational Operators in Macro IF

EQ → equal to zero
NE → not equal to zero
GT → greater than zero
LT → less than zero
GE → ≥ 0
LE → ≤ 0

⭐ Important rule

.IF EQ, X   → if X == 0
.IF NE, X   → if X != 0

Zero is implicit
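
The implicit comparison with zero can be written out as a small lookup table. This Python sketch is only a model of the rule above; the names `CONDITIONS` and `condition_holds` are made up for the example.

```python
import operator

# Each relational operator compares its argument with an implicit zero.
CONDITIONS = {
    "EQ": operator.eq, "NE": operator.ne,
    "GT": operator.gt, "LT": operator.lt,
    "GE": operator.ge, "LE": operator.le,
}


def condition_holds(op, value):
    """Return True if '.IF op, value' would assemble its block."""
    return CONDITIONS[op](value, 0)


print(condition_holds("EQ", 0))   # True:  0 == 0
print(condition_holds("NE", 0))   # False: 0 != 0 does not hold
print(condition_holds("GT", 5))   # True:  5 > 0
```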

WORD Directive

.WORD X

Meaning:

Allocate 1 word of memory
Initialize it with value X
Assembler directive, not machine instruction

Use case:

Define constants
Reserve initialized memory

ENDM

Marks end of macro definition
Mandatory
Assembler stops recording macro body here

ENDC

Marks end of conditional block
Only used with .IF

Recursive Macros

A macro that calls itself
Can be direct or indirect

Example 1

.MACRO M1,X
.IF EQ,X
M1 X+1
.ENDC
.IF NE,X
.WORD X
.ENDC
.ENDM

Step-by-step understanding

Input parameter: X
Case 1: X == 0
- Macro calls itself with X+1
Case 2: X != 0
- Allocates one word with value X

Pseudo logic

if (X == 0)
  call M1(X+1)
if (X != 0)
  allocate word X

Why it terminates:

First call: X = 0
Second call: X = 1
Condition EQ fails
Expansion stops

Example 2

.MACRO M2,X
.IF EQ,X
M2 X
.ENDC
.IF NE,X
.WORD X+1
.ENDC
.ENDM

Pseudo logic

if (X == 0)
  call M2(X)
if (X != 0)
  allocate word X+1

Why infinite loop occurs:

X never changes
X == 0 always true
Macro keeps expanding forever

Recursive macro + unchanged argument ⇒ infinite loop
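
Both examples can be traced with a small Python model. This is a simplification under stated assumptions: the argument is evaluated as a number instead of being substituted as text (a real expansion of `M1 0` would emit `.WORD 0+1`), and the `limit` parameter and `ExpansionLimit` exception are hypothetical guards added so that the non-terminating case can be observed safely.

```python
class ExpansionLimit(Exception):
    """Raised when expansion nests deeper than the configured limit."""


def expand_m1(x, depth=0, limit=50):
    """Model of M1: recurse with X+1 when X == 0, else emit .WORD X."""
    if depth > limit:
        raise ExpansionLimit("M1 did not terminate")
    words = []
    if x == 0:                       # .IF EQ,X
        words += expand_m1(x + 1, depth + 1, limit)
    if x != 0:                       # .IF NE,X
        words.append(f".WORD {x}")
    return words


def expand_m2(x, depth=0, limit=50):
    """Model of M2: recurse with the same X when X == 0, else emit .WORD X+1."""
    if depth > limit:
        raise ExpansionLimit("M2 did not terminate")
    words = []
    if x == 0:                       # .IF EQ,X
        words += expand_m2(x, depth + 1, limit)
    if x != 0:                       # .IF NE,X
        words.append(f".WORD {x + 1}")
    return words


print(expand_m1(0))   # ['.WORD 1']
print(expand_m2(3))   # ['.WORD 4']
try:
    expand_m2(0)
except ExpansionLimit as error:
    print(error)      # M2 did not terminate
```

M2 is harmless for any nonzero argument; only the call with X = 0 expands without end, which is exactly the case exam questions tend to ask about.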

More For EXAM ⭐

Infinite Loop in Macro Processor Occurs when:

Recursive macro exists
Recursion condition never becomes false (the exit condition is never reached)
Argument does not move toward exit condition

Macro vs Subroutine (Exam ⭐)

Macro → expanded inline
Subroutine → CALL and RET
Macro → faster execution
Subroutine → less code size

What Macro Processor Does NOT Do

No execution
No runtime decision
No CPU involvement

Typical GATE Questions

Identify infinite macro expansion
Count number of WORD allocations
Predict final expanded code
Interpret .IF EQ, X
Differentiate macro vs subroutine