
Assembler and Object Code

Important for GATE Exam

An assembler is a system program that translates assembly language into machine language and prepares object code for execution.

Functions of Assembler

  • Translate mnemonics into opcodes
  • Assign memory addresses
  • Resolve symbols and labels
  • Process assembler directives
  • Generate object code and error diagnostics

Input and Output

  • Input: Assembly language program
  • Output:
    • Object code
    • Symbol table
    • Literal table
    • Error list

Basic Elements of Assembly Language

  • Mnemonic: symbolic instruction (ADD, MOV)
  • Operand: data or address
  • Label: symbolic name for address
  • Directive: instructions to assembler (START, END, ORIGIN, EQU)

Types of Assembler

  • Single Pass Assembler
    • Scans program once

    • Difficult to handle forward references
    • Faster, but more complex to implement
  • Two Pass Assembler

    • Scans program twice

    • Handles forward references easily
    • Most commonly used
  • Multi Pass Assembler
    • More than two scans
    • Used in complex architectures
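
Why forward references are awkward in a single pass can be seen in a minimal Python sketch. It assumes a hypothetical machine where every instruction occupies one word and takes at most one symbolic operand; the names `assemble_single_pass` and `patch_list` are illustrative only. When an operand is not yet defined, its use is remembered and filled in later (backpatching) once the label appears.

```python
def assemble_single_pass(lines, start=100):
    """Assemble in one scan, backpatching forward references.

    Each line is (label_or_None, mnemonic, operand_or_None).
    Every instruction is assumed to occupy one word.
    """
    symtab = {}        # label -> address
    patch_list = {}    # undefined label -> indexes in `code` awaiting its address
    code = []          # entries of the form [address, mnemonic, resolved_operand]
    lc = start

    for label, mnemonic, operand in lines:
        if label is not None:
            symtab[label] = lc
            # The label is now known: fill in every earlier use of it.
            for index in patch_list.pop(label, []):
                code[index][2] = lc
        resolved = None
        if operand is not None:
            if operand in symtab:
                resolved = symtab[operand]          # backward reference
            else:
                # Forward reference: remember where the address is needed.
                patch_list.setdefault(operand, []).append(len(code))
        code.append([lc, mnemonic, resolved])
        lc += 1

    if patch_list:
        raise ValueError(f"undefined symbols: {sorted(patch_list)}")
    return code, symtab


program = [
    (None, "JMP", "NEXT"),    # forward reference: NEXT is not defined yet
    ("LOOP", "ADD", None),
    ("NEXT", "JMP", "LOOP"),  # backward reference
]
code, symtab = assemble_single_pass(program)
print(code)      # [[100, 'JMP', 102], [101, 'ADD', None], [102, 'JMP', 101]]
print(symtab)    # {'LOOP': 101, 'NEXT': 102}
```

A two pass assembler avoids the patch list entirely because every label is already in the symbol table before any operand is resolved.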

Two Pass Assembler

Assembly is completed in two logical passes.

Pass 1

  • Initialize Location Counter (LC) using operand of START
  • Scan source program sequentially, one statement at a time
  • Assign addresses to each instruction using LC and instruction length ⭐

  • Enter labels into Symbol Table with their computed addresses

  • Build Symbol Table for all local symbols and their values
  • Process assembler directives
    • START: sets initial LC
    • ORIGIN / ORG: updates LC based on expression
    • EQU: assigns value to symbol without allocating memory
    • LTORG: allocates space for literals encountered so far
  • Literals encountered (=5, ='A') are added to Literal Table

  • Allocate space for literals when LTORG or END is seen
  • Compute the total length of the program using final LC

  • Generate Intermediate Code (opcode class + operands, no absolute addresses)

  • No object code generation in this pass
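
The Pass 1 steps above can be sketched in Python. This is a simplified illustration, not a full assembler: it assumes one-word instructions, operands that are either a decimal constant, a symbol, or a literal, and a tuple layout `(label, opcode, operand)` chosen here for convenience. The intermediate code keeps operands symbolic; the LC is stored alongside only to make the example easy to follow.

```python
def pass1(source):
    """Build the symbol table, literal table and intermediate code."""
    symtab = {}          # symbol -> address or value
    littab = []          # (literal, address) in order of allocation
    intermediate = []    # (LC, mnemonic, symbolic operand)
    pending = []         # literals seen since the last LTORG
    lc = start = 0

    def value(text):
        # A decimal constant or an already-defined symbol; no expressions here.
        return int(text) if text.isdigit() else symtab[text]

    for label, opcode, operand in source:
        if opcode == "START":
            lc = start = value(operand)         # initialise LC
        elif opcode == "ORIGIN":
            lc = value(operand)                 # reset LC
        elif opcode == "EQU":
            symtab[label] = value(operand)      # no memory allocated
        elif opcode in ("LTORG", "END"):
            for literal in pending:             # allocate the current pool
                littab.append((literal, lc))
                lc += 1
            pending.clear()
        else:
            if label is not None:
                symtab[label] = lc
            if opcode == "DS":
                lc += int(operand)              # reserve storage only
                continue
            if (operand is not None and operand.startswith("=")
                    and operand not in pending):
                pending.append(operand)
            intermediate.append((lc, opcode, operand))
            lc += 1                             # assumed instruction length: 1
    # Length is final LC minus start; valid here because ORIGIN is not used.
    return symtab, littab, intermediate, lc - start


source = [
    (None, "START", "100"),
    ("MAX", "EQU", "50"),
    ("LOOP", "LOAD", "=5"),   # address 100
    (None, "ADD", "A"),       # address 101, forward reference to A
    (None, "JNZ", "LOOP"),    # address 102
    ("A", "DS", "2"),         # addresses 103-104
    (None, "END", None),      # literal =5 placed at 105
]
symtab, littab, intermediate, length = pass1(source)
print(symtab)    # {'MAX': 50, 'LOOP': 100, 'A': 103}
print(littab)    # [('=5', 105)]
print(length)    # 6
```

Note that the forward reference to A causes no trouble here: Pass 1 only records the operand symbolically and leaves resolution to Pass 2.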

Pass 2

  • Read Intermediate Code generated by Pass 1
  • Use Symbol Table and Literal Table for address resolution ⭐
  • Resolve addresses of all local symbols occurring in instructions
  • Look up the addresses of literals in the Literal Table (they were allocated in Pass 1 at LTORG or END)
  • Translate mnemonics into machine opcodes ⭐
  • Generate code for all load and store register instructions
  • Perform complete object code generation
  • Produce final object code with resolved addresses
  • Generate program listing (LC, source statement, object code)

  • Final output includes object code, symbol table, literal table, listing
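
A minimal Python sketch of Pass 2 follows. The opcode values in `OPTAB`, the `(address, opcode, operand address)` object format, and the listing layout are all assumptions made for illustration; the tables are written out by hand in the form Pass 1 might leave them.

```python
# Hypothetical opcode table: mnemonic -> machine opcode (values are made up).
OPTAB = {"LOAD": 0x01, "ADD": 0x02, "JNZ": 0x03}


def pass2(intermediate, symtab, littab):
    """Resolve operands and emit (address, opcode, operand address) words."""
    literal_addresses = dict(littab)
    object_code, listing, errors = [], [], []
    for lc, mnemonic, operand in intermediate:
        if mnemonic not in OPTAB:
            errors.append(f"{lc}: invalid mnemonic {mnemonic}")
            continue
        if operand is None:
            address = 0
        elif operand.startswith("="):
            address = literal_addresses[operand]   # from the Literal Table
        elif operand in symtab:
            address = symtab[operand]              # from the Symbol Table
        else:
            errors.append(f"{lc}: undefined symbol {operand}")
            continue
        opcode = OPTAB[mnemonic]
        object_code.append((lc, opcode, address))
        line = f"{lc}  {opcode:02X} {address:03d}  {mnemonic} {operand or ''}"
        listing.append(line.rstrip())
    return object_code, listing, errors


# Tables as Pass 1 might have left them for a small program starting at 100.
symtab = {"LOOP": 100, "A": 103}
littab = [("=5", 105)]
intermediate = [(100, "LOAD", "=5"), (101, "ADD", "A"), (102, "JNZ", "LOOP")]

object_code, listing, errors = pass2(intermediate, symtab, littab)
print(object_code)   # [(100, 1, 105), (101, 2, 103), (102, 3, 100)]
print("\n".join(listing))
```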

Key View (Exam Oriented)

  • Pass 1 = address calculation + tables + intermediate code

  • Pass 2 = address resolution + machine code + listing

This separation is the core idea tested in GATE and PSU exams.

Note: The MOT (Machine Opcode Table) is used in both passes.

  • Pass 1: used to get the length of each instruction and update the location counter.
  • Pass 2: used to obtain the machine opcode for each mnemonic.

Tables Used by Assembler ⭐

  • Symbol Table
    • Symbol name
    • Address
    • Length
  • Literal Table
    • Literal
    • Address
  • Opcode Table (OPTAB)
    • Mnemonic
    • Opcode
    • Instruction length
  • Pool Table
    • Literal pool information
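
How the Literal Table and Pool Table work together can be sketched in Python. The sketch assumes the common textbook layout in which each Pool Table entry is the index of the first Literal Table entry of a pool; the statement format (plain strings) and the one-word size for instructions and literals are simplifications chosen for this example.

```python
def build_literal_pools(statements, start_lc):
    """Return (littab, pooltab) for a list of statement strings."""
    littab = []      # entries of the form [literal, address]
    pooltab = [0]    # index in littab where each pool begins
    lc = start_lc
    for stmt in statements:
        if stmt in ("LTORG", "END"):
            # Allocate every literal of the current pool at the current LC.
            for entry in littab[pooltab[-1]:]:
                entry[1] = lc
                lc += 1
            if stmt == "LTORG":
                pooltab.append(len(littab))   # a new pool starts here
        else:
            operand = stmt.split()[-1]
            current_pool = [entry[0] for entry in littab[pooltab[-1]:]]
            if operand.startswith("=") and operand not in current_pool:
                littab.append([operand, None])
            lc += 1                           # assumed one word per instruction
    return littab, pooltab


statements = ["LOAD =5", "ADD =1", "LTORG", "LOAD =5", "SUB =2", "END"]
littab, pooltab = build_literal_pools(statements, start_lc=200)
print(littab)    # [['=5', 202], ['=1', 203], ['=5', 206], ['=2', 207]]
print(pooltab)   # [0, 2]
```

The literal =5 appears twice in the Literal Table because it is used in two different pools, each with its own address.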

Assembler Directives

  • START: starting address of program
  • END: end of program
  • ORIGIN: change LC value
  • EQU: assign constant value to symbol
  • LTORG: allocate literals

Error Handling

  • Undefined symbols
  • Duplicate symbols
  • Invalid mnemonics
  • Syntax errors
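
Three of these checks (duplicate symbols, invalid mnemonics, undefined symbols) can be sketched in a few lines of Python. The set `VALID_MNEMONICS` and the tuple layout `(label, mnemonic, operand)` are assumptions for illustration; syntax checking is left out because it depends on the concrete grammar.

```python
VALID_MNEMONICS = {"LOAD", "ADD", "JMP", "STORE"}   # hypothetical instruction set


def check_program(lines):
    """Return a list of (line number, message) diagnostics."""
    errors = []
    defined = set()
    # First scan: collect labels, report duplicates and bad mnemonics.
    for number, (label, mnemonic, operand) in enumerate(lines, start=1):
        if label is not None:
            if label in defined:
                errors.append((number, f"duplicate symbol {label}"))
            defined.add(label)
        if mnemonic not in VALID_MNEMONICS:
            errors.append((number, f"invalid mnemonic {mnemonic}"))
    # Second scan: every operand must name a defined symbol.
    for number, (label, mnemonic, operand) in enumerate(lines, start=1):
        if operand is not None and operand not in defined:
            errors.append((number, f"undefined symbol {operand}"))
    return errors


lines = [
    ("A", "LOAD", "B"),
    ("A", "ADD", "C"),      # A defined twice, C never defined
    (None, "MUV", "A"),     # misspelt mnemonic
    ("B", "STORE", "A"),
]
for number, message in check_program(lines):
    print(f"line {number}: {message}")
```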

Advantages of Assembler

  • Efficient and fast execution
  • Full hardware control
  • Useful for system-level programming

Limitations

  • Machine dependent
  • Difficult to write and debug
  • Poor portability

Role in Compiler Design

  • Works as backend for compilers
  • Helps understand symbol management and code generation

View: Assembler is the foundation of compiler design; understanding its passes and tables is critical for mastering compilers and low-level system software.


Mnemonic

  • Human-readable symbolic instruction

  • Used in assembly language

  • Easy to remember and write
  • Example
ADD R1, R2
MOV A, B

Opcode

  • Binary or hexadecimal machine instruction code

  • Used by CPU hardware

  • Not human-readable
  • Example
0001
8B

Key Differences

  • Mnemonic → symbolic name, meant for the programmer

  • Opcode → machine code value, meant for the processor

  • Assembler maps mnemonic to opcode using OPTAB ⭐

Relation

Mnemonic --(Assembler)--> Opcode

Opinion: Mnemonics improve human productivity, opcodes optimize machine execution; assembler is the critical bridge between both.


Assembler is part of which Phase of Compiler?


Assembler is part of the Back End of Compiler Design

Exact Position

  • Comes after the code generation phase, which emits assembly code
  • Converts assembly code → machine code

Compiler Structure

Source Program
→ Front End (Lexical, Syntax, Semantic)
→ Intermediate Code
→ Code Generation (Assembly)
→ Assembler
→ Object Code

Summary

  • Not a front-end phase
  • Works as system software supporting the compiler

View: Assembler is not a compiler phase itself, but a mandatory backend component enabling actual execution.


Important mnemonics and directives for GATE and competitive exams are marked with ⭐

Mnemonic

A mnemonic is a symbolic instruction representing a machine operation. In the two-operand arithmetic examples here:

  • First operand → source R1
  • Second operand → destination R2
  • Result is stored in second operand

Example

  • ADD R1, R2 -> R2 = R2 + R1
  • SUB R1, R2 -> R2 = R2 - R1

Data Transfer Mnemonics

  • MOV: copy data from source to destination. Example: MOV R1, R2
  • LOAD / LD: load data from memory to register. Example: LD R1, A
  • STORE / ST: store data from register to memory. Example: ST R1, A
  • XCHG: exchange contents of two operands. Example: XCHG R1, R2
  • PUSH: push data onto stack. Example: PUSH R1
  • POP: pop data from stack. Example: POP R1

Arithmetic Mnemonics

  • ADD: addition. Example: ADD R1, R2
  • SUB: subtraction. Example: SUB R1, R2
  • MUL: multiplication. Example: MUL R1, R2
  • DIV: division. Example: DIV R1, R2
  • INC: increment by 1. Example: INC R1
  • DEC: decrement by 1. Example: DEC R1

Logical Mnemonics

  • AND: bitwise AND. Example: AND R1, R2
  • OR: bitwise OR. Example: OR R1, R2
  • XOR: bitwise XOR. Example: XOR R1, R2
  • NOT: bitwise complement. Example: NOT R1

Shift / Rotate Mnemonics

  • SHL / SAL: shift left. Example: SHL R1, 1
  • SHR: shift right. Example: SHR R1, 1
  • ROL: rotate left. Example: ROL R1, 1
  • ROR: rotate right. Example: ROR R1, 1

Control Transfer Mnemonics

  • JMP: unconditional jump. Example: JMP LOOP
  • JZ / JE: jump if zero / equal. Example: JZ NEXT
  • JNZ / JNE: jump if not zero / not equal. Example: JNZ LOOP
  • JC: jump if carry. Example: JC ERROR
  • CALL: call procedure. Example: CALL FUNC
  • RET: return from procedure. Example: RET

Comparison Mnemonics

  • CMP: compare two operands. Example: CMP R1, R2
  • TEST: logical comparison. Example: TEST R1, R2

Input / Output Mnemonics

  • IN: input from port. Example: IN R1, PORT1
  • OUT: output to port. Example: OUT PORT1, R1

Processor Control Mnemonics

  • NOP: no operation. Example: NOP
  • HLT: halt processor. Example: HLT
  • INT: interrupt call. Example: INT 21H

String Mnemonics

  • MOVS: move string. Example: MOVS
  • CMPS: compare string. Example: CMPS
  • SCAS: scan string. Example: SCAS

Assembler Directives

  • START: specifies starting address of program, initializes LC. Example: START 100
  • END: marks end of source program, triggers literal allocation. Example: END
  • ORIGIN: changes value of LC to a given address or expression. Example: ORIGIN LOOP+2
  • EQU: assigns a constant value or address to a symbol. Example: MAX EQU 50
  • LTORG: creates a literal pool and assigns addresses to literals. Example: LTORG

More Directives

  • DS (Define Storage): reserves memory locations (no initialization). Example: A DS 5
  • DC (Define Constant): allocates memory and initializes with constant value. Example: B DC 10
  • USING: tells assembler which register to use as base register. Example: USING *,15
  • DROP: removes register from base register list. Example: DROP 15
  • ENTRY: declares symbol as entry point for linker. Example: ENTRY MAIN
  • EXTRN / EXTERNAL: declares symbol defined in another module. Example: EXTRN SUM ⭐
  • CSECT: defines control section (separate relocatable unit). Example: MAIN CSECT
  • ORG: alternative form of ORIGIN (assembler dependent). Example: ORG 200

Subroutine

  • A subroutine is a callable block of code that executes and returns control to the calling point

  • Follows call–return discipline (stack based)

Example flow:

CALL SUB
...
SUB:
...
RET

Key points:

  • One active subroutine at a time
  • Uses stack for return address
  • Common in procedural programming

Coroutine

  • A coroutine is a program unit that suspends and resumes execution, not strict call–return

  • Control is transferred cooperatively between routines

Example flow:

resume A → suspend A → resume B → suspend B

Key points:

  • Multiple active routines
  • No implicit return to caller
  • Used in concurrency, generators, schedulers

Core Difference (Exam Focus)

  • Subroutine: call → execute → return

  • Coroutine: resume ↔ suspend

Subroutine vs Coroutine (by Stack Overflow) ⭐

  • The subroutine is a special case of a co-routine. A co-routine is a generalized form of a subroutine that supports non-preemptive (cooperative) multitasking.

  • A subroutine always starts its execution from the beginning (first line), but a co-routine resumes from where it left off last time.

This is why we say the co-routine has multiple entry points whereas the sub-routine has only one.

Yield ‘remembers’ where the co-routine is so when it is called again it will continue where it left off.

For example:

coroutine foo {
yield 1;
yield 2;
yield 3;
}
print foo();
print foo();
print foo();

Prints: 1 2 3

Note: A coroutine may use return instead of yield, in which case it behaves just like a subroutine:

coroutine foo {
return 1;
return 2; //Dead code
return 3;
}
print foo();
print foo();
print foo();

Prints: 1 1 1
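
Both pseudocode examples above can be reproduced with Python, under one caveat: in Python, calling `foo()` creates a generator object, and it is `next()` on that object (not a second call to `foo`) that resumes execution where it left off. The names `gen` and `bar` are introduced here only for the illustration.

```python
def foo():
    # Each yield suspends foo and remembers the position.
    yield 1
    yield 2
    yield 3


gen = foo()                          # create the generator once
yielded = [next(gen), next(gen), next(gen)]
print(*yielded)                      # 1 2 3


def bar():
    # An ordinary subroutine: every call starts from the first line.
    return 1
    return 2                         # dead code, never reached
    return 3


returned = [bar(), bar(), bar()]
print(*returned)                     # 1 1 1
```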


What is a Macro

  • A macro is a named block of assembly statements

  • It is expanded by the assembler before actual code generation
  • Expansion = textual substitution, not execution

Key idea

Macro works at compile/assembly time, not run time

Why Macros are Needed

  • Avoid repetitive code

  • Improve readability
  • No CALL/RET overhead (unlike subroutines)

Use case

  • Repeated instruction patterns
  • Parameterized instruction blocks

Macro Processor

  • A piece of system software
  • Runs before or inside assembler
  • Replaces macro calls with macro body

Flow

Source Program
→ Macro Processor (expansion)
→ Pure Assembly Code
→ Assembler
→ Object Code

Basic Macro Structure ⭐

.MACRO MACRO_NAME parameter1, parameter2
statements
.ENDM

Meaning of each part:

  • .MACRO → start macro definition
  • MACRO_NAME → identifier of macro
  • parameters → placeholders
  • .ENDM → end of macro definition

Macro Call

MACRO_NAME actual1, actual2
  • Replaced by macro body
  • Formal parameters substituted with actual values

Example Simple Macro

.MACRO INCR X
ADD X, =1
.ENDM

Call

INCR A

Expansion

ADD A, =1

Macro Variables (Parameters)

  • Formal parameters → used in macro definition
  • Actual parameters → passed during macro call

Example

.MACRO ADD2 A,B
ADD A,B
.ENDM

Call

ADD2 R1,R2

Substitution

ADD R1,R2
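
The definition, call, and substitution steps above can be sketched as a tiny macro processor in Python. This is only an illustration under simplifying assumptions: one level of expansion (no nested or recursive calls), whole-word parameter matching, and the helper names `substitute` and `expand_macros` are made up for this example.

```python
import re


def substitute(text, mapping):
    """Replace formal parameters by actual ones, matching whole words only."""
    if not mapping:
        return text
    pattern = re.compile(r"\b(" + "|".join(re.escape(f) for f in mapping) + r")\b")
    # A single regex pass substitutes all parameters simultaneously.
    return pattern.sub(lambda match: mapping[match.group(1)], text)


def expand_macros(source_lines):
    """Return the source with macro definitions removed and calls expanded."""
    macros = {}                      # macro name -> (formals, body lines)
    output = []
    lines = iter(source_lines)
    for line in lines:
        parts = line.split(None, 2)
        if parts and parts[0] == ".MACRO":
            name = parts[1]
            formals = ([p.strip() for p in parts[2].split(",")]
                       if len(parts) > 2 else [])
            body = []
            for body_line in lines:  # record the body until .ENDM
                if body_line.strip() == ".ENDM":
                    break
                body.append(body_line)
            macros[name] = (formals, body)
        elif parts and parts[0] in macros:
            formals, body = macros[parts[0]]
            rest = line.split(None, 1)
            actuals = ([a.strip() for a in rest[1].split(",")]
                       if len(rest) > 1 else [])
            mapping = dict(zip(formals, actuals))
            output.extend(substitute(b, mapping) for b in body)
        else:
            output.append(line)      # ordinary statement: copy unchanged
    return output


source = [
    ".MACRO INCR X", "ADD X, =1", ".ENDM",
    ".MACRO ADD2 A,B", "ADD A,B", ".ENDM",
    "INCR A",
    "ADD2 R1,R2",
]
expanded = expand_macros(source)
print(expanded)   # ['ADD A, =1', 'ADD R1,R2']
```

Whole-word matching matters here: the formal parameter A must not be replaced inside the mnemonic ADD.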

Macro Expansion vs Execution (⭐ GATE)

  • Macro expansion → textual replacement
  • Instruction execution → CPU at run time

  • Macro processor does not evaluate logic, only expands text

Conditional Assembly

  • Decisions taken during macro expansion
  • Controlled by assembler directives

Used when:

  • Macro behavior depends on parameter value

IF–ENDC Structure

.IF condition
statements
.ENDC
  • .IF → start conditional expansion
  • .ENDC → end conditional block

Relational Operators in Macro IF

  • EQ → equal to zero
  • NE → not equal to zero
  • GT → greater than zero
  • LT → less than zero
  • GE → ≥ 0
  • LE → ≤ 0

⭐ Important rule

.IF EQ, X → if X == 0
.IF NE, X → if X != 0

Zero is implicit
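
The implicit comparison with zero can be sketched in Python. The sketch assumes non-nested `.IF` blocks and a `symbols` dictionary holding the value of each name at assembly time; both the function names and that dictionary are illustrative assumptions.

```python
import operator

# Each condition code compares its argument against an implicit zero.
CONDITIONS = {
    "EQ": operator.eq, "NE": operator.ne,
    "GT": operator.gt, "LT": operator.lt,
    "GE": operator.ge, "LE": operator.le,
}


def condition_holds(code, value):
    return CONDITIONS[code](value, 0)


def apply_conditionals(lines, symbols):
    """Keep only the lines whose enclosing .IF condition holds."""
    output = []
    skipping = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(".IF"):
            code, name = [p.strip() for p in stripped[3:].split(",", 1)]
            skipping = not condition_holds(code, symbols[name])
        elif stripped == ".ENDC":
            skipping = False
        elif not skipping:
            output.append(line)
    return output


lines = [
    ".IF EQ, X", ".WORD 1", ".ENDC",
    ".IF NE, X", ".WORD 2", ".ENDC",
]
print(apply_conditionals(lines, {"X": 0}))   # ['.WORD 1']
print(apply_conditionals(lines, {"X": 5}))   # ['.WORD 2']
```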

WORD Directive

.WORD X

Meaning:

  • Allocate 1 word of memory

  • Initialize it with value X

  • Assembler directive, not machine instruction

Use case:

  • Define constants
  • Reserve initialized memory

ENDM

  • Marks end of macro definition

  • Mandatory
  • Assembler stops recording macro body here

ENDC

  • Marks end of conditional block

  • Only used with .IF

Recursive Macros

  • A macro that calls itself

  • Can be direct or indirect

Example 1

.MACRO M1,X
.IF EQ,X
M1 X+1
.ENDC
.IF NE,X
.WORD X
.ENDC
.ENDM

Step-by-step understanding

  • Input parameter: X
  • Case 1: X == 0
    • Macro calls itself with X+1
  • Case 2: X != 0
    • Allocates one word with value X

Pseudo logic

if (X == 0)
call M1(X+1)
if (X != 0)
allocate word X

Why it terminates (when invoked with X = 0):

  • First call: X = 0
  • Second call: X = 1
  • Condition EQ fails
  • Expansion stops

Example 2

.MACRO M2,X
.IF EQ,X
M2 X
.ENDC
.IF NE,X
.WORD X+1
.ENDC
.ENDM

Pseudo logic

if (X == 0)
call M2(X)
if (X != 0)
allocate word X+1

Why infinite loop occurs:

  • X never changes
  • X == 0 always true
  • Macro keeps expanding forever

Recursive macro + unchanged argument ⇒ infinite loop
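
The behaviour of M1 and M2 can be simulated in Python. This sketch models the argument as an integer rather than as substituted text, and adds a depth limit (`MAX_DEPTH`, an arbitrary value chosen here) so that the non-terminating case is reported instead of running forever; a real macro processor may or may not have such a guard.

```python
MAX_DEPTH = 50   # arbitrary guard for this sketch


class ExpansionLimitError(Exception):
    """Raised when a macro keeps expanding past MAX_DEPTH."""


def expand_m1(x, depth=0):
    if depth > MAX_DEPTH:
        raise ExpansionLimitError("M1 does not terminate")
    out = []
    if x == 0:                       # .IF EQ,X
        out += expand_m1(x + 1, depth + 1)   # M1 X+1: argument changes
    if x != 0:                       # .IF NE,X
        out.append(f".WORD {x}")
    return out


def expand_m2(x, depth=0):
    if depth > MAX_DEPTH:
        raise ExpansionLimitError("M2 does not terminate")
    out = []
    if x == 0:                       # .IF EQ,X
        out += expand_m2(x, depth + 1)       # M2 X: argument unchanged
    if x != 0:                       # .IF NE,X
        out.append(f".WORD {x + 1}")
    return out


print(expand_m1(0))   # ['.WORD 1']
print(expand_m1(5))   # ['.WORD 5']
print(expand_m2(3))   # ['.WORD 4']
try:
    expand_m2(0)
except ExpansionLimitError as error:
    print("infinite expansion detected:", error)
```

Counting the entries in the returned list gives the number of WORD allocations, which is the quantity exam questions usually ask for.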

More For EXAM ⭐

Infinite Loop in Macro Processor Occurs when:

  • Recursive macro exists
  • Condition guarding the recursive call never becomes false
  • Argument does not move toward exit condition

Macro vs Subroutine (Exam ⭐)

  • Macro → expanded inline
  • Subroutine → CALL and RET
  • Macro → faster execution (no CALL/RET overhead), larger code size
  • Subroutine → smaller code size, CALL/RET overhead on every call

What Macro Processor Does NOT Do

  • No execution
  • No runtime decision
  • No CPU involvement

Typical GATE Questions

  • Identify infinite macro expansion
  • Count number of WORD allocations
  • Predict final expanded code
  • Interpret .IF EQ, X
  • Differentiate macro vs subroutine