Assembler and Object Code
Important for GATE Exam
Assembler
An assembler is a system program that translates assembly language into machine language and prepares object code for execution.
Functions of Assembler
- Translate mnemonics into opcodes
- Assign memory addresses
- Resolve symbols and labels
- Process assembler directives
- Generate object code and error diagnostics
Input and Output
- Input: Assembly language program
- Output:
  - Object code
  - Symbol table
  - Literal table
  - Error list
Basic Elements of Assembly Language
- Mnemonic: symbolic instruction (ADD, MOV)
- Operand: data or address
- Label: symbolic name for address
- Directive: instruction to the assembler (START, END, ORIGIN, EQU)
Types of Assembler
- Single Pass Assembler
  - Scans program once
  - Difficult to handle forward references ⭐
  - Faster but complex
- Two Pass Assembler
  - Scans program twice
  - Handles forward references easily ⭐
  - Most commonly used
- Multi Pass Assembler
  - More than two scans
  - Used in complex architectures
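To see why forward references make a single pass harder, the sketch below uses backpatching, one common workaround that these notes do not name. It is an illustration only: the tuple format, the one-word instruction size, and the function name `assemble_single_pass` are all assumptions.

```python
def assemble_single_pass(lines):
    """Assemble a toy program in one scan, backpatching forward references.

    Each line is (label_or_None, mnemonic, operand_or_None). Every
    instruction occupies one word on this hypothetical machine.
    """
    symtab = {}     # label -> address
    pending = {}    # undefined label -> indexes of instructions waiting for it
    code = []       # one [mnemonic, resolved_operand] entry per line

    for lc, (label, mnemonic, operand) in enumerate(lines):
        if label is not None:
            symtab[label] = lc
            # The label is now known: patch every earlier use of it.
            for index in pending.pop(label, []):
                code[index][1] = lc
        if operand is None or operand in symtab:
            code.append([mnemonic, symtab.get(operand)])
        else:
            # Forward reference: leave a hole and remember where it is.
            pending.setdefault(operand, []).append(lc)
            code.append([mnemonic, None])

    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return code, symtab


program = [
    (None, "JMP", "DONE"),     # forward reference to DONE
    ("LOOP", "ADD", None),
    (None, "JMP", "LOOP"),     # backward reference, already known
    ("DONE", "HLT", None),
]
object_code, symbols = assemble_single_pass(program)
```

The extra `pending` bookkeeping is the "complex" part of a single-pass design; a two-pass assembler avoids it by finishing the symbol table before generating any code.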
Two Pass Assembler
Assembly is completed in two logical passes.
Pass 1
- Initialize Location Counter (LC) using the operand of START
- Scan source program sequentially, one statement at a time
- Assign addresses to each instruction using LC and instruction length ⭐
- Enter labels into Symbol Table with their computed addresses
- Build Symbol Table for all local symbols and their values
- Process assembler directives
  - START: sets initial LC
  - ORIGIN / ORG: updates LC based on expression
  - EQU: assigns value to symbol without allocating memory
  - LTORG: allocates space for literals encountered so far
- Literals encountered (=5, ='A') are added to Literal Table
- Allocate space for literals when LTORG or END is seen
- Compute the total length of the program using final LC
- Generate Intermediate Code (opcode class + operands, no absolute addresses)
- No object code generation in this pass
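The Pass 1 steps above can be sketched as a short Python function. This is a minimal illustration under stated assumptions: the instruction lengths in `MOT_LENGTH`, the tuple format of the source, and the choice to handle only START, EQU, DS and END are all hypothetical simplifications.

```python
# Hypothetical opcode table: mnemonic -> instruction length in words.
MOT_LENGTH = {"LOAD": 1, "ADD": 1, "STORE": 1, "JMP": 1, "HLT": 1}


def pass_one(source):
    """Return (symbol_table, intermediate_code, program_length).

    `source` is a list of (label, operation, operand) tuples; label and
    operand may be None. Only START, EQU, DS and END directives are handled.
    """
    symtab = {}
    intermediate = []
    lc = 0
    start = 0
    for label, op, operand in source:
        if op == "START":
            lc = start = int(operand)          # initialise LC
            continue
        if op == "END":
            break
        if op == "EQU":
            symtab[label] = int(operand)       # value only, no memory allocated
            continue
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = lc                 # label gets the current LC
        if op == "DS":
            lc += int(operand)                 # reserve storage
            continue
        intermediate.append((lc, op, operand)) # operand is still symbolic
        lc += MOT_LENGTH[op]
    return symtab, intermediate, lc - start


source = [
    (None, "START", "100"),
    ("LOOP", "LOAD", "A"),
    (None, "ADD", "MAX"),
    (None, "STORE", "A"),
    (None, "JMP", "LOOP"),
    ("A", "DS", "2"),
    ("MAX", "EQU", "50"),
    (None, "END", None),
]
symtab, intermediate, length = pass_one(source)
```

Note that `A` is used before it is defined, yet Pass 1 never needs its address: it only records the symbolic operand and moves on.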
Pass 2
- Read Intermediate Code generated by Pass 1
- Use Symbol Table and Literal Table for address resolution ⭐
- Resolve addresses of all local symbols occurring in instructions
- Substitute literal addresses from the Literal Table (the addresses themselves are allocated at LTORG / END in Pass 1)
- Translate mnemonics into machine opcodes ⭐
- Generate code for all load and store register instructions
- Perform complete object code generation
- Produce final object code with resolved addresses
- Generate program listing (LC, source statement, object code)
- Final output includes object code, symbol table, literal table, listing
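A minimal sketch of Pass 2 follows, assuming the symbol table and intermediate code already exist. The opcode values in `MOT_OPCODE`, the triple format, and the listing layout are made up for illustration and do not belong to any real machine.

```python
# Hypothetical opcode table: mnemonic -> numeric opcode.
MOT_OPCODE = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JMP": 0x04}


def pass_two(intermediate, symtab):
    """Turn Pass 1 output into (address, opcode, operand_address) triples."""
    object_code = []
    for address, mnemonic, operand in intermediate:
        if mnemonic not in MOT_OPCODE:
            raise ValueError(f"invalid mnemonic: {mnemonic}")
        if operand not in symtab:
            raise ValueError(f"undefined symbol: {operand}")
        # Translate the mnemonic and resolve the symbolic operand.
        object_code.append((address, MOT_OPCODE[mnemonic], symtab[operand]))
    return object_code


# Assumed Pass 1 results for a four-instruction program starting at 100.
symtab = {"LOOP": 100, "A": 104, "MAX": 50}
intermediate = [
    (100, "LOAD", "A"),
    (101, "ADD", "MAX"),
    (102, "STORE", "A"),
    (103, "JMP", "LOOP"),
]
object_code = pass_two(intermediate, symtab)
listing = [f"{addr} {op:02X} {target}" for addr, op, target in object_code]
```

Because every symbol is already in the table, the forward reference to `A` is resolved with a plain lookup.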
Key View (Exam Oriented)
- Pass 1 = address calculation + tables + intermediate code
- Pass 2 = address resolution + machine code + listing

This separation is the core idea tested in GATE and PSU exams.
Note: The MOT (Machine Opcode Table) is used in both passes
- In Pass 1, it gives the length of each instruction so that the location counter can be updated.
- In Pass 2, it gives the machine code for each mnemonic.
Tables Used by Assembler ⭐
- Symbol Table
  - Symbol name
  - Address
  - Length
- Literal Table
  - Literal
  - Address
- Opcode Table (OPTAB)
  - Mnemonic
  - Opcode
  - Instruction length
- Pool Table
  - Literal pool information
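The Literal Table and Pool Table are easier to picture with a small example. The sketch below assumes a common textbook layout in which each Pool Table entry is the index of the first literal of a pool. The statement format, the one-word sizes, and the name `allocate_literals` are hypothetical.

```python
def allocate_literals(statements, start_lc):
    """Build the literal table and pool table for a list of statements.

    Each statement is "LTORG", "END", or a one-word instruction whose
    optional literal operand starts with "=".
    """
    littab = []       # entries are [literal, address]; address is None until pooled
    pooltab = [0]     # index in littab where each pool starts
    lc = start_lc
    for stmt in statements:
        if stmt in ("LTORG", "END"):
            # Give consecutive addresses to every literal in the open pool.
            for entry in littab[pooltab[-1]:]:
                entry[1] = lc
                lc += 1
            if stmt == "END":
                break
            pooltab.append(len(littab))   # the next pool starts here
            continue
        if "=" in stmt:
            literal = stmt[stmt.index("="):]
            # Only the current pool is searched for an existing copy.
            current = [e[0] for e in littab[pooltab[-1]:]]
            if literal not in current:
                littab.append([literal, None])
        lc += 1
    return littab, pooltab


statements = ["LOAD =5", "ADD ='A'", "LTORG", "ADD =5", "HLT", "END"]
littab, pooltab = allocate_literals(statements, 200)
```

Here `=5` appears twice in the Literal Table because the second use falls in a new pool, opened after LTORG.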
Assembler Directives
- START: starting address of program
- END: end of program
- ORIGIN: change LC value
- EQU: assign constant value to symbol
- LTORG: allocate literals
Error Handling
- Undefined symbols
- Duplicate symbols
- Invalid mnemonics
- Syntax errors
Advantages of Assembler
- Efficient and fast execution
- Full hardware control
- Useful for system-level programming
Limitations
- Machine dependent
- Difficult to write and debug
- Poor portability
Role in Compiler Design
- Works as backend for compilers
- Helps understand symbol management and code generation
View: Assembler is the foundation of compiler design; understanding its passes and tables is critical for mastering compilers and low-level system software.
Mnemonic vs Opcode
Mnemonic
- Human-readable symbolic instruction
- Used in assembly language
- Easy to remember and write
- Example
    ADD R1, R2
    MOV A, B
Opcode
- Binary or hexadecimal machine instruction
- Used by CPU hardware
- Not human-readable
- Example
    0001
    8B
Key Differences
- Mnemonic → symbolic name → for the programmer
- Opcode → machine code value → for the processor
- Assembler maps mnemonic to opcode using OPTAB ⭐
Relation
    Mnemonic --(Assembler)--> Opcode
Opinion: Mnemonics improve human productivity, opcodes optimize machine execution; assembler is the critical bridge between both.
Assembler is part of which Phase of Compiler?
Assembler is part of the Back End of Compiler Design
Exact Position
- Comes after Code Generation of assembly code
- Converts assembly code → machine code
Compiler Structure
    Source Program → Front End (Lexical, Syntax, Semantic) → Intermediate Code → Code Generation (Assembly) → Assembler → Object Code
Summary
- Not a front-end phase
- Works as a system software supporting the compiler
View: Assembler is not a compiler phase itself, but a mandatory backend component enabling actual execution.
Mnemonic & Assembler Directives
Important mnemonics and directives for GATE and competitive exams are marked with ⭐
Mnemonic
A mnemonic is a symbolic instruction representing a machine operation
- First operand → source (R1)
- Second operand → destination (R2)
- Result is stored in the second operand ⭐
Example
    ADD R1, R2 -> R2 = R2 + R1
    SUB R1, R2 -> R2 = R2 - R1
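The source-then-destination convention can be checked with a tiny simulator. This sketch follows the operand order used in these notes, not any particular real instruction set. The `execute` helper and the register values are hypothetical.

```python
def execute(instruction, registers):
    """Apply a two-operand 'OP source, destination' instruction in place."""
    mnemonic, operands = instruction.split(maxsplit=1)
    source, destination = [name.strip() for name in operands.split(",")]
    if mnemonic == "ADD":
        registers[destination] = registers[destination] + registers[source]
    elif mnemonic == "SUB":
        registers[destination] = registers[destination] - registers[source]
    else:
        raise ValueError(f"unsupported mnemonic: {mnemonic}")
    return registers


registers = {"R1": 3, "R2": 10}
execute("ADD R1, R2", registers)    # R2 = R2 + R1
after_add = dict(registers)
execute("SUB R1, R2", registers)    # R2 = R2 - R1
after_sub = dict(registers)
```

In both cases R1 is only read, and R2 receives the result.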
Data Transfer Mnemonics
- MOV: copy data from source to destination (MOV R1, R2) ⭐
- LOAD / LD: load data from memory to register (LD R1, A) ⭐
- STORE / ST: store data from register to memory (ST R1, A) ⭐
- XCHG: exchange contents of two operands (XCHG R1, R2)
- PUSH: push data onto stack (PUSH R1)
- POP: pop data from stack (POP R1)
Arithmetic Mnemonics
- ADD: addition (ADD R1, R2) ⭐
- SUB: subtraction (SUB R1, R2) ⭐
- MUL: multiplication (MUL R1, R2)
- DIV: division (DIV R1, R2)
- INC: increment by 1 (INC R1)
- DEC: decrement by 1 (DEC R1)
Logical Mnemonics
- AND: bitwise AND (AND R1, R2) ⭐
- OR: bitwise OR (OR R1, R2) ⭐
- XOR: bitwise XOR (XOR R1, R2) ⭐
- NOT: bitwise complement (NOT R1)
Shift / Rotate Mnemonics
- SHL / SAL: shift left (SHL R1, 1) ⭐
- SHR: shift right (SHR R1, 1) ⭐
- ROL: rotate left (ROL R1, 1)
- ROR: rotate right (ROR R1, 1)
Control Transfer Mnemonics
- JMP: unconditional jump (JMP LOOP) ⭐
- JZ / JE: jump if zero / equal (JZ NEXT) ⭐
- JNZ / JNE: jump if not zero / not equal (JNZ LOOP) ⭐
- JC: jump if carry (JC ERROR)
- CALL: call procedure (CALL FUNC) ⭐
- RET: return from procedure (RET) ⭐
Comparison Mnemonics
- CMP: compare two operands (CMP R1, R2) ⭐
- TEST: logical comparison (TEST R1, R2)
Input / Output Mnemonics
- IN: input from port (IN R1, PORT1)
- OUT: output to port (OUT PORT1, R1)
Processor Control Mnemonics
- NOP: no operation (NOP) ⭐
- HLT: halt processor (HLT) ⭐
- INT: interrupt call (INT 21H)
String Mnemonics
- MOVS: move string (MOVS)
- CMPS: compare string (CMPS)
- SCAS: scan string (SCAS)
Assembler Directives
- START: specifies starting address of program, initializes LC (START 100) ⭐
- END: marks end of source program, triggers literal allocation (END) ⭐
- ORIGIN: changes value of LC to a given address or expression (ORIGIN LOOP+2) ⭐
- EQU: assigns a constant value or address to a symbol (MAX EQU 50) ⭐
- LTORG: creates a literal pool and assigns addresses to literals (LTORG) ⭐
More Directives
- DS (Define Storage): reserves memory locations without initialization (A DS 5) ⭐
- DC (Define Constant): allocates memory and initializes with constant value (B DC 10) ⭐
- USING: tells assembler which register to use as base register (USING *,15)
- DROP: removes register from base register list (DROP 15)
- ENTRY: declares symbol as entry point for linker (ENTRY MAIN)
- EXTRN / EXTERNAL: declares symbol defined in another module (EXTRN SUM) ⭐
- CSECT: defines control section, a separate relocatable unit (MAIN CSECT)
- ORG: alternative form of ORIGIN, assembler dependent (ORG 200)
Subroutine vs Coroutine
Subroutine
- A subroutine is a callable block of code that executes and returns control to the calling point
- Follows call–return discipline (stack based)
Example flow:
    CALL SUB
    ...
    SUB: ... RET
Key points:
- One active subroutine at a time
- Uses stack for return address
- Common in procedural programming
Coroutine
- A coroutine is a program unit that suspends and resumes execution, not strict call–return
- Control is transferred cooperatively between routines
Example flow:
    resume A → suspend A → resume B → suspend B
Key points:
- Multiple active routines
- No implicit return to caller
- Used in concurrency, generators, schedulers
Core Difference (Exam Focus)
- Subroutine: call → execute → return
- Coroutine: resume ↔ suspend
Subroutine vs Coroutine (by Stack Overflow) ⭐
- The subroutine is a special case of a co-routine. A co-routine is a generalized form of a subroutine that supports non-preemptive multitasking.
- A subroutine always starts its execution from the beginning (first line), but a co-routine starts from where it left off last time.
This is why we say the co-routine has multiple entry points whereas the sub-routine has only one.
Yield ‘remembers’ where the co-routine is, so when it is called again it continues where it left off.
For example:
    coroutine foo {
        yield 1;
        yield 2;
        yield 3;
    }
    print foo();
    print foo();
    print foo();
Prints: 1 2 3
Note: Coroutines may use a return, and then behave just like a subroutine
    coroutine foo {
        return 1;
        return 2; // Dead code
        return 3;
    }
    print foo();
    print foo();
    print foo();
Prints: 1 1 1
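The same contrast can be run in Python, whose generators are one concrete form of coroutine. This is only an analogy to the pseudocode above: in Python you create a generator object once and resume it with `next()`, rather than calling `foo()` repeatedly. The function names are hypothetical.

```python
def foo_coroutine():
    """Generator: each next() resumes just after the previous yield."""
    yield 1
    yield 2
    yield 3


def foo_subroutine():
    """Plain function: every call starts again from the first line."""
    return 1
    return 2  # dead code, never reached


gen = foo_coroutine()
resumed = [next(gen), next(gen), next(gen)]
restarted = [foo_subroutine(), foo_subroutine(), foo_subroutine()]
```

`resumed` collects a different value on each resume, while `restarted` collects the same first value three times.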
Macro Processor ⭐
What is a Macro
- A macro is a named block of assembly statements
- It is expanded by the assembler before actual code generation
- Expansion = textual substitution, not execution
Key idea
Macro works at compile/assembly time, not run time
Why Macros are Needed
- Avoid repetitive code
- Improve readability
- No CALL/RET overhead (unlike subroutines)
Use case
- Repeated instruction patterns
- Parameterized instruction blocks
Macro Processor
- A system software
- Runs before or inside assembler
- Replaces macro calls with macro body
Flow
    Source Program → Macro Processor (expansion) → Pure Assembly Code → Assembler → Object Code
Basic Macro Structure ⭐
    .MACRO MACRO_NAME parameter1, parameter2
        statements
    .ENDM
Meaning of each part:
- .MACRO → start macro definition
- MACRO_NAME → identifier of macro
- parameters → placeholders
- .ENDM → end of macro definition
Macro Call
    MACRO_NAME actual1, actual2
- Replaced by macro body
- Formal parameters substituted with actual values
Example Simple Macro
    .MACRO INCR X
    ADD X, =1
    .ENDM
Call
    INCR A
Expansion
    ADD A, =1
Macro Variables (Parameters)
- Formal parameters → used in macro definition
- Actual parameters → passed during macro call
Example
    .MACRO ADD2 A,B
    ADD A,B
    .ENDM
Call
    ADD2 R1,R2
Substitution
    ADD R1,R2
Macro Expansion vs Execution (⭐ GATE)
- Macro expansion → textual replacement
- Instruction execution → CPU at run time
- Macro processor does not evaluate logic, only expands text
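Textual substitution of formal parameters by actual parameters can be sketched in a few lines. This is a simplified model, not how any particular macro processor is implemented. The `macros` dictionary format and the `expand` function are assumptions made for illustration.

```python
import re


def expand(macros, line):
    """Expand one macro call textually; return other lines unchanged.

    `macros` maps a macro name to (formal_parameters, body_lines).
    """
    name, _, rest = line.strip().partition(" ")
    if name not in macros:
        return [line]
    formals, body = macros[name]
    actuals = [arg.strip() for arg in rest.split(",")]
    mapping = dict(zip(formals, actuals))
    # Match formal parameters as whole words only, so the "A" inside
    # the mnemonic "ADD" is never touched.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, formals)) + r")\b")
    return [pattern.sub(lambda m: mapping[m.group(1)], stmt) for stmt in body]


macros = {
    "INCR": (["X"], ["ADD X, =1"]),
    "ADD2": (["A", "B"], ["ADD A,B"]),
}
incr_expansion = expand(macros, "INCR A")
add2_expansion = expand(macros, "ADD2 R1,R2")
plain = expand(macros, "MOV R1, R2")
```

Nothing is executed here: the function only rewrites strings, which is the point of the expansion versus execution distinction.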
Conditional Assembly
- Decisions taken during macro expansion
- Controlled by assembler directives
Used when:
- Macro behavior depends on parameter value
IF–ENDC Structure
    .IF condition
        statements
    .ENDC
- .IF → start conditional expansion
- .ENDC → end conditional block
Relational Operators in Macro IF
- EQ → equal to zero
- NE → not equal to zero
- GT → greater than zero
- LT → less than zero
- GE → ≥ 0
- LE → ≤ 0
⭐ Important rule
    .IF EQ, X → if X == 0
    .IF NE, X → if X != 0
Zero is implicit
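The "zero is implicit" rule means each relational keyword takes a single argument and compares it with 0. A minimal sketch of that rule follows. The `RELATIONS` table and the `condition_holds` helper are hypothetical names.

```python
import operator

# Each relational keyword compares its single argument against zero.
RELATIONS = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "GT": operator.gt,
    "LT": operator.lt,
    "GE": operator.ge,
    "LE": operator.le,
}


def condition_holds(relation, value):
    """Evaluate '.IF relation, value' as described in these notes."""
    return RELATIONS[relation](value, 0)


results_for_zero = {rel: condition_holds(rel, 0) for rel in RELATIONS}
results_for_five = {rel: condition_holds(rel, 5) for rel in RELATIONS}
```

For X = 0 only EQ, GE and LE hold; for X = 5 only NE, GT and GE hold.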
WORD Directive
    .WORD X
Meaning:
- Allocate 1 word of memory
- Initialize it with value X
- Assembler directive, not machine instruction
Use case:
- Define constants
- Reserve initialized memory
ENDM
- Marks end of macro definition
- Mandatory
- Assembler stops recording macro body here
ENDC
- Marks end of conditional block
- Only used with .IF
Recursive Macros
- A macro that calls itself
- Can be direct or indirect
Example 1
    .MACRO M1,X
    .IF EQ,X
    M1 X+1
    .ENDC
    .IF NE,X
    .WORD X
    .ENDC
    .ENDM
Step-by-step understanding
- Input parameter: X
- Case 1: X == 0
  - Macro calls itself with X+1
- Case 2: X != 0
  - Allocates one word with value X
Pseudo logic
    if (X == 0) call M1(X+1)
    if (X != 0) allocate word X
Why it terminates:
- First call: X = 0
- Second call: X = 1
- Condition EQ fails
- Expansion stops
Example 2
    .MACRO M2,X
    .IF EQ,X
    M2 X
    .ENDC
    .IF NE,X
    .WORD X+1
    .ENDC
    .ENDM
Pseudo logic
    if (X == 0) call M2(X)
    if (X != 0) allocate word X+1
Why infinite loop occurs:
- X never changes
- X == 0 always true
- Macro keeps expanding forever
Recursive macro + unchanged argument ⇒ infinite loop
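Both examples can be simulated by modelling each macro as a recursive Python function that returns the lines it would emit. This is a sketch of the pseudo logic only. The depth guard (`limit`) is an addition for the simulation, so that the endless expansion shows up as an error instead of hanging.

```python
def expand_m1(x, depth=0, limit=50):
    """M1: recurse with X+1 when X == 0, otherwise emit .WORD X."""
    if depth > limit:
        raise RecursionError("M1 expansion did not terminate")
    output = []
    if x == 0:                       # .IF EQ,X
        output += expand_m1(x + 1, depth + 1, limit)
    if x != 0:                       # .IF NE,X
        output.append(f".WORD {x}")
    return output


def expand_m2(x, depth=0, limit=50):
    """M2: recurse with the same X when X == 0, otherwise emit .WORD X+1."""
    if depth > limit:
        raise RecursionError("M2 expansion did not terminate")
    output = []
    if x == 0:                       # .IF EQ,X
        output += expand_m2(x, depth + 1, limit)
    if x != 0:                       # .IF NE,X
        output.append(f".WORD {x + 1}")
    return output


m1_result = expand_m1(0)             # one nested call, then stops
try:
    expand_m2(0)
    m2_terminated = True
except RecursionError:
    m2_terminated = False            # depth guard tripped: endless expansion
m2_nonzero = expand_m2(5)            # no recursion when X != 0
```

M2 only misbehaves for X = 0. With a non-zero argument it emits a single word and stops, which is a useful check when counting WORD allocations.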
More For EXAM ⭐
Infinite Loop in Macro Processor Occurs when:
- Recursive macro exists
- Termination condition never becomes false
- Argument does not move toward exit condition
Macro vs Subroutine (Exam ⭐)
- Macro → expanded inline
- Subroutine → CALL and RET
- Macro → faster execution
- Subroutine → less code size
What Macro Processor Does NOT Do
- No execution
- No runtime decision
- No CPU involvement
Typical GATE Questions
- Identify infinite macro expansion
- Count number of WORD allocations
- Predict final expanded code
- Interpret .IF EQ, X
- Differentiate macro vs subroutine