Skip to content

Compiler, Lexical and Syntax Analyzer

  1. Frontend focuses on analysis of the source code (language-specific) to check correctness and produce an Intermediate Representation (IR) independent of hardware.
  2. Backend focuses on synthesis – converting IR into optimized machine code for a specific target architecture.

1. Frontend

Responsible for analyzing the source code (syntax + semantics)

  • Lexical Analysis → Converts code into tokens → Done by Lexer / Scanner
  • Syntax Analysis → Checks grammar (CFG) → Builds Parse Tree / Syntax Tree → Done by Parser
  • Semantic Analysis → Checks for semantic correctness (type checks, undeclared vars) → Uses Symbol Table
  • Intermediate Code Generation (ICG) → Converts code to intermediate representation (IR) like Three Address Code (TAC)

2. Backend

Responsible for optimizing and converting IR to target code

  • Code Optimization → Improves IR for performance (speed, memory) → e.g., Constant Folding, Dead Code Elimination
  • Code Generation → Converts IR to Machine Code / Assembly
  • Code Linking & Assembly ❌ → Combines code with libraries and prepares executable
PhasePartTask
Lexical AnalysisFrontendTokenization
Syntax AnalysisFrontendParse Tree creation
Semantic AnalysisFrontendType checking, Symbol Table
Intermediate Code GenFrontendGenerate IR
OptimizationBackendImprove IR
Code GenerationBackendGenerate Machine Code
Linking & Assembly ❌BackendFinal Executable

Error Detection and Code Generation

First Three Phases – Used for: Error Detection

  1. Lexical Analysis → Detects lexical errors (invalid tokens, unknown symbols)
  2. Syntax Analysis → Detects syntax errors (missing brackets, wrong structure)
  3. Semantic Analysis → Detects semantic errors (type mismatches, undeclared variables)

Remaining Phases – Used for: Code Generation & Optimization 4. Intermediate Code Generation – Platform-independent code 5. Code Optimization – Improves performance 6. Code Generation – Machine code 7. Linking & Assembly – Final executable ❌

In most compiler design syllabi (e.g., GATE, SJVN, etc.), the last phase Linking & Assembly is not explicitly included.


Keyword vs Identifier in C++

1. Keywords (Predefined Words)

  • Reserved by the language.
  • Cannot be used as names.
  • Examples: int, if, else, while, return

2. Identifiers

  • a. Predefined Identifiers (from libraries)
    • Provided by the C++ Standard Library.
    • Not keywords but act as names for predefined objects/functions.
    • Examples:
      • Object - cin, cout
      • Function - sort(), strlen()
      • Manipulator - endl, setw()
  • b. User-Defined Identifiers
    • Created by the programmer.
    • Used for variables, functions, classes, etc.
    • Example:
      • Variable - sum, Student
      • Function - main(), calculate()

Non-Terminal vs Terminal – Real C++ Example

**In the statement:

Terminal window
int a = b + 5;
Non-TerminalReplaced By (Terminal / Expression)
DeclStmtint a = b + 5 ;
Typeint
Identifiera, b
Exprb + 5
Number5
Operator+
  • Non-terminals are grammar symbols (e.g., Expr, DeclStmt)
  • Terminals are actual C++ tokens (e.g., int, a, +, 5, ;)
    Non-terminals get replaced by terminals and other non-terminals during parsing.

1. Top-Down Parsers

  • Start from the start symbol and try to derive the input string.
  • Work by expanding non-terminals.
  • Follow Left-to-Right scan and Leftmost derivation.
Top-Down Parser TypeDescription
LL ParsersLeft-to-right scan, Leftmost derivation

LL Parsers Subtypes

TypeFull Form / Explanation
LL(1)Left-to-right scan, Leftmost derivation with 1 lookahead symbol
LL(k)Left-to-right scan, Leftmost derivation with k lookahead symbols
Recursive DescentManually written parser, uses recursive functions for each non-terminal (LL(k))
Predictive ParserNon-recursive LL(1) parser using a parsing table (no backtracking)

2. Bottom-Up Parsers

  • Start from the input string and try to reduce it to the start symbol.
  • Work by shifting and reducing symbols on a stack.
  • Follow Left-to-Right scan and Rightmost derivation in reverse.
Bottom-Up Parser TypeDescription
1. Shift-Reduce ParserBasic bottom-up parser using stack + actions
2. Operator Precedence ParserUses precedence relations to guide parsing
3. LR ParsersMost powerful deterministic parsers

LR Parsers Subtypes

TypeFull Form / Explanation
SLRSimple LR – based on FOLLOW sets
LALRLook-Ahead LR – optimized for size, same states as SLR
CLRCanonical LR / LR(1) – uses full lookahead, largest

LL ParsersTop-Down, Left-to-right scan + Leftmost derivation
LR ParsersBottom-Up, Left-to-right scan + Rightmost derivation (in reverse)

The value inside parentheses (k) denotes number of lookahead symbols used while making parsing decisions.

LL(k) Parser

  • L → Scans input from Left to right
  • L → Produces Leftmost derivation
  • (k) → Uses k lookahead symbols
  • Common: LL(1) → Predictive or Recursive Descent Parser

LR(0) Parser

  • L → Scans input from Left to right
  • R → Produces Rightmost derivation in reverse
  • (0) → Uses 0 lookahead symbols
  • Simplest LR parser but least powerful

SLR(1) Parser (Simple LR)

  • L → Scans input from Left to right
  • R → Produces Rightmost derivation in reverse
  • (1) → Uses 1 lookahead symbol
  • Based on LR(0) items + FOLLOW sets for reduce decisions
  • Easier to construct but less powerful

LALR(1) Parser (Look-Ahead LR)

  • LA → Look-Ahead
  • L → Scans input from Left to right
  • R → Produces Rightmost derivation in reverse
  • (1) → Uses 1 lookahead symbol
  • Combines states of LR(1) having same core
  • Smaller table than LR(1), same power for most grammars
  • Used in YACC, Bison

Canonical LR(1) Parser

  • Canonical → full, unmerged, standard form with all LR(1) items distinct.
  • L → Scans input from Left to right
  • R → Produces Rightmost derivation in reverse
  • (1) → Uses 1 lookahead symbol
  • Most general and powerful LR parser
  • Builds separate states for each unique lookahead
  • Large table size, but handles all deterministic context-free grammars
ParserDerivationParsing TypeLookaheadPowerRemarks
LL(1)Leftmost==Top-down==1MediumPredictive parser
LR(0)RightmostBottom-up==0====Weak==Simple, less practical
SLR(1)RightmostBottom-up1MediumUses FOLLOW sets
LALR(1)RightmostBottom-up1==Strong==Efficient & widely used
Canonical LR(1)RightmostBottom-up1==Very Strong==Large tables