Shell Scripting (Bash)

CT-152
Introduction to UNIX
Overview

Master the art of shell scripting by learning to write, execute, and secure robust Bash automation scripts. You'll explore essential techniques including parameter handling, defensive quoting strategies, control flow patterns, and powerful text processing pipelines. Apply your skills by building a practical log analyzer tool, then demonstrate your mastery through a comprehensive knowledge assessment.

Module Facts

Format: Interactive module (single HTML file)

Assessment: Guided lab + quiz + optional challenge

Deliverables: authsum.sh (+ optional csv_clean.sh)

Core Competencies
  • Create & Execute: Design executable Bash scripts using proper shebang syntax and file permissions, demonstrating understanding of Unix process execution model.
  • Variable Management: Implement secure variable handling with proper quoting to prevent word splitting and glob expansion in at least 3 different contexts.
  • Parameter Processing: Process command-line arguments using $@, $#, and getopts while maintaining POSIX compliance.
  • Control Structures: Apply conditional logic and iteration patterns to solve algorithmic problems efficiently.
  • Text Processing: Construct Unix pipelines combining multiple utilities to perform ETL operations on structured data.
  • Error Handling: Implement defensive programming techniques using exit codes, set -euo pipefail, and trap for robust script behavior.

Terminal Lab

Try: pwd, ls, echo "echo Hello" > hello.sh, chmod u+x hello.sh, ./hello.sh, bash hello.sh


Practice Tasks

  1. Create a script: Write hello.sh with at least one echo line.
  2. Make it executable: Run chmod u+x hello.sh.
  3. Simulated run: Execute ./hello.sh (simulated) or bash hello.sh.
  4. Args: Create args.sh and print all args one per line (hint: loop over "$@").

Theoretical Foundation

Shell Execution Model

Understanding how the shell processes and executes scripts is crucial for writing robust code:

  • Process Creation: Each script execution creates a new process via fork() and exec() system calls
  • Environment Inheritance: Child processes inherit environment variables but cannot modify parent's environment
  • Exit Codes: Processes communicate success/failure through integer return values (0 = success, 1-255 = various errors)
  • Signal Handling: Scripts can intercept and respond to system signals using trap
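A quick experiment makes the inheritance and exit-code points concrete (the variable name GREETING is illustrative):

```shell
# Demo: environment inheritance and exit codes.
export GREETING="hello"              # exported -> visible to child processes

# The child sees the inherited variable, but its changes stay in the child.
bash -c 'echo "child sees: $GREETING"; GREETING="changed"'
echo "parent still has: $GREETING"   # unchanged: hello

# Exit codes: $? holds the integer status of the last command.
bash -c 'exit 42'
echo "child exited with: $?"         # 42
```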

Security & Safety Principles

Shell scripting presents unique security challenges due to its text-processing nature:

  • Input Validation: Always validate and sanitize user input to prevent injection attacks
  • Privilege Principle: Run scripts with minimal necessary permissions
  • Word Splitting: Unquoted variables can lead to unintended command execution
  • Path Safety: Use absolute paths or validate PATH to prevent command hijacking

Academic Note: These concepts connect to formal verification methods and secure coding practices in software engineering.
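Word splitting is easiest to see with a filename containing a space (the name my report.txt is hypothetical):

```shell
# Demo: unquoted expansion splits on whitespace; quoting preserves boundaries.
file="my report.txt"

printf '%s\n' $file        # unquoted: two words, printed on two lines
printf '%s\n' "$file"      # quoted: one word -> my report.txt

# Counting the words each way via the positional parameters:
set -- $file;   echo "unquoted words: $#"   # 2
set -- "$file"; echo "quoted words: $#"     # 1
```

In a script, that stray second word could become an unintended argument (or, with glob characters, expand to matching filenames), which is exactly the vulnerability the quoting rules guard against.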

Core Concepts & Patterns

Safety Header & Defensive Programming

# Near top of most scripts
set -euo pipefail
IFS=$'\n\t'

Theory: Implements fail-fast principle from software engineering. -e ensures error propagation, -u catches undefined variable bugs (similar to strict mode in modern languages), -o pipefail ensures pipeline failures aren't masked.

IFS Reset: Prevents field splitting vulnerabilities by explicitly setting Internal Field Separator.

Variables & Lexical Scoping

name="Alice"
echo "Hello, $name"    # Variable expansion
echo 'Literal $name'   # No expansion
readonly CONFIG_FILE="/etc/app.conf"

Theory: Bash uses dynamic scoping, not lexical scoping. The local keyword in functions limits a variable's lifetime to the current call, but functions invoked from that call can still read it—a dynamically scoped local, unlike the lexically scoped locals of most compiled languages. Quoting rules follow formal language theory—double quotes allow expansion, single quotes create string literals.
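A small sketch shows bash's dynamic scoping in action: a function called from another sees the caller's local variable (the names inner and outer are hypothetical):

```shell
# Demo: bash "local" variables are dynamically scoped --
# a callee sees the caller's local, which lexical scoping would forbid.
inner() { echo "inner sees: $x"; }

outer() {
  local x="from outer"   # local to this call, but visible to callees
  inner                  # prints: inner sees: from outer
}

outer
```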

Finite State Automaton: getopts

while getopts ":i:o:n" opt; do
  case $opt in
    i) infile=$OPTARG ;;
    o) outfile=$OPTARG ;;
    n) dryrun=1 ;;
    :) echo "Missing arg for -$OPTARG" >&2; exit 2 ;;
    \?) echo "Unknown -$OPTARG" >&2; exit 2 ;;
  esac
done
shift $((OPTIND-1))

Theory: Implements a finite state machine for parsing command-line options. Each state transition is deterministic, following formal language parsing principles.
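To watch those state transitions, the loop can be wrapped in a function and fed sample argument vectors (the function name parse and the sample filenames are hypothetical):

```shell
# Demo: exercising the getopts state machine on sample argument vectors.
parse() {
  local infile="" outfile="" dryrun=0 opt
  OPTIND=1                       # reset parser state between calls
  while getopts ":i:o:n" opt; do
    case $opt in
      i) infile=$OPTARG ;;
      o) outfile=$OPTARG ;;
      n) dryrun=1 ;;
      :)  echo "Missing arg for -$OPTARG" >&2; return 2 ;;
      \?) echo "Unknown -$OPTARG" >&2; return 2 ;;
    esac
  done
  shift $((OPTIND-1))
  echo "infile=$infile outfile=$outfile dryrun=$dryrun rest=$*"
}

parse -i auth.log -n extra.log
# -> infile=auth.log outfile= dryrun=1 rest=extra.log

parse -i 2>&1 || echo "parser returned $?"   # -i lacks its argument -> error path
```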

Control Flow & Computational Thinking

# Iteration with safe variable handling
for arg in "$@"; do
  echo "Processing: $arg"
done

# Pattern matching (similar to switch statements)
case "$extension" in
  csv|tsv) echo "Tabular data" ;;
  json|xml) echo "Structured data" ;;
  *) echo "Unknown format" ;;
esac

Theory: Demonstrates algorithmic thinking patterns. The quoted "$@" preserves argument boundaries, preventing inadvertent data corruption.
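The difference between quoted and unquoted expansion can be measured directly (the helper count_args is hypothetical):

```shell
# Demo: only quoted "$@" preserves each argument intact.
count_args() { echo "$#"; }

set -- "first arg" "second arg"    # two positional parameters

count_args "$@"    # -> 2   boundaries preserved
count_args $@      # -> 4   unquoted: re-split on whitespace
count_args "$*"    # -> 1   all parameters joined into a single word
```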

Debugging & Error Handling Strategies

Debugging Techniques

Trace Execution

# Enable debugging
bash -x script.sh

# Or within script
set -x    # Enable trace
command1
command2
set +x    # Disable trace

Theory: Implements execution tracing, similar to debuggers in compiled languages. Shows command expansion and execution flow.

Validation Patterns

# Check file readability
[[ -r "$input_file" ]] || {
  echo "Error: Cannot read $input_file" >&2
  exit 1
}

# Validate numeric input
if ! [[ "$count" =~ ^[0-9]+$ ]]; then
  echo "Error: Invalid number '$count'" >&2
  exit 2
fi

Theory: Implements precondition checking from formal verification. Early validation prevents downstream errors.

Error Recovery & Cleanup

Trap Handlers

# Setup cleanup on script exit (mktemp avoids predictable, hijackable names)
tmp_file=$(mktemp)
trap 'rm -f "$tmp_file"' EXIT

# Handle interruption gracefully  
trap 'echo "Script interrupted" >&2; exit 130' INT TERM

Theory: Implements exception handling patterns. trap ensures resource cleanup regardless of exit path.

Exit Code Conventions

# Standard exit codes
exit 0   # Success
exit 1   # General error
exit 2   # Misuse (bad arguments)
exit 126 # Command not executable
exit 127 # Command not found
exit 130 # Script terminated by Ctrl+C

Theory: Follows POSIX standards for process communication. Enables programmatic error handling in calling scripts.
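A caller can branch on these conventions programmatically; the sketch below (the helper check_file is hypothetical) returns 0, 1, or 2 following the table above:

```shell
# Demo: a caller dispatching on a function's exit code.
check_file() {
  [ "$#" -eq 1 ] || return 2      # misuse: wrong argument count
  [ -r "$1" ]    || return 1      # general error: unreadable file
  return 0                        # success
}

tmp=$(mktemp)
check_file "$tmp";        echo "readable file -> $?"   # 0
check_file /no/such/file; echo "missing file  -> $?"   # 1
check_file;               echo "no arguments  -> $?"   # 2
rm -f "$tmp"
```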

Guided Lab: Log Analysis System

Engineering Challenge

Problem: Design a robust log analysis tool that processes authentication failures, demonstrating systems programming concepts including file I/O, text processing algorithms, and error handling.

Real-world Context: Security teams use similar tools to detect brute-force attacks and identify compromised accounts in enterprise environments.

Requirements Analysis

Functional Requirements:
  • Parse authentication logs using regex patterns
  • Aggregate failure counts by username
  • Support multiple output formats (console, CSV)
  • Handle command-line flags with validation
  • Implement dry-run mode for testing

Non-functional Requirements:
  • Robust error handling and validation
  • POSIX compliance for portability
  • Memory-efficient processing for large logs
  • Clear user feedback and error messages

Algorithm Design

Pipeline Architecture:

  1. Input Validation: File existence and readability
  2. Pattern Matching: Extract failed login attempts
  3. Data Extraction: Parse usernames from log entries
  4. Aggregation: Count failures per user
  5. Sorting: Rank by failure frequency
  6. Output Formatting: Display or export results

Time Complexity: O(n log n) due to sorting step

Space Complexity: O(k) where k = unique usernames

Implementation with Academic Commentary

#!/usr/bin/env bash
# Authentication Log Analyzer
# Demonstrates: Systems programming, text processing, error handling

set -euo pipefail              # Fail-fast error handling
IFS=$'\n\t'                   # Prevent word splitting vulnerabilities

# Default configuration (following convention over configuration principle)
infile="auth.log"
outfile=""
dryrun=0

# Command-line argument parsing using finite state automaton
while getopts ":i:o:n" opt; do
  case $opt in
    i) infile=$OPTARG ;;       # Input file specification
    o) outfile=$OPTARG ;;      # Output file specification  
    n) dryrun=1 ;;             # Dry-run mode (testing pattern)
    :) echo "Option -$OPTARG requires an argument" >&2; exit 2 ;;
    \?) echo "Unknown option: -$OPTARG" >&2; exit 2 ;;
  esac
done
shift $((OPTIND-1))            # Remove processed options from $@

# Input validation (precondition checking)
[[ -r "$infile" ]] || { 
  echo "Error: Cannot read input file '$infile'" >&2
  exit 1
}

# Core algorithm: Text processing pipeline
# Uses Unix philosophy: chain small, focused utilities
summary=$(
  # Note: a comment after "\" breaks line continuation, so each stage
  # ends with "|" instead. "|| true" tolerates empty logs: grep exits 1
  # on no matches, which would otherwise abort the script under pipefail.
  { grep -E "Failed password" "$infile" || true; } |  # Pattern matching
  awk '{                                              # Field extraction
      for(i=1; i<=NF; i++)
        if($i=="for") {
          print $(i+1)
          break
        }
    }' |
  sort |                                              # Lexicographic ordering
  uniq -c |                                           # Frequency counting
  sort -nr |                                          # Numeric sort (descending)
  head -10                                            # Top-k selection
)

# Output handling with format abstraction
if [[ -n "$outfile" ]]; then
  if [[ $dryrun -eq 1 ]]; then
    # Dry-run mode: report the action without performing it
    echo "Would write CSV to '$outfile'" >&2
  else
    # CSV output (structured data format)
    {
      echo "username,count"                          # Header row
      awk '{print $2","$1}' <<<"$summary"            # Data transformation
    } > "$outfile"
    echo "Successfully wrote CSV to '$outfile'" >&2
  fi
else
  # Console output (human-readable format)
  echo "$summary"
fi
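A quick smoke test of the core pipeline on synthetic log lines (the sample entries below are fabricated for illustration) confirms the aggregation step:

```shell
# Create a tiny synthetic auth.log: three failures (2x root, 1x admin)
# and one success that the grep stage must ignore.
cat > auth.log <<'EOF'
Jan 10 sshd[1]: Failed password for root from 10.0.0.1 port 22 ssh2
Jan 10 sshd[2]: Failed password for root from 10.0.0.2 port 22 ssh2
Jan 10 sshd[3]: Failed password for admin from 10.0.0.3 port 22 ssh2
Jan 10 sshd[4]: Accepted password for alice from 10.0.0.4 port 22 ssh2
EOF

# Run the pipeline stages by hand:
grep -E "Failed password" auth.log \
  | awk '{for(i=1;i<=NF;i++) if($i=="for"){print $(i+1); break}}' \
  | sort | uniq -c | sort -nr | head -10
# Output (uniq -c pads counts with leading spaces):
#   2 root
#   1 admin
```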

Academic Analysis Points

  • Algorithmic Efficiency: The pipeline processes data in a single pass where possible, minimizing memory usage
  • Separation of Concerns: Input validation, processing, and output are clearly separated
  • Error Propagation: Each command in the pipeline can fail independently, with errors properly propagated
  • Abstraction Layers: Command-line interface abstracts the underlying processing complexity


Knowledge Check

Academic Focus: These questions test both practical knowledge and theoretical understanding of shell scripting concepts.
  1. Systems Programming Concepts: What does set -euo pipefail implement in terms of software engineering principles?
  2. Formal Language Theory: In terms of lexical analysis, which quoting method preserves literal tokens without variable expansion?
  3. Process Communication: Which variable provides the argument count for inter-process communication via command line?
  4. Finite State Automaton: For getopts to recognize an option that requires an argument, what specification implements the state transition?
  5. Memory Safety & Data Integrity: Which iteration pattern safely preserves argument boundaries and prevents word splitting vulnerabilities?
  6. Algorithm Analysis: Given a log file with n entries and k unique usernames, what is the time complexity of the authentication failure analysis pipeline?
  7. Security Principles: Why is input validation crucial in shell scripts from a cybersecurity perspective?
  8. Software Engineering: What design pattern does the trap mechanism implement for resource management?


Career Applications

Real-World Applications

  • DevOps Automation: CI/CD pipelines, deployment scripts, and infrastructure management
  • System Administration: Server maintenance, user management, and system monitoring
  • Data Processing: ETL workflows, log parsing, and batch data transformation
  • Security Operations: Log analysis, incident response, and security monitoring

Advanced Applications & Research Directions

  • Systems Administration: Infrastructure as Code (IaC) with configuration management
  • DevOps Integration: CI/CD pipelines and automated deployment scripts
  • Data Engineering: ETL processing for large-scale data transformation
  • Security Research: Log analysis and intrusion detection systems
  • Performance Analysis: System monitoring and resource optimization

Career Connections

  • Site Reliability Engineer: Automation, monitoring, and incident response
  • Cloud Infrastructure Engineer: Infrastructure as Code and deployment automation
  • Data Engineer: ETL pipelines and data processing workflows
  • Cybersecurity Analyst: Log analysis and security automation

Portfolio Value: Shell scripting demonstrates automation expertise and systems thinking valued across technical roles.

Extension Challenges

Advanced Projects

  • Log Analytics Engine: Extend authsum.sh with time-series analysis and anomaly detection
  • Data Pipeline: Build csv_clean.sh for ETL operations (validation, normalization, deduplication)
  • Monitoring System: Create real-time log monitoring with alerting mechanisms
  • Security Audit: Implement shellcheck integration and security vulnerability scanning

Summary & Reference

Key Takeaways & Theory Connections

  • Process Model: Shebang + execute bit demonstrate Unix process creation via exec() family
  • Formal Languages: Variable quoting implements lexical scoping and prevents injection vulnerabilities
  • State Machines: getopts implements finite automaton for command-line parsing
  • Functional Programming: Unix pipelines demonstrate function composition and data flow
  • Software Engineering: trap & cleanup patterns implement resource management and exception safety
  • Algorithm Design: Text processing demonstrates ETL patterns and data transformation

Cross-Curricular Connections

Academic Links

  • Data Structures: Hash tables in associative arrays, sorting algorithms
  • Operating Systems: Process management, file systems, inter-process communication
  • Software Engineering: Testing strategies, code quality, documentation standards
  • Cybersecurity: Input validation, privilege escalation, attack surface analysis
  • Formal Methods: Specification languages, verification techniques

Self-Assessment Questions:

  • How do the shell scripting concepts relate to other programming paradigms you've learned?
  • What security implications arise from shell script usage in production environments?
  • How might you apply these text processing techniques to other domains (data science, web development)?
  • What trade-offs exist between shell scripts and higher-level programming languages for automation tasks?