Shell Scripting (Bash)

CT-152
Introduction to UNIX
Overview

Master the art of shell scripting by learning to write, execute, and secure robust Bash automation scripts. You'll explore essential techniques including parameter handling, defensive quoting strategies, control flow patterns, and powerful text processing pipelines. Apply your skills by building a practical log analyzer tool, then demonstrate your mastery through a comprehensive knowledge assessment.

Module Facts

Format: Interactive module (single HTML file)

Assessment: Guided lab + quiz + optional challenge

Deliverables: authsum.sh (+ optional csv_clean.sh)

Core Competencies
  • Create & Execute: Design executable Bash scripts using proper shebang syntax and file permissions, demonstrating understanding of Unix process execution model.
  • Variable Management: Implement secure variable handling with proper quoting to prevent word splitting and glob expansion in at least 3 different contexts.
  • Parameter Processing: Process command-line arguments using $@, $#, and getopts while maintaining POSIX compliance.
  • Control Structures: Apply conditional logic and iteration patterns to solve algorithmic problems efficiently.
  • Text Processing: Construct Unix pipelines combining multiple utilities to perform ETL operations on structured data.
  • Error Handling: Implement defensive programming techniques using exit codes, set -euo pipefail, and trap for robust script behavior.

Terminal Lab

Try: pwd, ls, echo "echo Hello" > hello.sh, chmod u+x hello.sh, ./hello.sh, bash hello.sh


Practice Tasks

  1. Create a script: Write hello.sh with at least one echo line.
  2. Make it executable: Run chmod u+x hello.sh.
  3. Simulated run: Execute ./hello.sh (simulated) or bash hello.sh.
  4. Args: Create args.sh and print all args one per line (hint: loop over "$@").

Theoretical Foundation

Shell Execution Model

Understanding how the shell processes and executes scripts is crucial for writing robust code:

  • Process Creation: Each script execution creates a new process via fork() and exec() system calls
  • Environment Inheritance: Child processes inherit environment variables but cannot modify parent's environment
  • Exit Codes: Processes communicate success/failure through integer return values (0 = success, 1-255 = various errors)
  • Signal Handling: Scripts can intercept and respond to system signals using trap
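A quick experiment makes the inheritance and exit-code points concrete (the variable name GREETING is illustrative):

```shell
# Demo: environment inheritance and exit codes.
export GREETING="hello"              # exported -> visible to child processes

# The child sees the inherited variable, but its changes stay in the child.
bash -c 'echo "child sees: $GREETING"; GREETING="changed"'
echo "parent still has: $GREETING"   # unchanged: hello

# Exit codes: $? holds the integer status of the last command.
bash -c 'exit 42'
echo "child exited with: $?"         # 42
```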

Security & Safety Principles

Shell scripting presents unique security challenges due to its text-processing nature:

  • Input Validation: Always validate and sanitize user input to prevent injection attacks
  • Privilege Principle: Run scripts with minimal necessary permissions
  • Word Splitting: Unquoted variables can lead to unintended command execution
  • Path Safety: Use absolute paths or validate PATH to prevent command hijacking

Academic Note: These concepts connect to formal verification methods and secure coding practices in software engineering.
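Word splitting is easiest to see with a filename containing a space (the name my report.txt is hypothetical):

```shell
# Demo: unquoted expansion splits on whitespace; quoting preserves boundaries.
file="my report.txt"

printf '%s\n' $file        # unquoted: two words, printed on two lines
printf '%s\n' "$file"      # quoted: one word -> my report.txt

# Counting the words each way via the positional parameters:
set -- $file;   echo "unquoted words: $#"   # 2
set -- "$file"; echo "quoted words: $#"     # 1
```

In a script, that stray second word could become an unintended argument (or, with glob characters, expand to matching filenames), which is exactly the vulnerability the quoting rules guard against.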

Core Concepts & Patterns

Safety Header & Defensive Programming

# Near top of most scripts
set -euo pipefail
IFS=$'\n\t'

Theory: Implements fail-fast principle from software engineering. -e ensures error propagation, -u catches undefined variable bugs (similar to strict mode in modern languages), -o pipefail ensures pipeline failures aren't masked.

IFS Reset: Prevents field splitting vulnerabilities by explicitly setting Internal Field Separator.

Variables & Lexical Scoping

name="Alice"
echo "Hello, $name"    # Variable expansion
echo 'Literal $name'   # No expansion
readonly CONFIG_FILE="/etc/app.conf"

Theory: Bash uses dynamic scoping, not lexical scoping. The local keyword in functions limits a variable's lifetime to the current call, but functions invoked from that call can still read it—a dynamically scoped local, unlike the lexically scoped locals of most compiled languages. Quoting rules follow formal language theory—double quotes allow expansion, single quotes create string literals.
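A small sketch shows bash's dynamic scoping in action: a function called from another sees the caller's local variable (the names inner and outer are hypothetical):

```shell
# Demo: bash "local" variables are dynamically scoped --
# a callee sees the caller's local, which lexical scoping would forbid.
inner() { echo "inner sees: $x"; }

outer() {
  local x="from outer"   # local to this call, but visible to callees
  inner                  # prints: inner sees: from outer
}

outer
```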

Finite State Automaton: getopts

while getopts ":i:o:n" opt; do
  case $opt in
    i) infile=$OPTARG ;;
    o) outfile=$OPTARG ;;
    n) dryrun=1 ;;
    :) echo "Missing arg for -$OPTARG" >&2; exit 2 ;;
    \?) echo "Unknown -$OPTARG" >&2; exit 2 ;;
  esac
done
shift $((OPTIND-1))

Theory: Implements a finite state machine for parsing command-line options. Each state transition is deterministic, following formal language parsing principles.
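To watch those state transitions, the loop can be wrapped in a function and fed sample argument vectors (the function name parse and the sample filenames are hypothetical):

```shell
# Demo: exercising the getopts state machine on sample argument vectors.
parse() {
  local infile="" outfile="" dryrun=0 opt
  OPTIND=1                       # reset parser state between calls
  while getopts ":i:o:n" opt; do
    case $opt in
      i) infile=$OPTARG ;;
      o) outfile=$OPTARG ;;
      n) dryrun=1 ;;
      :)  echo "Missing arg for -$OPTARG" >&2; return 2 ;;
      \?) echo "Unknown -$OPTARG" >&2; return 2 ;;
    esac
  done
  shift $((OPTIND-1))
  echo "infile=$infile outfile=$outfile dryrun=$dryrun rest=$*"
}

parse -i auth.log -n extra.log
# -> infile=auth.log outfile= dryrun=1 rest=extra.log

parse -i 2>&1 || echo "parser returned $?"   # -i lacks its argument -> error path
```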

Control Flow & Computational Thinking

# Iteration with safe variable handling
for arg in "$@"; do
  echo "Processing: $arg"
done

# Pattern matching (similar to switch statements)
case "$extension" in
  csv|tsv) echo "Tabular data" ;;
  json|xml) echo "Structured data" ;;
  *) echo "Unknown format" ;;
esac

Theory: Demonstrates algorithmic thinking patterns. The quoted "$@" preserves argument boundaries, preventing inadvertent data corruption.
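The difference between quoted and unquoted expansion can be measured directly (the helper count_args is hypothetical):

```shell
# Demo: only quoted "$@" preserves each argument intact.
count_args() { echo "$#"; }

set -- "first arg" "second arg"    # two positional parameters

count_args "$@"    # -> 2   boundaries preserved
count_args $@      # -> 4   unquoted: re-split on whitespace
count_args "$*"    # -> 1   all parameters joined into a single word
```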

Debugging & Error Handling Strategies

Debugging Techniques

Trace Execution

# Enable debugging
bash -x script.sh

# Or within script
set -x    # Enable trace
command1
command2
set +x    # Disable trace

Theory: Implements execution tracing, similar to debuggers in compiled languages. Shows command expansion and execution flow.

Validation Patterns

# Check file readability
[[ -r "$input_file" ]] || {
  echo "Error: Cannot read $input_file" >&2
  exit 1
}

# Validate numeric input
if ! [[ "$count" =~ ^[0-9]+$ ]]; then
  echo "Error: Invalid number '$count'" >&2
  exit 2
fi

Theory: Implements precondition checking from formal verification. Early validation prevents downstream errors.

Error Recovery & Cleanup

Trap Handlers

# Setup cleanup on script exit (mktemp avoids predictable, hijackable names)
tmp_file=$(mktemp)
trap 'rm -f "$tmp_file"' EXIT

# Handle interruption gracefully  
trap 'echo "Script interrupted" >&2; exit 130' INT TERM

Theory: Implements exception handling patterns. trap ensures resource cleanup regardless of exit path.

Exit Code Conventions

# Standard exit codes
exit 0   # Success
exit 1   # General error
exit 2   # Misuse (bad arguments)
exit 126 # Command not executable
exit 127 # Command not found
exit 130 # Script terminated by Ctrl+C

Theory: Follows POSIX standards for process communication. Enables programmatic error handling in calling scripts.
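A caller can branch on these conventions programmatically; the sketch below (the helper check_file is hypothetical) returns 0, 1, or 2 following the table above:

```shell
# Demo: a caller dispatching on a function's exit code.
check_file() {
  [ "$#" -eq 1 ] || return 2      # misuse: wrong argument count
  [ -r "$1" ]    || return 1      # general error: unreadable file
  return 0                        # success
}

tmp=$(mktemp)
check_file "$tmp";        echo "readable file -> $?"   # 0
check_file /no/such/file; echo "missing file  -> $?"   # 1
check_file;               echo "no arguments  -> $?"   # 2
rm -f "$tmp"
```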

Guided Lab: Log Analysis System

Engineering Challenge

Problem: Design a robust log analysis tool that processes authentication failures, demonstrating systems programming concepts including file I/O, text processing algorithms, and error handling.

Real-world Context: Security teams use similar tools to detect brute-force attacks and identify compromised accounts in enterprise environments.

Requirements Analysis

Functional Requirements:
  • Parse authentication logs using regex patterns
  • Aggregate failure counts by username
  • Support multiple output formats (console, CSV)
  • Handle command-line flags with validation
  • Implement dry-run mode for testing

Non-functional Requirements:
  • Robust error handling and validation
  • POSIX compliance for portability
  • Memory-efficient processing for large logs
  • Clear user feedback and error messages

Algorithm Design

Pipeline Architecture:

  1. Input Validation: File existence and readability
  2. Pattern Matching: Extract failed login attempts
  3. Data Extraction: Parse usernames from log entries
  4. Aggregation: Count failures per user
  5. Sorting: Rank by failure frequency
  6. Output Formatting: Display or export results

Time Complexity: O(n log n) due to sorting step

Space Complexity: O(k) where k = unique usernames

Implementation with Academic Commentary

#!/usr/bin/env bash
# Authentication Log Analyzer
# Demonstrates: Systems programming, text processing, error handling

set -euo pipefail              # Fail-fast error handling
IFS=$'\n\t'                   # Prevent word splitting vulnerabilities

# Default configuration (following convention over configuration principle)
infile="auth.log"
outfile=""
dryrun=0

# Command-line argument parsing using finite state automaton
while getopts ":i:o:n" opt; do
  case $opt in
    i) infile=$OPTARG ;;       # Input file specification
    o) outfile=$OPTARG ;;      # Output file specification  
    n) dryrun=1 ;;             # Dry-run mode (testing pattern)
    :) echo "Option -$OPTARG requires an argument" >&2; exit 2 ;;
    \?) echo "Unknown option: -$OPTARG" >&2; exit 2 ;;
  esac
done
shift $((OPTIND-1))            # Remove processed options from $@

# Input validation (precondition checking)
[[ -r "$infile" ]] || { 
  echo "Error: Cannot read input file '$infile'" >&2
  exit 1
}

# Core algorithm: Text processing pipeline
# Uses Unix philosophy: chain small, focused utilities
summary=$(
  # Note: a comment after "\" breaks line continuation, so each stage
  # ends with "|" instead. "|| true" tolerates empty logs: grep exits 1
  # on no matches, which would otherwise abort the script under pipefail.
  { grep -E "Failed password" "$infile" || true; } |  # Pattern matching
  awk '{                                              # Field extraction
      for(i=1; i<=NF; i++)
        if($i=="for") {
          print $(i+1)
          break
        }
    }' |
  sort |                                              # Lexicographic ordering
  uniq -c |                                           # Frequency counting
  sort -nr |                                          # Numeric sort (descending)
  head -10                                            # Top-k selection
)

# Output handling with format abstraction
if [[ -n "$outfile" ]]; then
  if [[ $dryrun -eq 1 ]]; then
    # Dry-run mode: report the action without performing it
    echo "Would write CSV to '$outfile'" >&2
  else
    # CSV output (structured data format)
    {
      echo "username,count"                          # Header row
      awk '{print $2","$1}' <<<"$summary"            # Data transformation
    } > "$outfile"
    echo "Successfully wrote CSV to '$outfile'" >&2
  fi
else
  # Console output (human-readable format)
  echo "$summary"
fi
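A quick smoke test of the core pipeline on synthetic log lines (the sample entries below are fabricated for illustration) confirms the aggregation step:

```shell
# Create a tiny synthetic auth.log: three failures (2x root, 1x admin)
# and one success that the grep stage must ignore.
cat > auth.log <<'EOF'
Jan 10 sshd[1]: Failed password for root from 10.0.0.1 port 22 ssh2
Jan 10 sshd[2]: Failed password for root from 10.0.0.2 port 22 ssh2
Jan 10 sshd[3]: Failed password for admin from 10.0.0.3 port 22 ssh2
Jan 10 sshd[4]: Accepted password for alice from 10.0.0.4 port 22 ssh2
EOF

# Run the pipeline stages by hand:
grep -E "Failed password" auth.log \
  | awk '{for(i=1;i<=NF;i++) if($i=="for"){print $(i+1); break}}' \
  | sort | uniq -c | sort -nr | head -10
# Output (uniq -c pads counts with leading spaces):
#   2 root
#   1 admin
```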

Academic Analysis Points

  • Algorithmic Efficiency: The pipeline processes data in a single pass where possible, minimizing memory usage
  • Separation of Concerns: Input validation, processing, and output are clearly separated
  • Error Propagation: Each command in the pipeline can fail independently, with errors properly propagated
  • Abstraction Layers: Command-line interface abstracts the underlying processing complexity


Knowledge Check

Academic Focus: These questions test both practical knowledge and theoretical understanding of shell scripting concepts.
  1. Systems Programming Concepts: What does set -euo pipefail implement in terms of software engineering principles?
  2. Formal Language Theory: In terms of lexical analysis, which quoting method preserves literal tokens without variable expansion?
  3. Process Communication: Which variable provides the argument count for inter-process communication via command line?
  4. Finite State Automaton: For getopts to recognize an option that requires an argument, what specification implements the state transition?
  5. Memory Safety & Data Integrity: Which iteration pattern safely preserves argument boundaries and prevents word splitting vulnerabilities?
  6. Algorithm Analysis: Given a log file with n entries and k unique usernames, what is the time complexity of the authentication failure analysis pipeline?
  7. Security Principles: Why is input validation crucial in shell scripts from a cybersecurity perspective?
  8. Software Engineering: What design pattern does the trap mechanism implement for resource management?


Career Applications

Real-World Applications

  • DevOps Automation: CI/CD pipelines, deployment scripts, and infrastructure management
  • System Administration: Server maintenance, user management, and system monitoring
  • Data Processing: ETL workflows, log parsing, and batch data transformation
  • Security Operations: Log analysis, incident response, and security monitoring

Advanced Applications & Research Directions

  • Systems Administration: Infrastructure as Code (IaC) with configuration management
  • DevOps Integration: CI/CD pipelines and automated deployment scripts
  • Data Engineering: ETL processing for large-scale data transformation
  • Security Research: Log analysis and intrusion detection systems
  • Performance Analysis: System monitoring and resource optimization

Career Connections

  • Site Reliability Engineer: Automation, monitoring, and incident response
  • Cloud Infrastructure Engineer: Infrastructure as Code and deployment automation
  • Data Engineer: ETL pipelines and data processing workflows
  • Cybersecurity Analyst: Log analysis and security automation

Portfolio Value: Shell scripting demonstrates automation expertise and systems thinking valued across technical roles.

Extension Challenges

Advanced Projects

  • Log Analytics Engine: Extend authsum.sh with time-series analysis and anomaly detection
  • Data Pipeline: Build csv_clean.sh for ETL operations (validation, normalization, deduplication)
  • Monitoring System: Create real-time log monitoring with alerting mechanisms
  • Security Audit: Implement shellcheck integration and security vulnerability scanning

Summary & Reference

Key Takeaways & Theory Connections

  • Process Model: Shebang + execute bit demonstrate Unix process creation via exec() family
  • Formal Languages: Variable quoting implements lexical scoping and prevents injection vulnerabilities
  • State Machines: getopts implements finite automaton for command-line parsing
  • Functional Programming: Unix pipelines demonstrate function composition and data flow
  • Software Engineering: trap & cleanup patterns implement resource management and exception safety
  • Algorithm Design: Text processing demonstrates ETL patterns and data transformation

Cross-Curricular Connections

Academic Links

  • Data Structures: Hash tables in associative arrays, sorting algorithms
  • Operating Systems: Process management, file systems, inter-process communication
  • Software Engineering: Testing strategies, code quality, documentation standards
  • Cybersecurity: Input validation, privilege escalation, attack surface analysis
  • Formal Methods: Specification languages, verification techniques

Self-Assessment Questions:

  • How do the shell scripting concepts relate to other programming paradigms you've learned?
  • What security implications arise from shell script usage in production environments?
  • How might you apply these text processing techniques to other domains (data science, web development)?
  • What trade-offs exist between shell scripts and higher-level programming languages for automation tasks?