Pipes and Redirection
The Unix Philosophy: Small Tools, Big Results
Pipes (|) connect the output of one command to the input of another. Redirection (>, >>, <) connects commands to files. Together, they let you build powerful data processing pipelines from simple building blocks.
This is the Unix philosophy: each command does one thing well, and pipes combine them into workflows that no single command could accomplish alone.
Pipes
The pipe operator | takes the standard output (stdout) of the left command and passes it as standard input (stdin) to the right command.
Basic Pipe Examples
# Count how many POST requests returned 500
grep "POST /api" /var/log/nginx/access.log | grep "500" | wc -l
# Find the 10 most frequent error messages
grep "ERROR" app.log | sort | uniq -c | sort -rn | head -10
# Show unique IP addresses accessing your server
awk '{print $1}' /var/log/nginx/access.log | sort -u | wc -l
# Find the largest files in a directory
du -sh * | sort -rh | head -5
Breaking Down a Complex Pipe
Let us trace through the "10 most frequent error messages" example step by step:
# Step 1: Get all ERROR lines
grep "ERROR" app.log
# Output: hundreds of error lines
# Step 2: Sort them (required for uniq)
grep "ERROR" app.log | sort
# Output: error lines in alphabetical order
# Step 3: Count consecutive duplicates
grep "ERROR" app.log | sort | uniq -c
# Output:
# 47 ERROR: Connection timeout to payment-service
# 23 ERROR: Invalid session token
# 12 ERROR: Database query exceeded timeout
# Step 4: Sort by count (descending, numeric)
grep "ERROR" app.log | sort | uniq -c | sort -rn
# Output: Most frequent errors first
# Step 5: Show only the top 10
grep "ERROR" app.log | sort | uniq -c | sort -rn | head -10
# Output: The 10 most frequent error messages with counts
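As a variation, the whole chain can be collapsed into a single awk pass. This sketch writes a tiny made-up app.log inline so it is self-contained; the counting logic is the point:

```shell
# Tiny stand-in for the app.log used above
printf 'ERROR: timeout\nERROR: timeout\nERROR: bad token\nINFO: ok\n' > app.log

# One pass: count ERROR lines in an awk associative array, then sort by
# count descending -- same result as grep | sort | uniq -c | sort -rn
awk '/ERROR/ { n[$0]++ } END { for (l in n) print n[l], l }' app.log | sort -rn | head -10
```

The multi-stage pipe is usually easier to debug, since you can inspect the output after each stage; the awk version avoids two sorts of the full data set.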
Redirection
Redirection connects command input and output to files instead of the terminal.
Output Redirection
# Write output to a file (overwrite)
curl -s https://api.example.com/users > users.json
# Append output to a file
echo "Test run completed at $(date)" >> test-log.txt
# Write to a file and display on screen simultaneously
curl -s https://api.example.com/users | tee users.json
# Append with tee
echo "Another result" | tee -a test-log.txt
Error Redirection
Every process on a Unix-like system has three standard streams:
- stdin (0): Standard input
- stdout (1): Standard output (normal output)
- stderr (2): Standard error (error messages)
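A quick way to see why the distinction matters: a pipe carries only stdout, so anything written to stderr skips the pipe entirely. A minimal demonstration:

```shell
# Write one line to each stream, then pipe into cat:
# the pipe delivers only the stdout line; the stderr line
# bypasses the pipe and goes straight to the terminal
{ echo "to stdout"; echo "to stderr" >&2; } | cat
```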
# Redirect only errors to a file
npm run test 2> errors.log
# Redirect both stdout and stderr to the same file
npm run test > output.log 2>&1
# Redirect stdout and stderr to separate files
npm run test > output.log 2> errors.log
# Discard all output (useful for scripts where you only care about the exit code)
curl -s https://api.example.com/health > /dev/null 2>&1
# Discard errors only (suppress "Permission denied" noise from find)
find / -name "config.json" 2>/dev/null
Input Redirection
# Read input from a file
sort < unsorted-list.txt
# Here document (inline input)
cat <<EOF > test-config.json
{
  "baseURL": "https://staging.example.com",
  "timeout": 30000
}
EOF
# Here string (single line input)
grep "error" <<< "This has an error in it"
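One detail worth knowing about here documents: quoting the delimiter controls variable expansion. A minimal illustration (ENV_NAME is just an example variable):

```shell
ENV_NAME="staging"

# Unquoted delimiter: the shell expands variables inside the here document
cat <<EOF
deploy target: $ENV_NAME
EOF
# Prints: deploy target: staging

# Quoted delimiter: the body is passed through literally, no expansion
cat <<'EOF'
deploy target: $ENV_NAME
EOF
# Prints: deploy target: $ENV_NAME
```

The quoted form is handy when the document itself contains shell syntax, such as a script you are writing to a file.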
Essential Pipe Commands
These commands are most useful when combined with pipes:
sort
# Sort alphabetically
sort names.txt
# Sort numerically
sort -n numbers.txt
# Sort in reverse
sort -r names.txt
# Sort by specific column (e.g., 3rd column)
sort -t',' -k3 data.csv
# Sort numerically, reverse (most common in pipe chains)
sort -rn
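One gotcha with column sorts: -k alone compares lexically, so "10" sorts before "9". Appending n to the key makes the comparison numeric. A small sketch with an inline, made-up CSV:

```shell
printf 'alice,dev,9\nbob,qa,10\n' > data.csv

# Lexical sort of column 3: "10" < "9" because '1' < '9'
sort -t',' -k3 data.csv     # bob,qa,10 comes first
# Numeric sort of column 3: 9 < 10
sort -t',' -k3n data.csv    # alice,dev,9 comes first
```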
uniq
uniq removes only consecutive duplicate lines, so always sort before piping into it.
# Remove duplicates
sort names.txt | uniq
# Count occurrences
sort names.txt | uniq -c
# Show only duplicates
sort names.txt | uniq -d
# Show only unique lines (no duplicates)
sort names.txt | uniq -u
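The consecutive-only behavior is easy to demonstrate inline:

```shell
# Without sorting, the two a's are not adjacent, so uniq keeps both
printf 'a\nb\na\n' | uniq
# Output: a, b, a

# Sorting first makes duplicates adjacent, so uniq can remove them
printf 'a\nb\na\n' | sort | uniq
# Output: a, b
```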
wc (Word Count)
# Count lines
wc -l file.txt
# Count words
wc -w file.txt
# Count characters
wc -c file.txt
# Count lines from pipe (most common usage)
grep "ERROR" app.log | wc -l
head and tail
# First 10 lines (default)
head file.txt
# First 20 lines
head -20 file.txt
# Last 10 lines (default)
tail file.txt
# Last 50 lines
tail -50 file.txt
# Follow a file in real-time (live log monitoring)
tail -f /var/log/app/application.log
# Follow and show the last 100 lines first
tail -n 100 -f /var/log/app/application.log
cut
# Extract specific columns from delimited data
cut -d',' -f1,3 data.csv # Fields 1 and 3, comma-delimited
cut -d':' -f1 /etc/passwd # Username from passwd file
cut -c1-10 file.txt # First 10 characters of each line
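A limitation worth knowing: cut splits on exactly one delimiter character, so a run of spaces produces empty fields. awk splits on runs of whitespace by default, which usually fits aligned output like ls -l or ps:

```shell
# Two spaces between the words: cut's field 2 is the empty string
echo "alpha  beta" | cut -d' ' -f2
# Output: (empty line)

# awk treats any run of whitespace as a single separator
echo "alpha  beta" | awk '{print $2}'
# Output: beta
```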
tr (Translate)
# Convert to uppercase
echo "hello" | tr 'a-z' 'A-Z'
# Output: HELLO
# Replace spaces with newlines
echo "one two three" | tr ' ' '\n'
# Delete specific characters
echo "Hello, World!" | tr -d '!,'
# Output: Hello World
# Squeeze repeated characters
echo "hello world" | tr -s ' '
# Output: hello world
xargs
# Pass pipe output as arguments to a command
find . -name "*.spec.ts" | xargs grep "test.only"
# Delete files found by find
find /tmp -name "*.log" -mtime +7 | xargs rm
# Run a command for each line of input
cat urls.txt | xargs -I {} curl -s -o /dev/null -w "{}: %{http_code}\n" {}
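A caveat on the find | xargs pattern above: plain xargs splits its input on whitespace, so filenames containing spaces break it. find's -print0 with xargs -0 passes null-delimited names safely. This sketch builds its own scratch directory rather than touching real files:

```shell
# Scratch directory containing a filename with a space in it
dir=$(mktemp -d)
touch "$dir/old report.log"

# -print0 / -0: names are null-delimited, so the space is harmless
find "$dir" -name "*.log" -print0 | xargs -0 rm -f

ls -A "$dir"    # nothing left: the file was deleted despite the space
```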
Practical QA Pipe Chains
Analyzing Nginx Access Logs
# Top 10 most requested URLs
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
# Top 10 IP addresses by request count
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
# Count requests per HTTP status code
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
# All 500 errors with timestamps
grep " 500 " /var/log/nginx/access.log | awk '{print $4, $7}' | tail -20
Analyzing Test Results
# Count passed/failed results in JUnit XML reports
# (with multiple files, grep -c prints one file:count line per file)
grep -c 'status="passed"' test-results/*.xml
grep -c 'status="failed"' test-results/*.xml
# Find all test files that contain .only (accidentally focused tests)
grep -rl "\.only" tests/ --include="*.spec.ts"
# List all test descriptions
grep -rh "test\('" tests/ --include="*.spec.ts" | sed "s/.*test('//" | sed "s/',.*//" | sort
Monitoring and Health Checks
# Check multiple endpoints and show status
for url in https://api.example.com/health https://web.example.com https://admin.example.com; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$url")
  echo "$url: $STATUS"
done
# Monitor a log file for errors and send an alert
# (read -r keeps backslashes in the log line intact)
tail -f /var/log/app/application.log | grep --line-buffered "CRITICAL" | while read -r line; do
  echo "ALERT: $line" | mail -s "Critical Error" qa-team@example.com
done
Process Substitution
Process substitution (a bash and zsh feature, not available in plain sh) lets you use a command's output as if it were a file:
# Compare the output of two commands
diff <(curl -s https://staging.example.com/api/config) <(curl -s https://prod.example.com/api/config)
# Compare sorted output
diff <(sort file1.txt) <(sort file2.txt)
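Process substitution also pairs well with comm, which compares two sorted inputs line by line. In this sketch the file names and contents are made up; comm -23 keeps lines unique to the first input, -13 lines unique to the second:

```shell
printf 'login\nsearch\ncheckout\n' > expected-tests.txt
printf 'login\nsearch\nsignup\n'   > actual-tests.txt

# Tests that were expected but did not run
comm -23 <(sort expected-tests.txt) <(sort actual-tests.txt)
# Output: checkout

# Tests that ran but were not expected
comm -13 <(sort expected-tests.txt) <(sort actual-tests.txt)
# Output: signup
```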
Hands-On Exercise
- Use a pipe chain to find the 5 most common words in a text file
- Redirect both stdout and stderr of a test run to separate files
- Use tail -f to monitor a log file while running tests in another terminal
- Build a pipe that analyzes an access log: count requests by status code, showing the top 5
- Use tee to save curl output to a file while also piping it through jq
- Write a one-liner that finds all .spec.ts files containing test.only and lists them with line numbers