PII Redaction in Logs
Problem
You need to redact personally identifiable information (PII) from logs to comply with privacy regulations (GDPR, CCPA) while keeping the logs useful for debugging and monitoring. This is a critical challenge for AI systems that process user data: logs are essential for diagnosing production issues, but PII in logs creates compliance and security risks. Privacy regulations require minimizing PII exposure, yet overly aggressive redaction makes logs useless for debugging.

The challenge is to redact sensitive data (emails, phone numbers, SSNs, credit cards, IP addresses) while preserving enough context to diagnose issues. You need to handle both structured logs (JSON fields) and unstructured text (error messages, user inputs), where PII might appear anywhere. Redaction must also be automatic and fail-safe: relying on developers to redact PII by hand is error-prone and doesn't scale.
Solution
Implement a PII redactor that uses regex patterns to detect PII (emails, phone numbers, SSNs, credit cards, IP addresses), replaces each match with a redacted placeholder, and maintains a redaction audit trail. This works because most PII follows predictable formats that can be detected and replaced. The design combines regex patterns for common PII types, configurable redaction strategies (full redaction vs. partial masking), and an audit trail that records what was redacted without recording the PII itself. The key insight is that complete removal of PII is often unnecessary and counterproductive: partial redaction (such as showing part of an email's domain or the last 4 digits of a credit card) preserves debugging utility while protecting sensitive data. The redactor integrates with Beluga's logging infrastructure and OpenTelemetry tracing, so logs are sanitized automatically before emission.
Pattern ordering matters: the credit_card pattern must be checked before the phone pattern, so that part of a credit card number is not mistaken for a phone number. Because Go map iteration order is random, this order has to be enforced explicitly rather than left to a map range. The phone regex requires separators to avoid matching arbitrary digit sequences, at the cost of missing unformatted numbers. Elsewhere the redactor is designed to be conservative: when in doubt it prefers over-redaction (false positives) to under-redaction (false negatives), prioritizing compliance over debugging convenience.
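The configurable redaction strategies described above (full redaction vs. partial masking) could be expressed as a small strategy switch. The following is a minimal sketch under assumptions: the `Strategy` type, its `Full` and `Partial` constants, and the `Mask` and `onlyDigits` helpers are hypothetical names introduced for illustration and are not part of Beluga's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Strategy selects how much of a matched value survives redaction.
// The type and its constants are hypothetical names for this sketch.
type Strategy int

const (
	Full    Strategy = iota // replace the whole match with a placeholder
	Partial                 // keep a low-risk fragment for debugging
)

// Mask returns the replacement for one matched value of the given PII type.
func Mask(piiType, match string, s Strategy) string {
	placeholder := fmt.Sprintf("[REDACTED_%s]", strings.ToUpper(piiType))
	if s == Full {
		return placeholder
	}
	switch piiType {
	case "email":
		// Keep only the domain; the local part identifies the person.
		if at := strings.LastIndex(match, "@"); at >= 0 {
			return placeholder + match[at:]
		}
	case "credit_card":
		// Keep the last four digits, ignoring separators.
		digits := onlyDigits(match)
		if len(digits) >= 4 {
			return placeholder + "..." + digits[len(digits)-4:]
		}
	}
	// Types without a safe partial form fall back to full redaction.
	return placeholder
}

// onlyDigits strips everything except ASCII digits from s.
func onlyDigits(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= '0' && r <= '9' {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(Mask("email", "john.doe@example.com", Partial))
	fmt.Println(Mask("credit_card", "4111-1111-1111-1111", Partial))
	fmt.Println(Mask("ssn", "123-45-6789", Partial))
	fmt.Println(Mask("email", "john.doe@example.com", Full))
}
```

Falling back to full redaction for any type without an explicit partial form keeps the sketch conservative: a newly added PII type is fully hidden until someone decides which fragment is safe to show.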
Code Example
```go
package main

import (
	"context"
	"fmt"
	"log"
	"regexp"
	"sort"
	"strings"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
)

var tracer = otel.Tracer("beluga.safety.pii_redaction")

// PIIRedactor redacts PII from text.
type PIIRedactor struct {
	patterns map[string]*regexp.Regexp
	enabled  bool
}

// piiPatterns maps each PII type to its detection regex.
var piiPatterns = map[string]string{
	"email":       `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`,
	"phone":       `\b\d{3}[-.]\d{3}[-.]\d{4}\b`, // separators are required
	"ssn":         `\b\d{3}-?\d{2}-?\d{4}\b`,
	"credit_card": `\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b`,
	"ip_address":  `\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b`,
}

// patternOrder fixes the order in which the built-in patterns run. Go map
// iteration order is random, so credit_card is listed before phone explicitly.
var patternOrder = []string{"email", "credit_card", "ssn", "phone", "ip_address"}

// NewPIIRedactor creates a new PII redactor.
func NewPIIRedactor(enabled bool) *PIIRedactor {
	patterns := make(map[string]*regexp.Regexp)
	for piiType, pattern := range piiPatterns {
		patterns[piiType] = regexp.MustCompile(pattern)
	}

	return &PIIRedactor{
		patterns: patterns,
		enabled:  enabled,
	}
}

// orderedTypes returns the built-in types in patternOrder, followed by any
// other registered types in alphabetical order.
func (pr *PIIRedactor) orderedTypes() []string {
	ordered := make([]string, 0, len(pr.patterns))
	seen := make(map[string]bool, len(patternOrder))
	for _, piiType := range patternOrder {
		if _, ok := pr.patterns[piiType]; ok {
			ordered = append(ordered, piiType)
			seen[piiType] = true
		}
	}
	var extra []string
	for piiType := range pr.patterns {
		if !seen[piiType] {
			extra = append(extra, piiType)
		}
	}
	sort.Strings(extra)
	return append(ordered, extra...)
}

// Redact redacts PII from text and returns the redacted text together with
// the number of redactions per PII type.
func (pr *PIIRedactor) Redact(ctx context.Context, text string) (string, map[string]int) {
	_, span := tracer.Start(ctx, "pii_redactor.redact")
	defer span.End()

	if !pr.enabled {
		span.SetAttributes(attribute.Bool("redaction_enabled", false))
		return text, nil
	}

	redacted := text
	counts := make(map[string]int)

	for _, piiType := range pr.orderedTypes() {
		pattern := pr.patterns[piiType]
		// Count and replace in one pass over the current text, so a value
		// already redacted by an earlier pattern is not counted again.
		redacted = pattern.ReplaceAllStringFunc(redacted, func(match string) string {
			counts[piiType]++
			return pr.redactMatch(match, piiType)
		})
	}

	span.SetAttributes(
		attribute.Int("total_redactions", pr.sumCounts(counts)),
		attribute.String("pii_types", pr.formatPIITypes(counts)),
	)
	// The description is only kept for error statuses, so none is passed here.
	span.SetStatus(codes.Ok, "")

	return redacted, counts
}

// redactMatch creates a redacted placeholder for a match.
func (pr *PIIRedactor) redactMatch(match string, piiType string) string {
	// Preserve some structure for debugging where it is low-risk.
	switch piiType {
	case "email":
		// Keep up to the first three characters of the domain.
		parts := strings.Split(match, "@")
		if len(parts) == 2 {
			return fmt.Sprintf("[REDACTED_EMAIL]@%s", parts[1][:min(3, len(parts[1]))]+"...")
		}
	case "phone":
		return "[REDACTED_PHONE]"
	case "ssn":
		return "[REDACTED_SSN]"
	case "credit_card":
		// Show the last 4 digits.
		if len(match) >= 4 {
			return fmt.Sprintf("[REDACTED_CC]...%s", match[len(match)-4:])
		}
		return "[REDACTED_CC]"
	}

	return fmt.Sprintf("[REDACTED_%s]", strings.ToUpper(piiType))
}

// RedactStructured redacts PII from string values in structured data,
// recursing into nested maps. Other value types are copied unchanged.
func (pr *PIIRedactor) RedactStructured(ctx context.Context, data map[string]interface{}) (map[string]interface{}, error) {
	ctx, span := tracer.Start(ctx, "pii_redactor.redact_structured")
	defer span.End()

	redacted := make(map[string]interface{})
	totalRedactions := 0

	for key, value := range data {
		switch v := value.(type) {
		case string:
			redactedValue, counts := pr.Redact(ctx, v)
			redacted[key] = redactedValue
			totalRedactions += pr.sumCounts(counts)
		case map[string]interface{}:
			subRedacted, err := pr.RedactStructured(ctx, v)
			if err != nil {
				return nil, err
			}
			redacted[key] = subRedacted
		default:
			redacted[key] = value
		}
	}

	span.SetAttributes(attribute.Int("total_redactions", totalRedactions))
	span.SetStatus(codes.Ok, "")

	return redacted, nil
}

// min is kept for Go versions before 1.21, which lack the built-in.
func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func (pr *PIIRedactor) sumCounts(counts map[string]int) int {
	total := 0
	for _, count := range counts {
		total += count
	}
	return total
}

// formatPIITypes renders counts as "type:count" pairs in sorted order.
func (pr *PIIRedactor) formatPIITypes(counts map[string]int) string {
	types := make([]string, 0, len(counts))
	for piiType, count := range counts {
		types = append(types, fmt.Sprintf("%s:%d", piiType, count))
	}
	sort.Strings(types)
	return strings.Join(types, ",")
}

// SafeLogger wraps logging with PII redaction.
type SafeLogger struct {
	redactor   *PIIRedactor
	baseLogger *log.Logger
}

// NewSafeLogger creates a new safe logger.
func NewSafeLogger(redactor *PIIRedactor) *SafeLogger {
	return &SafeLogger{
		redactor:   redactor,
		baseLogger: log.Default(),
	}
}

// Log logs a message after redacting PII from it.
func (sl *SafeLogger) Log(ctx context.Context, message string) {
	redacted, counts := sl.redactor.Redact(ctx, message)

	if len(counts) > 0 {
		sl.baseLogger.Printf("[PII_REDACTED: %v] %s", counts, redacted)
	} else {
		sl.baseLogger.Print(redacted)
	}
}

func main() {
	ctx := context.Background()

	// Create redactor
	redactor := NewPIIRedactor(true)

	// Test redaction
	text := "Contact john.doe@example.com or call 555-123-4567"
	redacted, counts := redactor.Redact(ctx, text)
	fmt.Printf("Original: %s\n", text)
	fmt.Printf("Redacted: %s\n", redacted)
	fmt.Printf("Counts: %v\n", counts)
}
```
Explanation
- Pattern-based detection: Regex patterns target specific PII types (email, phone, SSN, and so on), covering the most common kinds of personally identifiable information. This works because PII appears in predictable formats: emails follow RFC 5322, phone numbers follow regional formatting rules, and US SSNs have a fixed structure. Regex patterns exploit these formats to detect PII without machine learning models or external services. The approach is fast, deterministic (the same input always produces the same output), and privacy-preserving (detection happens locally, without sending data to external APIs). The trade-off is that regex patterns are format-specific: they may miss PII in unusual formats and may produce false positives on data that merely looks like PII.
- Structure preservation: Some structure is preserved in the redacted output (a short prefix of the email domain, the last 4 digits of a credit card), while phone numbers and SSNs are replaced entirely. This matters because logs that are completely redacted are useless for debugging. Preserved fragments give enough context to diagnose issues: an email domain helps identify misconfigured mail servers, and the last 4 digits of a card help verify the payment processing flow. The key insight is that structure and format often carry the useful debugging information, while the specific values are what is sensitive. Redacting values but preserving structure keeps debugging utility while complying with privacy regulations that focus on protecting individual identifiers, not aggregate patterns.
- Audit trail: The count and type of PII redactions are tracked and logged, giving the visibility into PII processing that compliance audits and security monitoring need. The audit trail answers critical questions: Is PII appearing in logs unexpectedly (indicating a bug)? What types of PII are users submitting (informing feature design)? Are the redaction patterns catching the PII they should (validating the regexes)? The counts map provides this visibility without logging the actual PII. The data can also be aggregated across logs to detect anomalies, such as a sudden spike in SSN detections that suggests a data breach or a new feature leaking sensitive data.
Balance complete redaction against preserving information that is useful for debugging. Partial redaction, such as showing the last 4 digits of a card number, is sometimes both acceptable and useful.
Testing
```go
package main

import (
	"context"
	"testing"

	// The require calls below assume the testify assertion library.
	"github.com/stretchr/testify/require"
)

func TestPIIRedactor_RedactsEmail(t *testing.T) {
	redactor := NewPIIRedactor(true)

	text := "Contact user@example.com"
	redacted, counts := redactor.Redact(context.Background(), text)

	require.Contains(t, counts, "email")
	require.Contains(t, redacted, "[REDACTED")
	require.NotContains(t, redacted, "user@example.com")
}
```
Variations
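Custom PII Patterns
Add custom patterns for domain-specific PII:
```go
// AddPattern registers a custom PII pattern. It returns an error instead of
// panicking when the pattern does not compile.
func (pr *PIIRedactor) AddPattern(piiType, pattern string) error {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return fmt.Errorf("compile %q pattern: %w", piiType, err)
	}
	pr.patterns[piiType] = re
	return nil
}
```
ML-based Detection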
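Combine the regex redactor with an ML model to catch PII that has no fixed format, such as names or street addresses:
```go
// MLPIIRedactor pairs the regex redactor with an ML-based detector.
// PIIModel is a placeholder type; it is not defined in this recipe.
type MLPIIRedactor struct {
	regexRedactor *PIIRedactor
	mlModel       *PIIModel
}
```
Related Recipes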
- Prompt Injection Detection: Additional safety patterns
- Config Masking Secrets in Logs: Secret masking