Troubleshooting · Beginner

CSV Parsing: Handling Quotes, Commas, and Encoding Edge Cases

Troubleshoot common CSV parsing failures. Covers quoted fields with embedded commas, multiline values, BOM issues, and encoding mismatches that cause data corruption in spreadsheets and import tools.

Key Takeaways

  • CSV (Comma-Separated Values) appears trivially simple but hides surprising complexity.
  • Excel on Windows requires a UTF-8 BOM (byte order mark, EF BB BF) to correctly detect UTF-8 encoding.
  • Never split on commas directly — use a proper CSV parser that handles quoting, escaping, and multiline values.
  • Most programming languages and Unix tools do not add a BOM by default — you must add it explicitly for Excel compatibility.

CSV Is Not Simple

CSV (Comma-Separated Values) appears trivially simple but hides surprising complexity. RFC 4180 documents the most common conventions (comma delimiters, double-quote quoting, CRLF line endings), but it is an informational RFC rather than a binding standard, and many real-world files deviate from it with different delimiters, quoting rules, or line endings. Parsers that assume well-formed input can silently produce wrong results on such data.
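To make the "silent" failure concrete, here is a minimal sketch in Python (chosen for illustration; the article does not prescribe a language). Python's standard-library csv module defaults to the excel dialect, which uses the same core conventions as RFC 4180. Feeding it a semicolon-delimited file raises no error, yet the columns come out wrong. The sample data is made up.

```python
import csv
import io

# The default "excel" dialect: comma delimiter, double-quote quoting,
# CRLF line terminator, and "" for an embedded quote.
d = csv.excel
print(repr(d.delimiter), repr(d.quotechar), repr(d.lineterminator), d.doublequote)
# ',' '"' '\r\n' True

# A semicolon-delimited file parsed with those defaults: no exception,
# but the header is one column and the price is split at its decimal comma.
data = "name;price\nWidget;3,50\n"
rows = list(csv.reader(io.StringIO(data, newline="")))
print(rows)  # [['name;price'], ['Widget;3', '50']]
```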

Common Parsing Failures

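| Problem | Cause | Solution |
|---|---|---|
| Fields shifted right | Unquoted field contains a comma | Enclose the field in double quotes |
| Truncated fields or split rows | Unquoted field contains a newline | Enclose multiline fields in double quotes |
| Extra quotes visible | Embedded double quotes not escaped | Write each embedded quote as "" inside a quoted field |
| Garbled characters | UTF-8 file opened as Latin-1 | Specify the encoding explicitly |
| Leading zeros dropped | Excel converts the value to a number | Format the column as Text on import, or write the field as a formula such as ="00123" |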
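The first three rows of the table all come down to RFC 4180's quoting rules. As a sketch in Python (an illustrative choice), the standard-library csv.writer applies those rules automatically with its default QUOTE_MINIMAL setting, and csv.reader reverses them, so a round trip preserves embedded commas, quotes, and newlines. The sample rows are hypothetical.

```python
import csv
import io

# Sample rows containing each troublesome character from the table:
# an embedded comma, embedded double quotes, and an embedded newline.
rows = [
    ["id", "name", "note"],
    ["1", "Smith, Jane", 'She said "hi"'],
    ["2", "Lee", "line one\nline two"],
]

# Write: QUOTE_MINIMAL (the default) quotes only the fields that need it
# and doubles any embedded quote character.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
text = buf.getvalue()
print(text)
# id,name,note
# 1,"Smith, Jane","She said ""hi"""
# 2,Lee,"line one
# line two"

# Read back: newline="" hands line endings to the csv module untranslated,
# so the quoted newline stays inside its field.
parsed = list(csv.reader(io.StringIO(text, newline="")))
assert parsed == rows
```

Letting the library do the quoting is the safer design: hand-rolled quoting tends to miss one of the three cases.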

The BOM Problem

When a CSV file is opened directly (for example, by double-clicking it), Excel on Windows relies on a UTF-8 BOM (byte order mark, the bytes EF BB BF) to recognize UTF-8. Without it, Excel falls back to the system's legacy code page (such as Windows-1252 on Western systems), so accented and non-Latin characters come out garbled. Most programming languages and Unix tools do not write a BOM by default, so you must add it explicitly for Excel compatibility.
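A minimal Python sketch of writing an Excel-friendly file, relying on Python's built-in "utf-8-sig" codec, which writes the BOM on output and strips it on input. The helper name write_csv_for_excel and the sample data are hypothetical. The last line shows the kind of garbling that appears when UTF-8 bytes are decoded as Windows-1252.

```python
import csv
import os
import tempfile


def write_csv_for_excel(path, rows):
    """Hypothetical helper: write rows as UTF-8 with a leading BOM."""
    # "utf-8-sig" emits EF BB BF before the first row; newline="" lets the
    # csv module control line endings.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


rows = [["name", "city"], ["José", "Zürich"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "export.csv")
    write_csv_for_excel(path, rows)

    with open(path, "rb") as f:
        raw = f.read()
    print(raw[:3])  # b'\xef\xbb\xbf'

    # Reading back: "utf-8-sig" strips the BOM, while plain "utf-8" leaves
    # it attached to the first header cell as U+FEFF.
    with open(path, encoding="utf-8-sig", newline="") as f:
        header_sig = next(csv.reader(f))
    with open(path, encoding="utf-8", newline="") as f:
        header_plain = next(csv.reader(f))

print(header_sig)    # ['name', 'city']
print(header_plain)  # ['\ufeffname', 'city']

# UTF-8 bytes misread as Windows-1252 (the "garbled characters" symptom):
print("José".encode("utf-8").decode("cp1252"))  # JosÃ©
```

The header_plain case matters when your own code consumes files that other tools wrote with a BOM: a lookup for the "name" column would fail.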

Delimiter Detection

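Not all CSV files use commas. Files from many European locales use semicolons (because the comma is the decimal separator there), TSV files use tabs, and some files use pipes. When receiving a file of unknown origin, detect the delimiter by counting how often each candidate separator appears in the first few lines; the real delimiter usually appears the same number of times on every line.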
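One way to implement frequency counting, as a hypothetical Python helper (detect_delimiter and its candidate list are assumptions, not a standard API): require that a candidate appear on every sampled line, then prefer one whose count is identical across lines, then the one that appears most often. Because it counts raw characters, delimiters inside quoted fields or quoted multiline values can mislead it.

```python
# Hypothetical candidate list; extend it if you expect other separators.
CANDIDATES = [",", ";", "\t", "|"]


def detect_delimiter(sample: str, max_lines: int = 5, default: str = ",") -> str:
    """Guess the delimiter of a CSV sample by counting candidates per line.

    Heuristic sketch: a real delimiter should appear on every sampled line,
    ideally the same number of times. Quoted fields can skew the counts.
    """
    lines = [line for line in sample.splitlines() if line.strip()][:max_lines]
    best, best_score = default, None
    for cand in CANDIDATES:
        counts = [line.count(cand) for line in lines]
        if not counts or min(counts) == 0:
            continue  # missing from at least one line: unlikely delimiter
        # Score: consistent count across lines first, then frequency.
        score = (len(set(counts)) == 1, min(counts))
        if best_score is None or score > best_score:
            best, best_score = cand, score
    return best


print(detect_delimiter("name;price\nWidget;3,50\nGadget;12,00\n"))  # ;
print(repr(detect_delimiter("id\tname\n1\tAda\n")))  # '\t'
```

In the first example the decimal commas appear on only two of three lines, so the semicolon wins. Python's standard library also offers csv.Sniffer, whose sniff(sample, delimiters=...) method makes a more thorough guess and raises csv.Error when it cannot decide.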

Robust Parsing Strategy

Never split on commas directly — use a proper CSV parser that handles quoting, escaping, and multiline values. In JavaScript, PapaParse handles edge cases correctly. Parse and validate CSV files with the Peasy CSV tools for instant error detection and format correction.
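For comparison in Python (the article names PapaParse for JavaScript; the standard-library csv module plays the same role here), the sketch below parses one made-up sample both naively and with a real parser. Splitting on newlines and commas yields four broken records; csv.reader yields the three correct ones.

```python
import csv
import io

data = 'id,company,comment\r\n1,"Acme, Inc.","first line\nsecond line"\r\n2,Globex,plain\r\n'

# Naive approach: split records on line breaks and fields on commas.
naive = [line.split(",") for line in data.splitlines()]

# Proper parser: honors quotes, so embedded commas and newlines stay
# inside their field. newline="" keeps line endings untranslated.
parsed = list(csv.reader(io.StringIO(data, newline="")))

print(len(naive), len(parsed))  # 4 3
print(naive[1])   # ['1', '"Acme', ' Inc."', '"first line']
print(parsed[1])  # ['1', 'Acme, Inc.', 'first line\nsecond line']
```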