
# FastCSV


A fast Ragel CSV parser, compatible with Ruby's CSV.

## Usage

`FastCSV.raw_parse` is implemented in C and is the fastest way to read CSVs with FastCSV.

    require 'fastcsv'
    require 'stringio'

    # Read from a file (`filename` is a placeholder for the path to your CSV).
    File.open(filename) do |f|
      FastCSV.raw_parse(f) do |row|
        # do stuff
      end
    end

    # Read from an IO object.
    FastCSV.raw_parse(StringIO.new("foo,bar\n")) do |row|
      # do stuff
    end

    # Read from a string.
    FastCSV.raw_parse("foo,bar\n") do |row|
      # do stuff
    end

    # Transcode like with the CSV module.
    FastCSV.raw_parse("\xF1\n", encoding: 'iso-8859-1:utf-8') do |row|
      # ["ñ"]
    end

FastCSV can be used as a drop-in replacement for CSV (replace `CSV` with `FastCSV`) except:

* The `:row_sep` option is ignored. The default `:auto` is implemented (#9).
* The `:col_sep` option must be a single-byte string, like the default `,` (#8). Python and PHP support single-byte delimiters only, as do the major libraries in JavaScript, Java, C, Objective-C and Perl. A major Node library supports multi-byte delimiters. The CSV Dialect Description Format allows only single-byte delimiters.
* If FastCSV raises an error, you can't continue reading (#3). Its error messages don't perfectly match those of CSV.
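As a rough illustration of the drop-in claim, the sketch below sends the same calls to whichever class is available. The `PARSER` constant and the fallback to CSV when the gem is not installed are assumptions made for this example, not part of FastCSV.

```ruby
require 'csv'

# Prefer FastCSV when the gem is installed; otherwise fall back to CSV.
# (The fallback exists only so this sketch runs anywhere.)
begin
  require 'fastcsv'
rescue LoadError
  # fastcsv is not installed; CSV exposes the same interface.
end
PARSER = defined?(FastCSV) ? FastCSV : CSV

# Same call shape as CSV.parse, including the :headers option.
rows = PARSER.parse("name,qty\nfoo,1\n", headers: true).map(&:to_h)

# A single-byte :col_sep other than the default comma.
fields = PARSER.parse_line("a;b", col_sep: ';')
```

Because only the constant changes, switching back to CSV is a one-line edit if one of the exceptions above turns out to matter.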

A few minor caveats:

* Use `FastCSV.parse_line(string, options)` instead of `string.parse_csv(options)`.
* If you were passing CSV an IO object on which you had wrapped `#gets` (for example, as described in this article), `#gets` will not be called.
* The `:field_size_limit` option is ignored. If you need to prevent DoS attacks (the ostensible reason for this option), limit the size of the input, not the size of quoted fields.
* FastCSV doesn't support UTF-16 or UTF-32. See UTF-8 Everywhere.
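One way to limit the size of the input, as the `:field_size_limit` caveat suggests, is to cap the number of bytes read before parsing. This is a minimal sketch: `read_capped`, `InputTooLarge` and `MAX_BYTES` are hypothetical names for this example, not part of FastCSV.

```ruby
require 'stringio'

MAX_BYTES = 1024 * 1024 # hypothetical cap; choose one that fits your data

class InputTooLarge < StandardError; end

# Reads at most `limit` bytes from `io` and raises if more input remains.
def read_capped(io, limit = MAX_BYTES)
  data = io.read(limit + 1) || '' # IO#read returns nil at EOF
  raise InputTooLarge, "input exceeds #{limit} bytes" if data.bytesize > limit
  data
end

small = read_capped(StringIO.new("foo,bar\n"), 16)
# `small` can now be passed to FastCSV.raw_parse or FastCSV.parse.
```

Reading one byte past the limit is what lets the helper tell "exactly at the limit" apart from "too large" without consuming the rest of the stream.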

## Development

    ragel -G2 ext/fastcsv/fastcsv.rl
    ragel -Vp ext/fastcsv/fastcsv.rl | dot -Tpng -o machine.png
    rake compile
    gem uninstall fastcsv
    rake install
    rspec test/runner.rb test/csv

### Implementation

FastCSV implements its Ragel-based CSV parser in C at `FastCSV::Parser`.

FastCSV is a subclass of CSV. It overrides `#shift`, replacing the parsing code, in order to act as a drop-in replacement.

FastCSV's `raw_parse` requires a block to which it yields one row at a time. FastCSV uses Fiber to pass control back to `#shift` while parsing.
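The general technique of turning a block-yielding parser into a pull-style `#shift` with a Fiber can be sketched as follows. This is not FastCSV's actual code: `toy_raw_parse` is a hypothetical stand-in for `raw_parse` (it ignores quoting), and `PullReader` is an illustrative name.

```ruby
# Hypothetical stand-in for a block-based parser such as FastCSV.raw_parse.
# It splits on newlines and commas only and does not handle quoting.
def toy_raw_parse(text)
  text.each_line { |line| yield line.chomp.split(',') }
end

class PullReader
  def initialize(text)
    @text = text
    @fiber = nil
    @done = false
  end

  # Returns the next row, or nil once the input is exhausted.
  def shift
    return nil if @done
    @fiber ||= Fiber.new do
      # Each yielded row suspends the parser and hands the row to #shift.
      toy_raw_parse(@text) { |row| Fiber.yield(row) }
      @done = true
      nil # value returned by the final resume
    end
    @fiber.resume
  end
end

reader = PullReader.new("a,b\nc,d\n")
rows = [reader.shift, reader.shift, reader.shift, reader.shift]
```

Each call to `shift` resumes the parser just long enough to produce one row, so the caller never has to supply a block.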

CSV delegates IO methods to the IO object it's reading. IO methods that move the pointer within the file, like `rewind`, change the behavior of CSV's `#shift`. However, FastCSV's C code won't take notice. We therefore set the Fiber to `nil` whenever the pointer is moved, so that `#shift` uses a new Fiber.
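A minimal sketch of that reset, under the same caveat that it is not FastCSV's actual code: `toy_raw_parse` is a hypothetical parser that reads from the IO's current position, and `RewindableReader` is an illustrative name.

```ruby
require 'stringio'

# Hypothetical stand-in for the C parser: reads from the IO's current position.
def toy_raw_parse(io)
  io.each_line { |line| yield line.chomp.split(',') }
end

class RewindableReader
  def initialize(io)
    @io = io
    @fiber = nil
  end

  def shift
    @fiber ||= Fiber.new do
      toy_raw_parse(@io) { |row| Fiber.yield(row) }
      nil
    end
    row = @fiber.resume
    @fiber = nil if row.nil? # finished: the next call starts a fresh Fiber
    row
  end

  # Delegates to the IO, then drops the Fiber so that the next #shift
  # starts parsing again from the new position.
  def rewind
    @io.rewind
    @fiber = nil
  end
end

reader = RewindableReader.new(StringIO.new("a,b\nc,d\n"))
first = reader.shift
reader.rewind
again = reader.shift
```

Without the reset in `rewind`, the suspended Fiber would carry on from its old parser state and return the second row instead of the first.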

CSV's `#shift` runs the regular expression in the `:skip_lines` option against a row's raw text. `FastCSV::Parser` implements a `row` method, which returns the most recently parsed row's raw text.
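For reference, this is the `:skip_lines` behavior being matched, shown with Ruby's own CSV. The sample data is made up for illustration.

```ruby
require 'csv'

data = "# a comment\nfoo,bar\n# another comment\nbaz,qux\n"

# Lines whose raw text matches the pattern are dropped before parsing.
rows = CSV.parse(data, skip_lines: /\A#/)
```

The pattern is tested against the unparsed line, which is why the parser has to keep each row's raw text available.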

FastCSV is tested against the same tests as CSV. The `rspec test/runner.rb test/csv` command in the Development section runs them.

## Why?

I evaluated many CSV Ruby gems, and they were either too slow or had implementation errors. rcsv is fast and libcsv-based, but it skips blank rows (Ruby's CSV module returns an empty array) and silently fails on input with an unclosed quote. bamfcsv is well implemented, but it's considerably slower on large files. I looked for Ragel-based CSV parsers to copy, but they either had implementation errors or could not handle large files. commas looks good, but it performs a memory check on each character, which is overkill.

## Acknowledgements

Started as a Ruby 2.1 fork of MoonWolf's CSVScan, found in this commit. CSVScan uses Ragel code from HPricot, from this commit. Most of the Ruby (i.e. non-C, non-Ragel) methods are copied from CSV.

Copyright (c) 2014 James McKinney, released under the MIT license