C++ CSV Parser


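The function featured here is csv_read_row. It reads one record (row) of data and splits it on a delimiter. One overload reads the next record from a std::istream; the other parses a std::string that already holds a line. If you pass in a comma as the delimiter, it parses a Comma Separated Value (CSV) file. If you pass in a '\t' character, it parses a tab-delimited file (.txt or .tsv). CSV data often contains commas inside the values themselves, and the format handles this by surrounding such a field in double quotes. A literal quote inside a quoted field is written as two consecutive quotes. The function strips the surrounding quotes and collapses each doubled quote into one, so "Venture ""Extended Edition""" in the file comes out as Venture "Extended Edition".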
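
To make the quoting rule concrete, here is a small standalone sketch that applies it to the raw text of a single field, outer quotes included. The helper name unquote_csv_field is hypothetical and not part of the parser below, which applies the same rule character by character while it streams the input. The sketch assumes the field's outer quotes are present and balanced.

```cpp
#include <cassert>  // assert(), for quick self-checks
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the original parser): turns the raw text of
// one quoted CSV field, outer quotes included, into its value.
// Assumes the outer quotes are present and balanced.
std::string unquote_csv_field(const std::string &raw)
{
    if (raw.size() < 2 || raw.front() != '"' || raw.back() != '"')
        return raw; // not a quoted field: return it unchanged

    std::string value;
    // Walk the characters strictly between the outer quotes.
    for (std::size_t i = 1; i + 1 < raw.size(); ++i)
    {
        // A pair of quotes that does not use up the closing quote
        // stands for one literal quote character.
        if (raw[i] == '"' && i + 2 < raw.size() && raw[i + 1] == '"')
        {
            value += '"';
            ++i; // skip the second quote of the pair
        }
        else
        {
            value += raw[i];
        }
    }
    return value;
}
```

Applied to the raw text "Venture ""Extended Edition""" from the sample file, it returns Venture "Extended Edition", and an empty quoted field ("") becomes an empty string.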


It might seem simplest to pass only the line and the delimiter to the function and have it return the vector. The concern under heavy load is the cost of that return. The vector is a local variable of the function (the vector object lives on the stack, its strings' data on the heap), so when the function returns, the vector's copy constructor, and with it the copy constructor of every string, could be used to build the caller's copy. That is why passing in a predefined vector for the function to fill can look more attractive. In practice, compilers elide this copy through return value optimization (RVO), which is what the code below relies on, and since C++11 a returned local vector is moved rather than copied when the copy is not elided. Reusing one vector across many rows can still save a little work, because clear() leaves the vector's capacity unchanged; whether that is measurable depends on the size of your rows and how many of them you parse.
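
For comparison, a minimal sketch of the out-parameter style might look like this. The name csv_populate_row and its argument order are hypothetical, and quote handling is deliberately left out so the example only shows the calling convention: the caller owns one vector, and the function clears and refills it for each line.

```cpp
#include <cassert>  // assert(), for quick self-checks
#include <sstream>
#include <string>
#include <vector>

// Hypothetical out-parameter variant (name and signature are assumptions).
// Fills `record` with the fields of one line. clear() keeps the vector's
// capacity, so a caller that reuses one vector avoids regrowing its buffer.
// Quoted fields are NOT handled here.
void csv_populate_row(std::vector<std::string> &record,
                      const std::string &line, char delimiter)
{
    record.clear();
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, delimiter))
        record.push_back(field);
    // getline produces no field after a trailing delimiter,
    // so add that empty last field explicitly ("a,b," has 3 fields).
    if (!line.empty() && line.back() == delimiter)
        record.push_back("");
}
```

A caller would declare the vector once, outside its read loop, and pass the same vector in for every line.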

input.csv
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00

main.cpp
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <sstream>
#include <istream>
 
using std::cout;
using std::endl;
 
std::vector<std::string> csv_read_row(std::istream &in, char delimiter);
std::vector<std::string> csv_read_row(std::string &in, char delimiter);
 
int main(int argc, char *argv[])
{
    std::ifstream in("input.csv");
    if (in.fail())
    {
        std::cerr << "File not found" << endl;
        return 1; // report failure to the caller
    }
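    // First pass: read records straight from the stream, so a quoted field
    // that spans several lines (the Jeep record) stays in one row.
    while(in.good())
    {
        std::vector<std::string> row = csv_read_row(in, ',');
        if (row.empty()) break; // nothing left to read
        for(const std::string &field : row)
            cout << "[" << field << "]" << "\t";
        cout << endl;
    }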
    in.close();
 
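    // Second pass: read line by line and parse each line on its own. This is
    // simpler, but a quoted field containing a newline gets split across two rows.
    std::string line;
    in.clear(); // reset the eof/fail flags left by the first pass (required before C++11)
    in.open("input.csv");
    while(std::getline(in, line)) // also handles a last line without a trailing newline
    {
        std::vector<std::string> row = csv_read_row(line, ',');
        for(const std::string &field : row)
            cout << "[" << field << "]" << "\t";
        cout << endl;
    }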
    in.close();
 
    return 0;
}
 
std::vector<std::string> csv_read_row(std::string &line, char delimiter)
{
    std::stringstream ss(line);
    return csv_read_row(ss, delimiter);
}
 
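std::vector<std::string> csv_read_row(std::istream &in, char delimiter)
{
    std::stringstream ss;
    bool inquotes = false;
    bool readany = false; // true once this call has consumed at least one character
    std::vector<std::string> row; // returned by value; relies on RVO (or a move in C++11)
    while(true)
    {
        int ch = in.get(); // int, so EOF can be told apart from a real character
        if (ch == std::char_traits<char>::eof()) // end of input
        {
            if (readany) row.push_back( ss.str() ); // flush the last field of a final line with no newline
            return row; // empty only when there was nothing left to read
        }
        readany = true;
        char c = static_cast<char>(ch);
        if (!inquotes && c=='"') //beginquotechar
        {
            inquotes=true;
        }
        else if (inquotes && c=='"') //quotechar
        {
            if ( in.peek() == '"')//2 consecutive quotes resolve to 1
            {
                ss << static_cast<char>(in.get()); // get() returns int; cast so '"' is written, not 34
            }
            else //endquotechar
            {
                inquotes=false;
            }
        }
        else if (!inquotes && c==delimiter) //end of field
        {
            row.push_back( ss.str() );
            ss.str("");
        }
        else if (!inquotes && (c=='\r' || c=='\n') ) //end of record
        {
            if (c=='\r' && in.peek()=='\n') in.get(); // treat "\r\n" as a single line break
            row.push_back( ss.str() );
            return row;
        }
        else
        {
            ss << c;
        }
    }
}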


milkplus on 2009-12-15 00:21:16
Very clean and excellent parser! One issue I saw: leading and trailing whitespace on the row strings is not trimmed off.
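
The trimming milkplus mentions can be added as a post-processing step on each row. Below is a sketch with hypothetical helper names, trim_field and trim_row, that are not part of the original code; it trims spaces and tabs only. Note that RFC 4180 treats spaces as part of a field, and since csv_read_row has already removed the quotes by this point, these helpers would also trim spaces that were deliberately quoted.

```cpp
#include <cassert>  // assert(), for quick self-checks
#include <string>
#include <vector>

// Hypothetical helper: strips leading and trailing spaces and tabs.
std::string trim_field(const std::string &s)
{
    const char *ws = " \t";
    std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return ""; // field is empty or all whitespace
    std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: applies trim_field to every field of a row in place.
void trim_row(std::vector<std::string> &row)
{
    for (std::string &field : row)
        field = trim_field(field);
}
```

Calling trim_row on each vector returned by csv_read_row would give fields without the surrounding blanks.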

Roberto on 2010-03-03 12:01:42
Nice~

Roberto on 2010-07-07 23:50:32
I believe a far more efficient C++ CSV implementation can be found here: http://www.codeproject.com/KB/recipes/Tokenizer.aspx

Lis on 2010-12-18 15:53:00
Well, the above lib makes heavy use of Boost, while this is just a simple 50-line function.

Great work.

Jeff B on 2011-04-19 05:29:35
Great little function. So simple.

Bernard on 2011-05-30 14:40:16
Nice. Thanks.

Runia on 2011-06-05 17:33:23
Thank you for sharing the csv parser. It works perfectly.
Maybe http://www.codeproject.com/KB/recipes/Tokenizer.aspx is more efficient and the ideas behind it are nice, but it is not as easy to apply, nor does it yield immediate results like your code. Thank you.

Rurutia on 2014-03-24 23:22:07
Nice, thank you!!