C++ explode function


JavaScript provides a nice string-splitting function, split(), and php does the same with its explode() function. You pass in a string and a delimiter, and it breaks the string up into an array of the substrings that were separated by the delimiter. For example, if you pass in "one,two,three" as your string and a comma as your delimiter, it returns an array of the substrings {"one", "two", "three"}. To make a C++ version intuitive for php programmers, it should take the same parameters and have the same kind of return type as the php function.

The C++ Standard Template Library provides a type of dynamically allocated array, a vector. We should be able to build a function that returns a vector of strings.
#include <iostream>
#include <vector>
#include <string>
 
using namespace std;
 
vector<string> explode( const string &delimiter, const string &str);
 
int main(int argc, char *argv[])
{
    string str = "I have a lovely bunch of cocoa nuts";
    cout<<str<<endl;
    vector<string> v = explode(" ", str);
    for(size_t i=0; i<v.size(); i++)
        cout <<i << " ["<< v[i] <<"] " <<endl;
}
 
vector<string> explode( const string &delimiter, const string &str)
{
    vector<string> arr;
 
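    size_t strleng = str.length();
    size_t delleng = delimiter.length();
    if (delleng==0)
        return arr;//empty delimiter: return an empty vector
 
    size_t i=0;//current scan position
    size_t k=0;//start of the current substring
    while( i<strleng )
    {
        size_t j=0;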
        while (i+j<strleng && j<delleng && str[i+j]==delimiter[j])
            j++;
        if (j==delleng)//found delimiter
        {
            arr.push_back(  str.substr(k, i-k) );
            i+=delleng;
            k=i;
        }
        else
        {
            i++;
        }
    }
    arr.push_back(  str.substr(k, i-k) );
    return arr;
}
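
The hand-written inner loop above is essentially a substring search, so the same splitting can also be sketched with std::string::find. This is only an illustrative variant: the name explode_find is made up here, and it keeps the choice of returning an empty vector for an empty delimiter.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of explode() built on std::string::find.
// The name explode_find is an assumption made for this sketch.
std::vector<std::string> explode_find(const std::string &delimiter,
                                      const std::string &str)
{
    std::vector<std::string> parts;
    if (delimiter.empty())
        return parts; // empty delimiter: return an empty vector

    std::string::size_type start = 0; // start of the current substring
    std::string::size_type pos;       // position of the next delimiter
    while ((pos = str.find(delimiter, start)) != std::string::npos)
    {
        parts.push_back(str.substr(start, pos - start));
        start = pos + delimiter.length(); // skip past the delimiter
    }
    parts.push_back(str.substr(start)); // last piece, possibly empty
    return parts;
}
```

Like php's explode(), this keeps empty substrings, so splitting "a,,b" on a comma gives three pieces with an empty string in the middle.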

Note: because we wanted to mirror the parameters that php uses, we return the vector by value instead of filling a reference to an existing vector. On paper this is not very efficient. The vector is a local variable of the explode function and is destroyed when the function returns, so returning it and assigning it to v formally calls the copy constructor of the vector, which in turn calls the copy constructor of every string in the vector. In practice most compilers elide this copy (the named return value optimization), and from C++11 onward the vector is moved rather than copied when elision does not apply, so the cost is usually much smaller than it looks.

How is it avoidable, or how can we make it more efficient? By passing in a reference to an existing vector of strings for the function to fill. We did not do that in this example because it would violate the requirement that the function mirror the php one. Another way around it is to use the new keyword to allocate the vector on the heap and return a pointer to it. The problem with this is that the caller has to explicitly delete what is returned in order to avoid a memory leak. It also goes against the recommended practice of keeping 'new' and its corresponding 'delete' together in the same function.
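
As a rough sketch of the pass-by-reference alternative, the function below fills a vector supplied by the caller instead of returning one. The name explode_into and the out parameter are assumptions made for illustration, and clearing the vector first is a design choice of this sketch rather than anything php prescribes.

```cpp
#include <string>
#include <vector>

// Hypothetical out-parameter variant of explode(); explode_into and out
// are names invented for this sketch.
void explode_into(const std::string &delimiter, const std::string &str,
                  std::vector<std::string> &out)
{
    out.clear(); // reuse the caller's vector, dropping any old contents
    const std::string::size_type strleng = str.length();
    const std::string::size_type delleng = delimiter.length();
    if (delleng == 0)
        return; // empty delimiter: leave out empty

    std::string::size_type i = 0; // current scan position
    std::string::size_type k = 0; // start of the current substring
    while (i < strleng)
    {
        if (str.compare(i, delleng, delimiter) == 0) // delimiter found at i
        {
            out.push_back(str.substr(k, i - k));
            i += delleng;
            k = i;
        }
        else
        {
            ++i;
        }
    }
    out.push_back(str.substr(k)); // piece after the last delimiter
}
```

A caller would declare a vector<string> once and pass it in on each call, which lets the vector's storage be reused across calls at the cost of no longer matching the php signature.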

will on 2009-12-22 05:19:07
this works great! very similar to php's explode. I was having issues with boost's char_separator trying to read a unique set of chars to explode on

Aidas on 2013-12-08 08:23:13
Thanks, this works great :)