C++ urlencode function


A url like http://hostname.com:80/folder/file.php?arg=value&b=c#anchor has several components:
scheme: http
host: hostname.com
port: 80
path: folder/file.php
query parameters: arg=value&b=c
fragment: anchor
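
To make the breakdown above concrete, here is a rough sketch of splitting such a url into those components. The struct UrlParts and the function split_url are hypothetical names made up for this illustration, and the splitter is deliberately naive: it assumes the form scheme://host[:port][/path][?query][#fragment] and does not handle user info, IPv6 hosts or other edge cases.

```cpp
#include <string>

// Hypothetical container for the components listed above.
struct UrlParts {
    std::string scheme, host, port, path, query, fragment;
};

// Naive splitter (illustration only). Assumes the url looks like
// scheme://host[:port][/path][?query][#fragment].
UrlParts split_url(const std::string &url)
{
    UrlParts p;
    std::string rest = url;

    // The fragment is everything after the first '#'.
    std::string::size_type pos = rest.find('#');
    if (pos != std::string::npos) {
        p.fragment = rest.substr(pos + 1);
        rest.erase(pos);
    }
    // The query is everything after the first '?'.
    pos = rest.find('?');
    if (pos != std::string::npos) {
        p.query = rest.substr(pos + 1);
        rest.erase(pos);
    }
    // The scheme ends at "://".
    pos = rest.find("://");
    if (pos != std::string::npos) {
        p.scheme = rest.substr(0, pos);
        rest.erase(0, pos + 3);
    }
    // The path starts at the first '/' after the host and port.
    pos = rest.find('/');
    if (pos != std::string::npos) {
        p.path = rest.substr(pos + 1);
        rest.erase(pos);
    }
    // What remains is host[:port].
    pos = rest.find(':');
    if (pos != std::string::npos) {
        p.port = rest.substr(pos + 1);
        rest.erase(pos);
    }
    p.host = rest;
    return p;
}
```

Calling split_url on the example url should give back the six values from the list. Note that the fragment and query are cut off first, which is exactly why a stray # or & inside a value causes trouble.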

The query parameters are a set of keys and values. The problem is that query parameter values often conflict with other parts of the url. For instance, if you were doing an address lookup and the url is lookup.php?address=1500 #200 M&M Street, the # in the address conflicts with the start of the fragment, and the & conflicts with the delimiter between query parameters. Most languages have a built-in function to encode (escape) the data so it doesn't conflict. Each conflicting character is replaced by % followed by its two-digit hex value from the ASCII table: & is encoded as %26 and # as %23. A space can be encoded as %20 or, in form-style query strings, as +. It would be excellent to have a C++ function to automatically encode a url query parameter's value, that is, a URI component.
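
To see the conflict in action, here is a small sketch of how a simple parser would read the address value. The function naive_query_value is a hypothetical helper written only for this illustration (it does a plain substring search for the key, so it is not a real query string parser): it reads the value up to the next & or #, which is where a parser would stop.

```cpp
#include <string>

// Hypothetical helper: returns the value of `key` as a naive parser would see
// it, stopping at the next '&' (parameter delimiter) or '#' (fragment start).
std::string naive_query_value(const std::string &url, const std::string &key)
{
    // Only look at what follows the '?'.
    std::string::size_type q = url.find('?');
    if (q == std::string::npos) return "";

    // Find "key=" inside the query string (plain substring search).
    std::string::size_type start = url.find(key + "=", q + 1);
    if (start == std::string::npos) return "";
    start += key.length() + 1;

    // The value ends at the first delimiter, or at the end of the url.
    std::string::size_type end = url.find_first_of("&#", start);
    if (end == std::string::npos) end = url.length();
    return url.substr(start, end - start);
}
```

With the raw url lookup.php?address=1500 #200 M&M Street this returns only "1500 ", because the # ends the value. With the encoded url lookup.php?address=1500%20%23200%20M%26M%20Street the whole value survives and can be decoded on the other side.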

PHP provides us with a urlencode function; JavaScript provides us with three: escape(), encodeURI() and encodeURIComponent(). So here is a urlencode() function for C++. It was modeled after JavaScript's encodeURIComponent(), and uses the PHP function's name, parameter type and return type.
#include <iostream>
#include <sstream>
#include <string>
 
using std::cout;
using std::endl;
 
std::string urlencode(const std::string &s);
 
int main(int argc, char *argv[])
{
    std::string address = "123 #5 M&M Street";
    cout << "address=" << address << endl;
    cout << "address=" << urlencode(address) <<endl;
    //the second line prints address=123%20%235%20M%26M%20Street
}
 
std::string urlencode(const std::string &s)
{
    static const char lookup[]= "0123456789abcdef";
    std::stringstream e;
    for(int i=0, ix=s.length(); i<ix; i++)
    {
        const char& c = s[i];
        if ( (48 <= c && c <= 57) ||//0-9
             (65 <= c && c <= 90) ||//ABC...XYZ
             (97 <= c && c <= 122) || //abc...xyz
             (c=='-' || c=='_' || c=='.' || c=='~') 
        )
        {
            e << c;
        }
        else
        {
            e << '%';
            e << lookup[ (c&0xF0)>>4 ];
            e << lookup[ (c&0x0F) ];
        }
    }
    return e.str();
}

Note: It is possible to use sprintf(buffer, "%x", s[i]) to convert a character to hex, but I wanted to avoid using sprintf in this case.
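
For comparison, the sprintf route could look roughly like the sketch below. The function name percent_encode_byte is a hypothetical one chosen for this illustration, and it uses std::snprintf (C++11) rather than sprintf so the buffer size is checked. The cast to unsigned char matters: on platforms where char is signed, a byte above 127 would otherwise be sign-extended before formatting.

```cpp
#include <cstdio>
#include <string>

// Hypothetical alternative to the lookup table: format one byte as "%xx".
std::string percent_encode_byte(char c)
{
    char buf[4]; // '%', two hex digits, terminating '\0'

    // Convert to unsigned char first so bytes above 127 are not sign-extended,
    // then widen to unsigned int, which is what the %x conversion expects.
    unsigned int value = static_cast<unsigned char>(c);

    // "%%" prints a literal '%', "%02x" prints two lowercase hex digits.
    std::snprintf(buf, sizeof buf, "%%%02x", value);
    return buf;
}
```

In urlencode() this would replace the three e << lines in the else branch with e << percent_encode_byte(c). It produces the same lowercase digits as the lookup table.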

Brett on 2011-04-20 22:15:39
According to RFC 3986 section 2.3 Unreserved Characters (January 2005), the characters you are escaping are wrong.

Here is a corrected version:

http://codepad.org/lCypTglt

Hyo min on 2015-05-07 02:09:31
@Brett Oh, Thank you!

Hyo min on 2015-05-07 02:48:04
@Brett The code should be as below.
At line 17:
sprintf(buf, "%.2X", (unsigned char) s[i]); // for Multibyte string.