Tuesday, March 21, 2017

Path with Backslash in C++11 Regex

Strings containing backslashes are sometimes problematic in C/C++ because the backslash is the escape character in C/C++ string literals. It gets more complicated when the string is used as a regular expression (regex), because the backslash is also an escape character in regex. The number of backslashes therefore doubles with each layer of escaping: to feed one literal backslash character into the regex engine, you have to write four of them in the source code. Let's see a working sample code.
#include <iostream>
#include <regex>
#include <string>

void regex_test()
{
    std::cout << "Executing " << __func__ << std::endl;

    std::string s ("This machine has c:\\ ,D:\\, E:\\ and z:\\ drives");
    std::smatch m;

    /**
     * We have to write \\\\ in the source so that the regex engine gets \\,
     * which it reads as an escaped backslash.
     *
     * There are two layers of escaping. The compiler turns the string literal
     * "\\\\" into the two characters \\ and that is what gets sent to the
     * regex parser. The parser interprets \\ as a valid escaped backslash,
     * which matches a single literal backslash.
     */
    std::regex e ("[a-zA-Z]:\\\\");   // matches drive path

    std::cout << "Target sequence: " << s << std::endl;
    std::cout << "Regular expression: /[a-zA-Z]:\\\\\\\\/" << std::endl;
    std::cout << "The following matches and submatches were found:" << std::endl;

    while (std::regex_search (s,m,e)) {
        for (auto x:m) std::cout << x << " ";
        std::cout << std::endl;
        s = m.suffix().str();
    }
}

The code above shows that you need four backslashes in the source code to feed one literal backslash into the regex engine. Why is that? The C/C++ compiler treats the backslash as an escape character, so it collapses each pair of backslashes in the string literal into one, which leaves two backslashes in the regex string.
The regex engine, which parses that string, then reads those two backslashes as a single escaped backslash, and that matches one literal backslash in the input.
Anyway, to give you an idea, this is the output of the function above on Linux:
Executing regex_test
Target sequence: This machine has c:\ ,D:\, E:\ and z:\ drives
Regular expression: /[a-zA-Z]:\\\\/
The following matches and submatches were found:
c:\ 
D:\ 
E:\ 
z:\ 
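The two escaping layers can also be checked directly. The following is only a minimal sketch and not part of the function above: the helper names count_backslashes and contains_drive are made up for illustration, and the raw string literal (R"(...)", available since C++11) is shown as an alternative way to write the same pattern with only the regex layer of escaping.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: counts the backslash characters actually stored in a string.
std::size_t count_backslashes(const std::string& text)
{
    std::size_t count = 0;
    for (char c : text) {
        if (c == '\\') {
            ++count;
        }
    }
    return count;
}

// The pattern from the post as an ordinary string literal. The compiler
// collapses each pair of backslashes, so only two of them are stored.
const std::string escaped_pattern = "[a-zA-Z]:\\\\";

// The same pattern as a raw string literal. The compiler does no escape
// processing here, so the two backslashes typed are the two stored.
const std::string raw_pattern = R"([a-zA-Z]:\\)";

// Hypothetical helper: true if the pattern matches anywhere in the text.
bool contains_drive(const std::string& pattern, const std::string& text)
{
    return std::regex_search(text, std::regex(pattern));
}
```

With these definitions, escaped_pattern and raw_pattern should compare equal, and count_backslashes should report two backslashes for either one, which is what the regex engine turns into one literal backslash.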
I hope this helps poor souls out there working with regex in C/C++.