Monday, October 2, 2017

Subclassing HTMLParser Class in Python 2

Using the HTMLParser class (https://docs.python.org/2/library/htmlparser.html) in Python 2 is easy if you don't need to pass parameters to your subclass for custom processing of the HTML tags. But what if you do? In Python 3 this is trivial. In Python 2, however, if you follow the "normal" way of invoking the parent HTMLParser constructor, as explained at https://stackoverflow.com/questions/2399307/how-to-invoke-the-super-constructor, you get an error like this: TypeError: super() argument 1 must be type, not classobj.

Now, how do you fix that error? The root cause is explained at https://stackoverflow.com/questions/1713038/super-fails-with-error-typeerror-argument-1-must-be-type-not-classobj#1713052: in Python 2, HTMLParser is an old-style class (it doesn't inherit from object), and super() only works with new-style classes. However, that answer doesn't give a satisfactory fix, because you would need to mess with the HTMLParser class for it to work, and I prefer not to do that. This is where Python's built-in type() function comes in. The code below shows one way to subclass HTMLParser in Python 2 with a constructor parameter. It's not pretty (it's a quick hack), but it works.
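from HTMLParser import HTMLParser
from htmlentitydefs import name2codepoint

class ImgHtmlParser(HTMLParser):
    def __init__(self, path):
        # super(ImgHtmlParser, self) fails here because HTMLParser is an
        # old-style class. For an old-style instance, type(self) is the generic
        # 'instance' type, so this call ends up in object.__init__(), not in
        # HTMLParser.__init__().
        super(type(self), self).__init__()
        # reset() initializes the parser state; it is all that
        # HTMLParser.__init__() does in Python 2.7.
        self.reset()
        self.fed = []
        # Custom parameter passed in by the caller.
        self.download_path = path
        print "ImgHtmlParser constructor"

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for the tag.
        if tag == 'img':
            print "Start tag:", tag
            for attr in attrs:
                print "     attr:", attr
                if attr[0] == "data-fullres-src":
                    print "image URL: " + attr[1]
                    print "Download Path = " + self.download_path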

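I used the built-in type() in place of the derived class's literal name. Strictly speaking, this only sidesteps the error: because ImgHtmlParser derives from an old-style class, type(self) evaluates to the generic instance type, so the super() call resolves to object.__init__() and HTMLParser.__init__() never runs. The hack still works because HTMLParser.__init__() in Python 2.7 only calls self.reset(), and the constructor above calls self.reset() explicitly. It's also not foolproof if ImgHtmlParser gets a child class, but in this case it doesn't have one, so we're OK.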
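For comparison, the usual way around the classobj error is to skip super() and call the base-class constructor by name, which works for old-style and new-style classes alike. Below is a minimal sketch of that alternative (not the type() hack above), written so it runs under both Python 2 and Python 3. The class name ImgUrlCollector and its image_urls list are hypothetical; the download_path parameter is kept only to show that a custom constructor argument works.

```python
try:
    from HTMLParser import HTMLParser    # Python 2
except ImportError:
    from html.parser import HTMLParser   # Python 3


class ImgUrlCollector(HTMLParser):
    """Hypothetical parser that collects data-fullres-src URLs from <img> tags."""

    def __init__(self, download_path):
        # Explicit call to the base-class constructor instead of super().
        # This works whether HTMLParser is old-style (Python 2) or
        # new-style (Python 3).
        HTMLParser.__init__(self)
        # Custom constructor parameter, stored for later processing.
        self.download_path = download_path
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag.
        if tag == "img":
            for name, value in attrs:
                if name == "data-fullres-src":
                    self.image_urls.append(value)
```

After feed() and close(), image_urls holds the full-resolution URLs, ready for whatever download logic uses download_path.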

Monday, May 15, 2017

Checking whether you have MS17-010 Windows Update Installed (a.k.a Guarding Against WannaCry)

Kaspersky Lab's GReAT team explains how to protect yourself from WannaCrypt/WannaCry infection at https://blog.kaspersky.com/wannacry-ransomware/16518/. The article specifically says:
"Install software updates. This case earnestly calls for installing the system security update MS17-010 for all Windows users, especially when Microsoft even released it for systems that are not officially supported anymore, such as Windows XP or Windows 2003. Seriously, install it right now. Now is exactly the time when it's really important."
The advice above specifically mentions the MS17-010 Windows security update, described at https://support.microsoft.com/en-us/help/4013389/title (the vulnerability itself is explained at https://technet.microsoft.com/library/security/MS17-010). What isn't clear is how to check whether the update is already installed on your Windows machine. The steps are easy for Windows power users but not at all obvious for those unfamiliar with the Windows Update mechanism.

I'll show you how to do this on Windows 10 version 1607. You can carry out similar steps for other Windows versions.

  1. First, locate the KB (Knowledge Base) number of the specific security update. To do that, look up the MS17-010 security update description at https://support.microsoft.com/en-us/help/4013389/title and scroll down to your Windows version. For Windows 10 version 1607, the update file name is Windows10.0-KB4013429-x64.msu, which tells us the KB number: KB4013429.
  2. Search for the Windows KB "support article". In this case, just search for "KB4013429" (without the quotes) in a search engine. We found it at https://support.microsoft.com/en-us/help/4013429/windows-10-update-kb4013429. What matters here are the hotfix numbers of the later updates that supersede our target update. Windows 10 updates are cumulative, so if any of the superseding hotfixes is installed, we're good, i.e. we have the MS17-010 fixes installed. Like so: 
    Superseding Windows Hotfix numbers (circled in RED)
  3. Now we know which Windows update hotfixes to check for. For Windows 10 version 1607, the superseding hotfixes are: KB4019472 (OS Build 14393.1198), KB4015217 (OS Builds 14393.1066 and 14393.1083), KB4016635 (OS Build 14393.970), and KB4015438 (OS Build 14393.969). Therefore, if any one of them (or KB4013429 itself) is installed, we're good.
  4. Check which updates are installed on the Windows machine. You can use the systeminfo command-line utility for that. For example: 
    C:\> systeminfo > c:\Users\blah\Desktop\updates.txt
    
    Open updates.txt and look at the Hotfix(s) section to see the installed hotfixes. Here is sample output from updates.txt:
    Installed hotfixes in Windows 10 Machine
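If you'd rather not eyeball updates.txt, the checks in steps 3 and 4 can be scripted. The following is a minimal Python sketch, assuming updates.txt holds systeminfo output produced by the cmd redirection above, in which each installed hotfix appears as a token like KB4015217. The function names installed_kbs(), has_ms17_010() and check_file() are hypothetical, and the KB set is the Windows 10 version 1607 list from step 3 plus KB4013429 itself, so adjust it for other Windows versions.

```python
import io
import re

# KB numbers for Windows 10 version 1607 (steps 1-3): the MS17-010 update
# itself plus the later cumulative updates that supersede it.
MS17_010_KBS = {"KB4013429", "KB4019472", "KB4015217", "KB4016635", "KB4015438"}


def installed_kbs(systeminfo_text):
    """Return the set of KB numbers (e.g. 'KB4015217') found in systeminfo output."""
    return set(re.findall(r"\bKB\d+\b", systeminfo_text))


def has_ms17_010(systeminfo_text, wanted=MS17_010_KBS):
    """True if any KB number in `wanted` appears in the systeminfo output."""
    return bool(installed_kbs(systeminfo_text) & wanted)


def check_file(path="updates.txt"):
    """Read a saved systeminfo report and apply has_ms17_010() to it."""
    # KB numbers are plain ASCII, so decoding as ASCII and replacing any other
    # bytes is enough to find them. Assumption: the file is not UTF-16
    # (PowerShell's > operator writes UTF-16, which this sketch does not handle).
    with io.open(path, encoding="ascii", errors="replace") as f:
        return has_ms17_010(f.read())
```

For example, check_file(r"c:\Users\blah\Desktop\updates.txt") returns True when at least one of the listed hotfixes is installed.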

At this point, you know for sure whether MS17-010, or an update that supersedes it, is installed. Hopefully, this helps those wanting to check whether their system has the MS17-010 update installed.

Cheers.

Tuesday, May 2, 2017

"Signal" Handling in Windows Console Application

Signal handling in a Windows console application is quite different from what POSIX defines. You could do it the POSIX way if you're using Visual Studio (see: signal), but the behavior is not quite like POSIX in all circumstances. The native Windows "signal" handling is the way to go if you're using a third-party compiler suite or cross-compiling via MinGW-W64. This native mechanism is known as Windows Console Control Handlers, and it is available through the native Windows API.

There is one peculiarity in Windows Console Control Handlers compared to the way POSIX handles the CTRL+C (SIGINT) signal: Windows creates a new thread that invokes the registered control handler to process the signal (see: CTRL+C and CTRL+BREAK Signals). In contrast, POSIX doesn't run the signal handler in a new thread. Because the Windows handler runs concurrently with the rest of the program, any state it shares with the main thread must be accessed in a thread-safe way.

Now, let's look at how you would implement a native Windows signal handler for a console application. The Windows API function you need is SetConsoleCtrlHandler(). As for how to use it, MSDN has it covered: Registering a Control Handler Function. FYI, I tested part of that routine with the Mingw-w64 cross-compiler suite and ran the executable on Windows 10. I can confirm that it works as "advertised".

Tuesday, March 21, 2017

Path with Backslash in C++11 Regex

Strings containing backslashes are sometimes problematic in C/C++ because the backslash is the escape character in C/C++ string literals. It gets even more complicated when the string is used as a regular expression (regex), because the backslash is also the escape character in regex syntax. As a result, the number of backslashes you need doubles with each layer of escaping: to match one literal backslash, the regex needs two, and each of those must be written as two in the C++ source. Let's look at a working sample.
#include <iostream>
#include <regex>
#include <string>

void regex_test()
{
    std::cout << "Executing " << __func__ << std::endl;

    std::string s ("This machine has c:\\ ,D:\\, E:\\ and z:\\ drives");
    std::smatch m;

    /**
     * We have to use \\\\ so that we get \\ which means an escaped backslash.
     *
     * It's because there are two representations. In the string representation
     * of the regex, we have "\\\\", Which is what gets sent to the parser.
     * The parser will see \\ which it interprets as a valid escaped-backslash
     * (which matches a single backslash).
     */
    std::regex e ("[a-zA-Z]:\\\\");   // matches a drive root: letter, colon, backslash

    std::cout << "Target sequence: " << s << std::endl;
    std::cout << "Regular expression: /[a-zA-Z]:\\\\\\\\/" << std::endl;
    std::cout << "The following matches and submatches were found:" << std::endl;

    // Search repeatedly, continuing with the text after each match.
    while (std::regex_search (s,m,e)) {
        for (auto x:m) std::cout << x << " ";
        std::cout << std::endl;
        s = m.suffix().str();
    }
}

int main()
{
    regex_test();
    return 0;
}

The code above shows that you need four backslashes in the C++ source to match one literal backslash. Why? The compiler processes the string literal first: each \\ pair is an escape sequence for a single backslash, so "\\\\" becomes the two characters \\ in the std::string.
The regex engine then parses those two characters as one escaped backslash, which matches a single literal backslash in the input string.
To give you an idea, this is the output of the function above on Linux:
Executing regex_test
Target sequence: This machine has c:\ ,D:\, E:\ and z:\ drives
Regular expression: /[a-zA-Z]:\\\\/
The following matches and submatches were found:
c:\ 
D:\ 
E:\ 
z:\ 
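Python's re module has exactly the same two layers of escaping, so it is a quick way to double-check the backslash arithmetic. In the sketch below, the ordinary string literal needs four backslashes, just like the C++ version, while the raw string literal r"..." skips the language layer and needs only two. C++11 offers the same shortcut: with a raw string literal, the regex above could be written as std::regex e (R"([a-zA-Z]:\\)"); and should behave identically.

```python
import re

# Same target text as the C++ example (written with Python's escaping).
s = "This machine has c:\\ ,D:\\, E:\\ and z:\\ drives"

# Ordinary literal: four backslashes in the source become two characters in
# the pattern, which the regex engine reads as one escaped (literal) backslash.
escaped_pattern = "[a-zA-Z]:\\\\"

# Raw literal: the language-level escaping is skipped, so the two backslashes
# written here reach the regex engine unchanged.
raw_pattern = r"[a-zA-Z]:\\"

for pattern in (escaped_pattern, raw_pattern):
    # Both lines print: ['c:\\', 'D:\\', 'E:\\', 'z:\\']
    print(re.findall(pattern, s))
```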
I hope this helps poor souls out there working with regex in C++.