Thursday, September 1, 2016

How to trace python daemon program

Well some of you might think whats there to trace in a python program simply use pdb.
Oh if you dont know pdb debugging you ought to read whole full post, if you know about pdb I will describe other method which worked for demonized + threads code from fourth paragraph.

    So I was helping a friend to debug a code to find data flow. The code was running as demon making it difficult to debug. The module consisted multiple files and main file being around 2000 lines of code. So I first went to my favorite option pdb.

 import pdb
 pdb.set_trace()

Normally if you put these two line in any normal python code then you will get a interactive shell and execution will stop at that line and wait for your commands. Then you can just enter command like 'n' to execute n(ext) line, or s(tep) to step into functions. You can use command  w(here) to print a stack trace to check where is the current execution line and nearby code. If you are interested in details of the pdb basics checkout the article at https://pymotw.com/2/pdb/ or https://docs.python.org/2/library/pdb.html Since code was running as daemon and running threads it was difficult to interact with pdb. The pdb shell was just getting printed somewhere on the command line and was not possible to interact with it. So I went ahead to check other options how trace the function calls of these program.

    I found a other option of  sys.settrace() to see which function of the program are being called in which order. The sys.settrace() needs a function to be written to handle the callback. So here is a function code which I got from internet and updated to my needs:

def trace_calls(frame, event, arg):
    if event != 'call':
        return

    co = frame.f_code
    if co is not None:
        func_name = co.co_name
        if func_name in ['write',' __getattr__']:
            # Ignore write() calls from print statements
            return
        line_no = frame.f_lineno
        filename = co.co_filename
        #only output function names from module dir
        if 'mymodule' not in filename:
            return

        str1 = 'Call to %s on line %s of %s \n Args:%s\n' % (func_name, line_no, filename,str(arg))
        print (str1)

        if func_name in TRACE_INTO:
            # Trace into this function
            return trace_lines
            
        return

I will explain somethings from this code that I changed/added to make it more simple for me. First of all There were multiple calls to multiple imported libraries. So easily the file size went to 28-30 MB. Then I put this if condition:

if 'mymodule' not in filename:


So now only function calls from 'mymodule' will be printed. Now the printed statements became less. The code prints line number from which function name was called and the file path to which the function belongs. The sys.settrace() function needs to be called and this function takes parameter function name (in our case trace_calls) which will be called when a function call happens.

sys.settrace(trace_calls)

The trace_calls function will print the information about function call as describe before.
So then I went through the multiple function calls and got the whole data flow and where to make changes to alter the data.

You might have notice one more function call trace_lines in the trace_calls function.

TRACE_INTO=['dataReceived']
def trace_lines(frame, event, arg):
    if event != 'line':
        return

    co = frame.f_code
    if co is not None:
        func_name = co.co_name
        line_no = frame.f_lineno
        filename = co.co_filename
        str1 = '  %s line %s' % (func_name, line_no)
        print (str1)

This trace line function can be called if you want to trace the execution line by line inside any function. Here a list TRACE_INTO is created and only functions in side this list will be traced as line by line. So if you feel any function is important and need to check execution you can use this functionality.