Debugging

Debugging is one of the most painful and addictive activities in software engineering. One could compare it to running or sudoku: you suffer for hours trying to understand why your code is crashing, and when you find the bug, the prize is a dopamine rush and working code.

In this lecture, we will:

Overview of the debugging process

Debugging is not easy

When code crashes, it usually prints a cryptic error message

Typical error messages

Example:

double x[100];
x[345] = 0; // SIGSEGV

This and other kinds of memory errors are avoided in safe Rust thanks to the thorough checks performed at compile time and, when an index is only known later, at run time.

let mut x = [0f64; 100];
x[345] = 0.0; // fails to compile

let mut x = vec![0f64; 100];
x[345] = 0.0; // panics at run-time

$ checkquota
          Storage/size quota filesystem report for user: rt3504
Filesystem             Mount              Used   Limit  MaxLim Comment
Stellar home           /home             4.8GB    93GB   100GB
Stellar scratch GPFS   /scratch/gpfs    29.1TB  34.2TB    35TB
Tigress GPFS           /tigress          3.8TB   9.8TB    10TB

Fileset/project space            Mount                Used by ALL By rt3504  MaxLim Comment
Projects GPFS fileset TEYSSIER   /projects/TEYSSIER           0KB         0     5TB

          Storage number of files used report for user: rt3504
Filesystem             Mount              Used   Limit  MaxLim Comment
Stellar home           /home             54.8K    952K    1.0M
Stellar scratch GPFS   /scratch/gpfs      2.3M    3.0M      3M
Tigress GPFS           /tigress         364.7K       0       0

Fileset/project space            Mount                Used by ALL By rt3504  MaxLim Comment
Projects GPFS fileset TEYSSIER   /projects/TEYSSIER             2         1    None

For quota increase requests please use this website:

         https://forms.rc.princeton.edu/quota

Take advantage of the compiler options

The -g compiler option

All compilers accept the -g option.

The -g option makes the bug go away!

Examples of useful compiler options

Warning options in gcc

Try different compilers if you can

The code crashes... now what?!

Saving the stack in core files

Examining the call stack

$ apropos debug
__after_morecore_hook (3) - malloc debugging variables
__free_hook (3)      - malloc debugging variables
__malloc_hook (3)    - malloc debugging variables
__malloc_initialize_hook (3) - malloc debugging variables
__memalign_hook (3)  - malloc debugging variables
__realloc_hook (3)   - malloc debugging variables
_nc_tracebits (3x)   - curses debugging routines
_traceattr (3x)      - curses debugging routines
_traceattr2 (3x)     - curses debugging routines
_tracecchar_t (3x)   - curses debugging routines
_tracecchar_t2 (3x)  - curses debugging routines
_tracechar (3x)      - curses debugging routines
_tracechtype (3x)    - curses debugging routines
_tracechtype2 (3x)   - curses debugging routines
_tracedump (3x)      - curses debugging routines
_tracef (3x)         - curses debugging routines
_tracemouse (3x)     - curses debugging routines
backtrace (3)        - support for application self-debugging
backtrace_symbols (3) - support for application self-debugging
backtrace_symbols_fd (3) - support for application self-debugging
BIO_debug_callback (3ssl) - BIO callback functions
CPAN::Debug (3pm)    - internal debugging for CPAN.pm
CRYPTO_mem_debug_pop (3ssl) - Memory allocation functions
CRYPTO_mem_debug_push (3ssl) - Memory allocation functions
CRYPTO_set_mem_debug (3ssl) - Memory allocation functions
CURLOPT_DEBUGDATA (3) - custom pointer for debug callback
CURLOPT_DEBUGFUNCTION (3) - debug callback
curs_trace (3x)      - curses debugging routines
DB (3pm)             - programmatic interface to the Perl debugging API
dbus-monitor (1)     - debug probe to print message bus messages
debugfs (8)          - ext2/ext3/ext4 file system debugger
debuginfod-client-config (7) - debuginfod client environment variables, cache...
debuginfod-find (1)  - request debuginfo-related data
dftest (1)           - Shows display filter byte-code, for debugging dfilter ...
dnf-debug (8)        - DNF debug Plugin
dnf-debuginfo-install (8) - DNF debuginfo-install Plugin
error::dwarf (7stap) - dwarf debuginfo quality problems
gnutls-cli-debug (1) - GnuTLS debug client
FcPatternPrint (3)   - Print a pattern for debugging
gdb (1)              - The GNU Debugger
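
The apropos listing above includes backtrace(3), described as "support for application self-debugging". As a minimal sketch, assuming a Linux system where glibc provides <execinfo.h>, a program can record its own call stack like this. The file name and the function name capture_stack are hypothetical, chosen only for illustration.

```cpp
// stack_sketch.cpp (hypothetical file name)
#include <execinfo.h>  // backtrace, backtrace_symbols (glibc)
#include <cstdlib>     // std::free
#include <string>
#include <vector>

// Return one human-readable string per active stack frame, innermost first.
std::vector<std::string> capture_stack(int max_frames = 64) {
    if (max_frames <= 0) {
        return {};
    }
    std::vector<void*> addresses(max_frames);
    // backtrace() fills the buffer and returns the number of frames stored.
    const int count = backtrace(addresses.data(), max_frames);
    // backtrace_symbols() returns a malloc'ed array of `count` C strings.
    char** symbols = backtrace_symbols(addresses.data(), count);
    std::vector<std::string> frames;
    if (symbols != nullptr) {
        frames.assign(symbols, symbols + count);  // copy into std::string
        std::free(symbols);  // only the array itself has to be freed
    }
    return frames;
}
```

Calling capture_stack() from an error handler and printing the strings gives a rough picture of where the program was. With the GNU linker, function names only appear in these strings if the program is linked with -rdynamic. Otherwise mostly raw addresses are shown, and gdb remains the more informative tool.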

The gdb debugger

$ gdb executable core.#
(gdb) where (or backtrace, or bt)

Using gdb

Command   Abbrev.   Description
help                List gdb command topics
run       r         Start program execution
break               Suspend execution at specific location (line number, function, instruction address, etc.)
step      s         Step to next line of code. Will step into a function if necessary
next      n         Execute next line of code. Will NOT enter functions
until               Continue processing until it reaches a specified line
list      l         List source code with current position of execution
print     p         Print value stored in a variable

The gdb debugger: a demo

Here is a simple example program in C++:

// example.cpp
#include <iostream>

int main() {

  int arr[10];
  int i = 40;
  int x = 0;

  arr[i] = 2/x;
  std::cout << arr[i] << std::endl;

  return 0;
}

Now let’s compile it with g++ and run it:

$ g++ example.cpp -o example
$ ./example
  Program terminated with signal: SIGFPE

The compiler caught neither the division by zero nor the out-of-bounds array index. We can help the compiler a bit by declaring i and x as const, which tells it that their values will not change during execution.

// example.cpp
#include <iostream>

int main() {

  int arr[10];
  const int i = 40;
  const int x = 0;

  arr[i] = 2/x;
  std::cout << arr[i] << std::endl;

  return 0;
}

Let’s try to compile it with g++ again.

$ g++ example.cpp -o example
  example.cpp: In function 'int main()':
  example.cpp:10:13: warning: division by zero [-Wdiv-by-zero]
     10 |   arr[i] = 2/x;
        |            ~^~

Now we get a warning saying that there was a division by zero. Note that this corresponds to the -Wdiv-by-zero flag, which is enabled by default. However, it still compiled even though it detected an issue. Let’s fix the issue by changing the line to const int x = 1;.

$ g++ example.cpp -o example
$ ./example
  2

Everything seems like it’s working correctly. However, let’s now try compiling it with clang++.

$ clang++ example.cpp -o example
  example.cpp:10:3: warning: array index 40 is past the end of the array (which contains 10 elements) [-Warray-bounds]
    arr[i] = 2/x;
    ^   ~
  example.cpp:6:3: note: array 'arr' declared here
    int arr[10];
    ^
  example.cpp:11:16: warning: array index 40 is past the end of the array (which contains 10 elements) [-Warray-bounds]
    std::cout << arr[i] << std::endl;
                 ^   ~
  example.cpp:6:3: note: array 'arr' declared here
    int arr[10];
    ^
  2 warnings generated.

This compiler was able to tell that we were accessing data past the end of the array. In this particular case, the memory we accessed was still within the region assigned to the program, so the program did not crash. It was nevertheless corrupting memory, which opens the door to numerous issues and vulnerabilities. Try using something like const int i = 4000; to see that it typically causes a segmentation fault. Let’s fix this bug by changing the line to const int i = 4;.
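
C++ can also catch an out-of-bounds index at run time, but only if you ask for it. The sketch below assumes the raw array is replaced by std::array, whose at() member checks the index and throws std::out_of_range, whereas operator[] performs no check. The file name and the helper name try_store are hypothetical.

```cpp
// checked_access.cpp (hypothetical file name)
#include <array>
#include <cstddef>
#include <stdexcept>

// Store `value` at `index`. Returns false, leaving the array untouched,
// when the index is out of range instead of silently corrupting memory.
bool try_store(std::array<int, 10>& arr, std::size_t index, int value) {
    try {
        arr.at(index) = value;  // at() validates the index before access
        return true;
    } catch (const std::out_of_range&) {
        return false;           // arr[index] would have been undefined behavior
    }
}
```

With this helper, try_store(arr, 4, 2) succeeds, while try_store(arr, 40, 2) reports the problem instead of writing past the end of the array. The check costs a comparison on every access, which is why it is opt-in in C++.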

The story in Rust is quite different. Let’s use the same starting point.

// example.rs
fn main() {

  let mut arr = [0; 10];
  let i = 40;
  let x = 0;

  arr[i] = 2/x;
  println!("{}", arr[i]);
}

Let’s try to compile it.

$ rustc example.rs
  error: this operation will panic at runtime
  --> example.rs:8:12
    |
  8 |   arr[i] = 2/x;
    |            ^^^ attempt to divide `2_i32` by zero
    |
    = note: `#[deny(unconditional_panic)]` on by default

  error: this operation will panic at runtime
  --> example.rs:8:3
    |
  8 |   arr[i] = 2/x;
    |   ^^^^^^ index out of bounds: the length is 10 but the index is 40

  error: this operation will panic at runtime
  --> example.rs:9:18
    |
  9 |   println!("{}", arr[i]);
    |                  ^^^^^^ index out of bounds: the length is 10 but the index is 40

  error: aborting due to 3 previous errors

In this case, the compiler not only detected the issues but refused to compile the program at all.
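
C++ can be pushed towards the same behavior when the values are known at compile time. The following is a minimal sketch, assuming C++14 or later: constexpr forces the division to be evaluated by the compiler, and std::get takes the index as a template argument. The file name and the names divide, quotient and element are hypothetical.

```cpp
// compile_time_checks.cpp (hypothetical file name)
#include <array>

// A constexpr function can be evaluated by the compiler itself.
constexpr int divide(int numerator, int denominator) {
    return numerator / denominator;
}

constexpr int x = 1;
// The initializer must be a constant expression: with x = 0 the division
// is not one, and compilation stops with an error instead of a warning.
constexpr int quotient = divide(2, x);

// Only the first five elements are listed; the rest are zero-initialized.
constexpr std::array<int, 10> arr = {0, 0, 0, 0, quotient};
// The index is a template argument, so std::get<40>(arr) would not compile.
constexpr int element = std::get<4>(arr);

static_assert(element == 2, "computed entirely at compile time");
```

Unlike in Rust, none of this is the default: the checks only apply to expressions explicitly marked constexpr and to indices passed through std::get.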

Let’s now use gdb with our fixed C++ code to look at what happens step by step:

$ g++ -g example.cpp -o example
$ gdb ./example
  Reading symbols from ./example...done.
(gdb) b 6
  Breakpoint 1 at 0x40117e: file example.cpp, line 7.
(gdb) r
  Starting program: ./example...

  Breakpoint 1, main () at example.cpp:7
  7	  const int i = 4;
(gdb) p i
  $1 = 0
(gdb) s
  8	  const int x = 1;
(gdb) p i
  $2 = 4
(gdb) p x
  $3 = 0
(gdb) s
  10	  arr[i] = 2/x;
(gdb) p x
  $4 = 1
(gdb) s
  11	  std::cout << arr[i] << std::endl;
(gdb) p arr
  $5 = {-138376496, 32767, 0, 0, 2, 0, 4198544, 0, -44160, 32767}
(gdb) s
  2
  13	  return 0;

Using gdb with Rust is very similar:

$ rustc -g example.rs
$ rust-gdb ./example
  Reading symbols from ./example...
(gdb) b 7
  Breakpoint 1 at 0xc072: file example.rs, line 8.
(gdb) r
  Starting program: ./example

  Breakpoint 1, example::main () at example.rs:8
  8	  arr[i] = 2/x;
(gdb) p i
  $1 = 4
(gdb) s
  9	  println!("{}", arr[i]);
(gdb) p arr
  $2 = [0, 0, 0, 0, 2, 0, 0, 0, 0, 0]

I know where the code crashed... what’s next?

Python debugger

import pdb

pdb.set_trace()
(base) ➜  ~ ./map2deb.py Work/tom/velx_00001.map
Reading Work/tom/velx_00001.map
> /Users/rt3504/map2deb.py(21)<module>()
-> with FortranFile(path_to_output, 'r') as f:
(Pdb) help

Documented commands (type help <topic>):
========================================
EOF    c          d        h         list      q        rv       undisplay
a      cl         debug    help      ll        quit     s        unt
alias  clear      disable  ignore    longlist  r        source   until
args   commands   display  interact  n         restart  step     up
b      condition  down     j         next      return   tbreak   w
break  cont       enable   jump      p         retval   u        whatis
bt     continue   exit     l         pp        run      unalias  where

Miscellaneous help topics:
==========================
exec  pdb

Let’s see another example

(base) ➜  ~ ./map2deb.py Work/tom/velx_00001.map
Reading Work/tom/velx_00001.map
> /Users/rt3504/map2deb.py(21)<module>()
-> with FortranFile(path_to_output, 'r') as f:
(Pdb) help p
p expression
        Print the value of the expression.
(Pdb) p path_to_output
'Work/tom/velx_00001.map'
(Pdb)

Please use checkpoint-restart!

Using print for monitoring and debugging
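
As one possible illustration of print-based debugging in C++, the sketch below tags every message with the source file, the line number and the variable name. The helper format_debug and the macro DEBUG_VAR are hypothetical names, not part of any library.

```cpp
// debug_print.cpp (hypothetical file name)
#include <iostream>
#include <sstream>
#include <string>

// Build a message of the form "file:line: name = value".
template <typename T>
std::string format_debug(const char* file, int line,
                         const char* name, const T& value) {
    std::ostringstream message;
    message << file << ':' << line << ": " << name << " = " << value;
    return message.str();
}

// Convenience macro: fills in the location and the variable name itself.
// std::cerr is flushed after each write, so the message is not lost if
// the program crashes right afterwards.
#define DEBUG_VAR(variable) \
    (std::cerr << format_debug(__FILE__, __LINE__, #variable, (variable)) \
               << std::endl)
```

Writing DEBUG_VAR(step); inside a loop then prints the current value of step together with the place it was printed from. That location is usually the first thing you need when many print statements are scattered through a code.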

Debugging memory leaks

A particularly difficult class of bugs is related to memory management, in particular so-called memory leaks. Usually, the compiler makes sure that the temporary memory for arrays in subroutines and functions is properly deallocated on exit. With manually managed (heap) memory this is not always possible, and mistakes are made that slowly and systematically drain all the available memory. Ultimately, the code crashes because it runs out of memory.

The program below is an example of such a memory leak. In the function computeNothing, two arrays are allocated with new. Before returning, the pointer p is reassigned to point to the first array, so the array that p originally pointed to can no longer be reached, and only the first array is deallocated. Each call therefore leaks 400 MB (100,000,000 ints of 4 bytes each).

// ml.cpp
#include <iostream>

void computeNothing() {
    int *array = new int[100000000];
    int *p = new int[100000000];
    for (int i = 0; i < 100000000; ++i) {
        array[i] = i * 2;
    }
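    p = array;  // the block p used to point to is now unreachable: leaked
    delete p;   // frees the first array only (new[] should be paired with delete[])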
}

int main() {
    for (int i = 1; i <= 200; ++i) {
        if (i % 10 == 0)
            std::cout << i << std::endl;
        computeNothing();
    }
    return 0;
}

We can compile this code with the -g option, but nothing is detected, either at compile time or at run time.

$ g++ -g ml.cpp -o ml
$ ./ml

Running the top command several times, one can see the virtual memory (VIRT below) steadily increasing from slightly less than 0.5 GB to more than 14 GB and counting. With more than 200 loops, the code would have crashed.

$ top -n 1 | grep -B1 ml
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3092916 rt3504    20   0 4316088  81436   2232 R  94.4   0.0   0:02.29 ml
$ top -n 1 | grep -B1 ml
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3092916 rt3504    20   0 6269228 314928   2232 R 100.0   0.1   0:03.69 ml
$ top -n 1 | grep -B1 ml
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3092916 rt3504    20   0 9937.0m  87640   2232 R 100.0   0.0   0:06.04 ml
$ top -n 1 | grep -B1 ml
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3092916 rt3504    20   0   11.9g 343664   2232 R 100.0   0.1   0:07.70 ml
$ top -n 1 | grep -B1 ml
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3092916 rt3504    20   0   14.5g 298636   2232 R  94.4   0.1   0:09.42 ml

We now use the valgrind utility, which specifically targets memory errors such as leaks. The code is executed under valgrind, which tracks allocations and deallocations and quickly identifies that there is a problem.

$ valgrind --leak-check=full ./ml
  ==29462== Warning: set address range perms: large range [0x1ef18f040, 0x206f07440) (undefined)
  ==29462== Warning: set address range perms: large range [0x1d7416040, 0x1ef18e440) (undefined)
  ==29462== Warning: set address range perms: large range [0x1ef18f028, 0x206f07458) (noaccess)
  ==29462==
  ==29462== HEAP SUMMARY:
  ==29462==     in use at exit: 8,000,000,000 bytes in 20 blocks
  ==29462==   total heap usage: 42 allocs, 22 frees, 16,000,073,728 bytes allocated
  ==29462==
  ==29462== 4,000,000,000 bytes in 10 blocks are possibly lost in loss record 1 of 2
  ==29462==    at 0x403C0F3: operator new[](unsigned long) (in /cvmfs/cms.cern.ch/el8_amd64_gcc11/external/valgrind/3.17.0-7bfcd2b5e4f162fb4b127c18285f46f6/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
  ==29462==    by 0x4011B5: computeNothing() (ml.cpp:6)
  ==29462==    by 0x40126B: main (ml.cpp:18)
  ==29462==
  ==29462== 4,000,000,000 bytes in 10 blocks are definitely lost in loss record 2 of 2
  ==29462==    at 0x403C0F3: operator new[](unsigned long) (in /cvmfs/cms.cern.ch/el8_amd64_gcc11/external/valgrind/3.17.0-7bfcd2b5e4f162fb4b127c18285f46f6/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
  ==29462==    by 0x4011B5: computeNothing() (ml.cpp:6)
  ==29462==    by 0x40126B: main (ml.cpp:18)
  ==29462==
  ==29462== LEAK SUMMARY:
  ==29462==    definitely lost: 4,000,000,000 bytes in 10 blocks
  ==29462==    indirectly lost: 0 bytes in 0 blocks
  ==29462==      possibly lost: 4,000,000,000 bytes in 10 blocks
  ==29462==    still reachable: 0 bytes in 0 blocks
  ==29462==         suppressed: 0 bytes in 0 blocks
  ==29462==
  ==29462== For lists of detected and suppressed errors, rerun with: -s
  ==29462== ERROR SUMMARY: 22 errors from 3 contexts (suppressed: 0 from 0)

In this case, the correct code would be:

// ml.cpp
#include <iostream>

void computeNothing() {
    int *array = new int[100000000];
    int *p = new int[100000000];
    for (int i = 0; i < 100000000; ++i) {
        array[i] = i * 2;
    }
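    delete[] p;  // free the second array before p is reassigned
    p = array;
    delete[] p;  // free the first array; new[] must be paired with delete[]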
}

int main() {
    for (int i = 1; i <= 200; ++i) {
        if (i % 10 == 0)
            std::cout << i << std::endl;
        computeNothing();
    }
    return 0;
}

Again, the story in Rust is very different. In safe Rust, memory is deallocated automatically at the point where its owner goes out of scope, following the Resource Acquisition Is Initialization (RAII) technique, so a leak like the one above cannot happen simply by forgetting a delete. Most of these memory issues can also be prevented by following modern C++ standards, but in older languages like C or Fortran it is a lot easier to introduce memory bugs.
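
To make the remark about modern C++ concrete, here is a minimal sketch of the leaking function rewritten with RAII, assuming C++14 for std::make_unique. Every allocation is owned by an object whose destructor frees it, so there is no delete to forget. The file name, the function name computeNothingSafely and its size parameter n are hypothetical, and the return value is added only so the result can be checked.

```cpp
// ml_raii.cpp (hypothetical file name)
#include <cstddef>
#include <memory>
#include <vector>

// Same work as computeNothing(), but each allocation has an owner that
// releases it automatically when the function returns or throws.
int computeNothingSafely(std::size_t n) {
    std::vector<int> array(n);           // freed by the vector destructor
    auto p = std::make_unique<int[]>(n); // freed with delete[] by unique_ptr
    for (std::size_t i = 0; i < n; ++i) {
        array[i] = static_cast<int>(i) * 2;
    }
    // A unique_ptr cannot be silently overwritten with another raw pointer:
    // reset() frees the block it owns before giving up ownership.
    p.reset();
    return n == 0 ? 0 : array[n - 1];    // last value written, for checking
}
```

Calling computeNothingSafely(100000000) in the loop of main performs the same work as before, but the memory footprint stays flat from one iteration to the next because both blocks are released at the end of every call.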

Using Graphical Debuggers

Nowadays, many solutions are available that combine code development, editing and debugging. These are called Integrated Development Environments (IDEs). A well-known example is PyCharm.

[Figure: screenshot of the PyCharm IDE]

In these IDEs, git version control, debugging, compiling and editing are all integrated behind a powerful user interface. Once you try one, you adopt it. The downside is that it is tricky to work on code on remote computers: you need to be familiar with ssh tunneling, which can be tricky and unstable.

The ddt debugger is a good option for C/C++. See web page here.

Just type:

$ ddt a.out

where the executable a.out should preferably have been compiled with the -g option.