Given an a.out executable that only calls raise (SIGABRT), invoking it...

  • ... against crash-dump-core will...

    • ... not overwrite existing core files.

      Is this reasonable? Linux does overwrite them, for example.

    • ... show large variations in running time:

      $ TIMEFORMAT='real %R user %U system %S'
      $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
      Aborted (core dumped)
      real 1.350 user 0.000 system 0.010
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 21:59 core
      $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
      Aborted (core dumped)
      real 22.771 user 0.000 system 0.010
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 21:59 core
      $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
      Aborted (core dumped)
      real 1.367 user 0.000 system 0.010
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:00 core
      $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
      Aborted (core dumped)
      real 5.789 user 0.000 system 0.010
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:00 core
      $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
      Aborted (core dumped)
      real 22.664 user 0.010 system 0.000
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:01 core
      
    • ... produce a huge core file:

      $ du -hs core 
      17M     core
      

      On Linux, the core file occupies 76 KiB of disk space, which seems much more reasonable. This is possibly related to the default 128 MiB heap preallocation.

    • ... does not always produce a useful backtrace:

      abort();

      $ gdb test core
      warning: core file may not match specified executable file.
      [New Thread 86678]
      warning: Wrong size fpregset in core file.
      ...
      Core was generated by `./test'.
      Program terminated with signal 6, Aborted.
      warning: Wrong size fpregset in core file.
      (gdb) bt
      #0  0x00000000 in ?? ()
      #1  0x011f593f in __msg_sig_post (process=72, signal=6, sigcode=0, refport=1)
          at /build/buildd-eglibc_2.10.2-7-hurd-i386-iGL6op/eglibc-2.10.2/build-tree/hurd-i386-libc/hurd/RPC_msg_sig_post.c:144
      #2  0x0109a433 in kill_port (pid=<value optimized out>)
          at ../sysdeps/mach/hurd/kill.c:68
      #3  kill_pid (pid=<value optimized out>) at ../sysdeps/mach/hurd/kill.c:105
      #4  0x0109a69f in __kill (pid=21142, sig=6) at ../sysdeps/mach/hurd/kill.c:139
      #5  0x01099af6 in raise (sig=6) at ../sysdeps/posix/raise.c:27
      #6  0x0109de59 in abort () at abort.c:88
      #7  0x0804849f in main ()
      

      char *foo = 0; *foo = 1;

      $ gdb test core
      Program terminated with signal 11, Segmentation fault.
      warning: Wrong size fpregset in core file.
      #0  0x00000000 in ?? ()
      (gdb) bt
      #0  0x00000000 in ?? ()
      #1  0x0108565b in __libc_start_main (main=0x8048464 <main>, argc=1, ubp_av=0x1023e64, 
          init=0x8048490 <__libc_csu_init>, fini=0x8048480 <__libc_csu_fini>, rtld_fini=0xea20 <_dl_fini>, 
          stack_end=0x1023e5c) at libc-start.c:251
      #2  0x080483d1 in _start ()
      

      raise (SIGABRT);

      $ gdb a.out core
      warning: core file may not match specified executable file.
      [New Thread 76651]
      
      
      warning: Wrong size fpregset in core file.
      Reading symbols from /lib/libc.so.0.3...[...]
      Core was generated by `./a.out'.
      Program terminated with signal 6, Aborted.
      
      
      warning: Wrong size fpregset in core file.
      #0  0x00000000 in ?? ()
      (gdb) bt
      #0  0x00000000 in ?? ()
      Cannot access memory at address 0x17
      

      Probably GDB does not manage to unwind the stack properly.

  • ... against crash-suspend will...

    • ... not work at all:

      $ CRASHSERVER=/servers/crash-suspend ./a.out
      $ [returns to the shell; the process is not suspended]
      
    • ... show large variations in running time:

      $ TIMEFORMAT='real %R user %U system %S'
      $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
      Aborted (core dumped)
      real 1.381 user 0.000 system 0.010
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:04 core
      $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
      Aborted (core dumped)
      real 1.332 user 0.000 system 0.010
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:04 core
      $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
      Aborted (core dumped)
      real 21.228 user 0.000 system 0.010
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:04 core
      $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
      Aborted (core dumped)
      real 1.323 user 0.000 system 0.010
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:05 core
      $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
      Aborted (core dumped)
      real 22.279 user 0.000 system 0.010
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:05 core
      $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
      Aborted (core dumped)
      real 1.362 user 0.000 system 0.000
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:08 core
      $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
      Aborted (core dumped)
      real 21.110 user 0.000 system 0.000
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:08 core
      $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
      Aborted (core dumped)
      real 1.350 user 0.000 system 0.020
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:08 core
      
    • ... can reliably crash GNU Mach:

      This happens if a core file is already present (and won't get overwritten; see above). I reproduced this three times.

      $ TIMEFORMAT='real %R user %U system %S'
      $ time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
      Aborted
      real 2.856 user 0.000 system 0.010
      -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:08 core
      
      
      panic: zalloc: zone kalloc.8192 exhausted
      Kernel Breakpoint trap, eip 0x20020a77
      Stopped at  0x20020a76: int     $3
      db> trace
      0x20020a76(2006aba8,4d0f7e9c,200209b0,0,0)
      0x20020a4d(2006b094,2006ae40,2000,20016803,4a5f4114)
      0x2002bca5(49a03564,1,0,9,1000)
      0x20022f4c(2000,4a5f45d4,4a84879c,49a46564,4ac43e78)
      0x20021e65(4ac43e78,4a5f45d4,4a5f4114,0,0)
      0x2005309d(2106ba9c,3,38,28,1783)
      Bad frame pointer: 0x2106ba78
      
      
      $ addr2line -i -f -e /boot/gnumach-xen 0x20020a76 0x20020a4d 0x2002bca5 0x20022f4c 0x20021e65 0x2005309d
      Debugger
      /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/debug.c:105
      panic
      /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/debug.c:148
      zalloc
      /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/zalloc.c:470
      kalloc
      /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/kalloc.c:185
      ipc_kobject_server
      /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/ipc_kobject.c:76
      mach_msg_trap
      /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/ipc/mach_msg.c:1367
      

IRC, freenode, #hurd, 2013-09-07

<rekado> I'm trying to investigate a crash in pfinet, so it will actually
  die.  I just want to know why it dies and what the value of a few
  variables has been when it died.
<teythoon> have you tried to make it dump core?
<rekado> oh, good idea.
<rekado> I'll try that.
<teythoon> do you know how?
<rekado> I don't, but I think I can figure it out.
<teythoon> look into /servers
<rekado> do I just have to set CRASHSERVER=/servers/crash-dump-core and run
  pfinet in that environment?
<teythoon> possibly, I've never heard of CRASHSERVER, but it's certainly
  plausible ;)
<teythoon> I just link crash to crash-dump-core, that way it is permanent
  and for all processes
<rekado> found it in the website contents
<rekado> gotta try that.
<rekado> hmm, I can't get pfinet to dump core; linked /servers/crash to
  /servers/crash-dump-core and compiled pfinet to raise(6) at one point.
<rekado> But no core file is created.
<teythoon> :/
<teythoon> rekado: try cd /tmp ; cat & kill -SIGILL %% to see if that dumps
  core
<rekado> yes, this works.
<rekado> I replaced the original pfinet with my crashing version.
<rekado> Should it dump core to /hurd then?
<teythoon> I'm not sure about its wd
<teythoon> hm, ok, I just did settrans -ca foo /hurd/pfinet and then killed
  that pfinet with SIGILL and it dumped core
<teythoon> to the directory I issued the settrans from
<rekado> So I must run it myself.  I can't just replace the original binary
  and have it dump core somewhere.
<teythoon> it seems that you have to use settrans -ca to start an active
  translator
<teythoon> do fsysopts /servers/socket/2 to find out the cmdline of your
  pfinet
<rekado> that's very helpful.
<rekado> thanks
<teythoon> then use this to restart it, e.g.:
<teythoon> settrans -afg /servers/socket/2 $(fsysopts /servers/socket/2)
<teythoon> if it dies it should dump core to your cwd
<rekado> great. Thank you very much.  I had been wondering how to get the
  full cmdline of pfinet.
* rekado makes a note of fsysopts
<rekado> yup, there's the core file. Nice.
<teythoon> cool 8D
<teythoon> btw, in case using gdb doesn't work out for your problem, if you
  start pfinet (or any translator) this way (with -a == active), you can
  write stuff to stderr
<rekado> yeah, I noticed that.  The assert() call wrote to stderr.  Useful.
<braunr> rekado: core dumps are another not-working-well feature of the
  hurd :/
<braunr> i recommend attaching
<tschwinge> rekado: In case that's still helpful:
  <http://www.gnu.org/software/hurd/hurd/debugging/translator.html>.

IRC, freenode, #hurd, 2013-12-14

<gnu_srs> How to get a core dump?
<teythoon> either set CRASHSERVER to /servers/crash-dump-core for the
  process you want the core file of
<teythoon> or make /servers/crash point to crash-dump-core to make this the
  default for all processes
<gnu_srs> does it work now, it did not before?
<teythoon> it does for me, never had issues
<gnu_srs> k!
<teythoon> well, i believe the second option has issues
<teythoon> if two processes crash, both may write/create a file in the same
  location

Anyone working in this area may also want to look at GDB's gcore command and port http://code.google.com/p/google-coredumper/.