SIGILL in forked process

asked 2013-05-30 21:57:19 -0500

updated 2013-05-31 01:22:48 -0500

I am playing around with fork. I have a very simple test case which is basically like this:

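def fork_test():
    import os
    pid = os.fork()
    if pid != 0:
        # Parent: report the child's PID and wait for it to finish.
        print "parent, child: %i" % pid
        os.waitpid(pid, 0)
    else:
        print "child"
        try:
            pass  # some dummy matrix calculation goes here
        finally:
            # Leave the child immediately, without the normal interpreter shutdown.
            os._exit(0)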

(See _fork_test_func() below for some sample matrix calculations.)

And I'm getting:

------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug
in it and is not properly wrapped with sig_on(), sig_off(). You might
want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate.
------------------------------------------------------------------------

With this (incomplete) backtrace:

Crashed Thread:  0  Dispatch queue: com.apple.root.default-priority

Exception Type:  EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000001, 0x0000000000000000

Application Specific Information:
BUG IN LIBDISPATCH: flawed group/semaphore logic

Thread 0 Crashed:: Dispatch queue: com.apple.root.default-priority
0   libsystem_kernel.dylib          0x00007fff8c6d1d46 __kill + 10
1   libcsage.dylib                  0x0000000101717f33 sigdie + 124
2   libcsage.dylib                  0x0000000101717719 sage_signal_handler + 364
3   libsystem_c.dylib               0x00007fff86b1094a _sigtramp + 26
4   libdispatch.dylib               0x00007fff89a66c74 _dispatch_thread_semaphore_signal + 27
5   libdispatch.dylib               0x00007fff89a66f3e _dispatch_apply2 + 143
6   libdispatch.dylib               0x00007fff89a66e30 dispatch_apply_f + 440
7   libBLAS.dylib                   0x00007fff906ca435 APL_dtrsm + 1963
8   libBLAS.dylib                   0x00007fff906702b6 cblas_dtrsm + 882
9   matrix_modn_dense_double.so     0x0000000108612615 void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::delayed<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, unsigned long, unsigned long) + 2853
10  matrix_modn_dense_double.so     0x0000000108611daa void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::delayed<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, unsigned long, unsigned long) + 698
11  matrix_modn_dense_double.so     0x0000000108612ccf void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::operator()<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long) + 831
12  ???                             0x00007f99e481a028 0 + 140298940424232

Thread 1:
0   libsystem_kernel.dylib          0x00007fff8c6d26d6 __workq_kernreturn + 10
1   libsystem_c.dylib               0x00007fff86b24f4c _pthread_workq_return + 25
2   libsystem_c.dylib               0x00007fff86b24d13 _pthread_wqthread + 412
3   libsystem_c.dylib               0x00007fff86b0f1d1 start_wqthread + 13

Thread 2:
0   libsystem_kernel.dylib          0x00007fff8c6d26d6 __workq_kernreturn + 10
1   libsystem_c.dylib               0x00007fff86b24f4c _pthread_workq_return + 25
2   libsystem_c.dylib               0x00007fff86b24d13 _pthread_wqthread + 412
3   libsystem_c.dylib               0x00007fff86b0f1d1 start_wqthread + 13

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x00007fff5ec8e418  rcx: 0x00007fff5ec8df28  rdx: 0x0000000000000000
  rdi: 0x000000000000b8f7  rsi: 0x0000000000000004  rbp: 0x00007fff5ec8df40  rsp: 0x00007fff5ec8df28
   r8: 0x00007fff5ec8e418   r9: 0x0000000000000000  r10: 0x000000000000000a  r11: 0x0000000000000202
  r12: 0x00007f99ea500de0  r13: 0x0000000000000003  r14: 0x00007fff5ec8e860  r15: 0x00007fff906ca447
  rip: 0x00007fff8c6d1d46  rfl: 0x0000000000000202  cr2: 0x00007fff74a29848
Logical CPU: 0

Is there something special I need to do after a fork? I looked up the fork decorator of Sage and it looks like it basically does the same.
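
One way to see from the parent how the child ended is to decode the status returned by os.waitpid. The following is a minimal sketch in plain Python 3, not Sage code: run_in_fork is a hypothetical helper name, and it only reports whether the child exited normally or was killed by a signal such as SIGILL. It does not change what happens inside the child.

```python
import os
import signal


def run_in_fork(func):
    """Run func() in a forked child and describe how the child ended."""
    pid = os.fork()
    if pid == 0:
        # Child: run the workload, then leave without the normal interpreter
        # shutdown. Any uncaught exception becomes exit code 1.
        code = 0
        try:
            func()
        except BaseException:
            code = 1
        finally:
            os._exit(code)
    # Parent: wait for the child and decode the status word.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    return "exited with code %d" % os.WEXITSTATUS(status)
```

Calling run_in_fork with the matrix workload would then show whether the child is terminated by a signal (SIGILL is available as signal.SIGILL) or exits on its own.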

The crash also happens with the fork decorator of Sage itself. Another test case:

def fork_test2():
    def test():
        pass  # do some stuff
    from sage.parallel.decorate import fork
    # Wrap test() so that it runs in a forked subprocess.
    test_ = fork(test, verbose=True)
    test_()

Even simpler test case:

def _fork_test_func():
    while True:
        m = matrix(QQ, 100, [randrange(-100,100) for i in range(100*100 ...

2 answers

answered 2013-05-30 23:40:56 -0500

tmonteil

updated 2013-05-30 23:42:56 -0500

I tried and I cannot reproduce it on GNU/Linux x86_64. I got (without error):

sage: fork_test()
parent, child: 5167
child
2

(I replaced the "dummy matrix calculation" with `print 1+1`.)

Hence, there might be a problem with your build (more precisely, with the Python binary). Did you download the binary corresponding to your OS? Which one (architecture, OS, Sage binary)? Did you compile Sage yourself? Which version? Does it work if you run the first script from a plain Python shell on the same machine (type python on the command line, not sage or sage -python)?

Comments

I also tried pure Python functions. It does not crash in that case for me either. But see my `_fork_test_func` and try with that one. I downloaded the Mac OS X binaries of Sage 5.8.

Albert Zeyer (2013-05-30 23:54:19 -0500)

I tried all your functions and found no bug at all (apart from the fact that your `_fork_test_func()` function never stops). I am not sure I understand: if you replace the "dummy matrix calculation" in your first example with `print 1+1`, do you get an error?

tmonteil (2013-05-31 00:42:02 -0500)

No, I don't get an error with `print 1+1`. I think it only happens when I call other, more complicated functions from Sage. While experimenting further, I found that the crash also does not happen in a fresh session with my other example, so some internal state must have caused it. I will try to work out a test case that reproduces the crash in a fresh Sage session too.

Albert Zeyer (2013-05-31 01:16:49 -0500)

It happens if you also called `_fork_test_func()` in the parent process before the fork. I slightly extended my last simple test case; with that version, it also crashes in a fresh Sage session.

Albert Zeyer (2013-05-31 01:23:54 -0500)

Still no problem on my side. If you type `print m.right_kernel()` instead of `m.right_kernel()`, how many correct outputs do you get?

tmonteil (2013-05-31 01:29:10 -0500)

answered 2013-06-02 11:10:38 -0500

Volker Braun

Matrix operations generally rely on instruction-set extensions in modern CPUs (SSE3/SSE4/SSE4.1/AVX). If you get a SIGILL in matrix code, that almost always means the code was compiled for a newer CPU than the one you are running on. The easiest solution would be to compile from source; then everything will be tailored to your CPU.
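
To check the CPU hypothesis, the feature flags reported by the operating system can be compared against the extensions listed above. This is a rough sketch in plain Python 3: missing_cpu_features is a hypothetical helper, and the flag spellings are platform-specific (Linux, for example, lists SSE3 as pni in /proc/cpuinfo), so the names passed in are assumptions to adapt to your system.

```python
def missing_cpu_features(flags_text, required):
    """Return the names in `required` that do not appear in flags_text.

    flags_text is a whitespace-separated list of CPU feature flags, as the
    operating system reports them. The comparison ignores case.
    """
    available = set(flags_text.lower().split())
    return [name for name in required if name.lower() not in available]


# Hypothetical flags string, written with Linux-style names.
example_flags = "fpu sse sse2 pni ssse3 sse4_1"
print(missing_cpu_features(example_flags, ["pni", "sse4_1", "avx"]))
```

On an Intel Mac, the flags string can be obtained with `sysctl -n machdep.cpu.features`; an empty result from the helper would suggest that a CPU mismatch is not the cause.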

Comments

It only happens in the forked process. Matrix multiplications work fine otherwise.

Albert Zeyer (2013-06-03 06:27:39 -0500)
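
Given that the crash shows up only in the forked child, and only after the parent has already run the matrix code, one thing that may be worth trying is running the computation in a freshly started interpreter instead of a forked copy of the parent. The sketch below assumes plain Python 3: run_in_fresh_interpreter is a hypothetical helper, and for Sage the command would have to point at Sage's own Python, which is not shown here. Whether this avoids the libdispatch error in Sage is untested.

```python
import subprocess
import sys


def run_in_fresh_interpreter(source):
    """Run Python source in a new interpreter process.

    Returns (returncode, stdout). On POSIX, a negative returncode -N means
    the process was terminated by signal N.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    print(run_in_fresh_interpreter("print(1 + 1)"))
```

The trade-off is that the new process shares nothing with the parent, so inputs and results have to be passed explicitly (here through the source string and standard output).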

Stats

Asked: 2013-05-30 21:57:19 -0500

Seen: 338 times

Last updated: Jun 02 '13