
SIGILL in forked process

asked 11 years ago

updated 11 years ago

I am playing around with `os.fork()`. I have a very simple test case, which is basically this:

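def fork_test():
    import os
    pid = os.fork()
    if pid != 0:
        # parent: report the child's pid and wait for it to finish
        print("parent, child: %i" % pid)
        os.waitpid(pid, 0)
    else:
        print("child")
        try:
            pass  # some dummy matrix calculation goes here
        finally:
            # leave the child immediately, without running the parent's cleanup
            os._exit(0)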

(See _fork_test_func() below for some sample matrix calculations.)

And I'm getting:

------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug
in it and is not properly wrapped with sig_on(), sig_off(). You might
want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate.
------------------------------------------------------------------------

With this (incomplete) backtrace:

Crashed Thread:  0  Dispatch queue: com.apple.root.default-priority

Exception Type:  EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000001, 0x0000000000000000

Application Specific Information:
BUG IN LIBDISPATCH: flawed group/semaphore logic

Thread 0 Crashed:: Dispatch queue: com.apple.root.default-priority
0   libsystem_kernel.dylib          0x00007fff8c6d1d46 __kill + 10
1   libcsage.dylib                  0x0000000101717f33 sigdie + 124
2   libcsage.dylib                  0x0000000101717719 sage_signal_handler + 364
3   libsystem_c.dylib               0x00007fff86b1094a _sigtramp + 26
4   libdispatch.dylib               0x00007fff89a66c74 _dispatch_thread_semaphore_signal + 27
5   libdispatch.dylib               0x00007fff89a66f3e _dispatch_apply2 + 143
6   libdispatch.dylib               0x00007fff89a66e30 dispatch_apply_f + 440
7   libBLAS.dylib                   0x00007fff906ca435 APL_dtrsm + 1963
8   libBLAS.dylib                   0x00007fff906702b6 cblas_dtrsm + 882
9   matrix_modn_dense_double.so     0x0000000108612615 void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::delayed<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, unsigned long, unsigned long) + 2853
10  matrix_modn_dense_double.so     0x0000000108611daa void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::delayed<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, unsigned long, unsigned long) + 698
11  matrix_modn_dense_double.so     0x0000000108612ccf void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::operator()<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long) + 831
12  ???                             0x00007f99e481a028 0 + 140298940424232

Thread 1:
0   libsystem_kernel.dylib          0x00007fff8c6d26d6 __workq_kernreturn + 10
1   libsystem_c.dylib               0x00007fff86b24f4c _pthread_workq_return + 25
2   libsystem_c.dylib               0x00007fff86b24d13 _pthread_wqthread + 412
3   libsystem_c.dylib               0x00007fff86b0f1d1 start_wqthread + 13

Thread 2:
0   libsystem_kernel.dylib          0x00007fff8c6d26d6 __workq_kernreturn + 10
1   libsystem_c.dylib               0x00007fff86b24f4c _pthread_workq_return + 25
2   libsystem_c.dylib               0x00007fff86b24d13 _pthread_wqthread + 412
3   libsystem_c.dylib               0x00007fff86b0f1d1 start_wqthread + 13

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x00007fff5ec8e418  rcx: 0x00007fff5ec8df28  rdx: 0x0000000000000000
  rdi: 0x000000000000b8f7  rsi: 0x0000000000000004  rbp: 0x00007fff5ec8df40  rsp: 0x00007fff5ec8df28
   r8: 0x00007fff5ec8e418   r9: 0x0000000000000000  r10: 0x000000000000000a  r11: 0x0000000000000202
  r12: 0x00007f99ea500de0  r13: 0x0000000000000003  r14: 0x00007fff5ec8e860  r15: 0x00007fff906ca447
  rip: 0x00007fff8c6d1d46  rfl: 0x0000000000000202  cr2: 0x00007fff74a29848
Logical CPU: 0
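A minimal sketch of the same fork/wait/`_exit` pattern in plain Python 3 (POSIX only, no Sage needed; `signal.Signals` does not exist in the Python 2.7 that ships with Sage 5.8). Besides waiting, the parent decodes the status returned by `os.waitpid` with `os.WIFSIGNALED`/`os.WTERMSIG`, so it can tell a normal exit from a child that was killed by a signal. The helper name `run_in_child` is hypothetical. If Sage's handler ends the process with the signal itself, as the `__kill` frame in the backtrace suggests, the crash above would presumably be reported as `("signaled", "SIGILL")`; that is an assumption, not something verified here.

```python
import os
import signal
import sys
import traceback


def run_in_child(func):
    """Run func() in a forked child process and report how the child ended.

    Returns ("exited", exit_code) or ("signaled", signal_name).
    """
    pid = os.fork()
    if pid == 0:
        # Child: always leave via os._exit so we never fall back into the
        # parent's code path and never flush stdio buffers copied from it.
        status = 1
        try:
            func()
            status = 0
        except BaseException:
            traceback.print_exc()
            sys.stderr.flush()
        finally:
            os._exit(status)
    # Parent: block until the child terminates, then decode its wait status.
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(wait_status):
        return ("signaled", signal.Signals(os.WTERMSIG(wait_status)).name)
    return ("exited", os.WEXITSTATUS(wait_status))


if __name__ == "__main__":
    # A pure-Python stand-in for the "dummy matrix calculation".
    print(run_in_child(lambda: sum(i * i for i in range(10**5))))
```

An exception in the child is printed and turned into exit code 1, so a Python-level error and a fatal signal remain distinguishable from the parent's side.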

Is there something special I need to do after a fork? I looked at Sage's fork decorator, and it seems to do basically the same thing.

The crash also happens with the fork decorator of Sage itself. Another test case:

def fork_test2():
    def test():
        pass  # do some stuff
    from sage.parallel.decorate import fork
    test_ = fork(test, verbose=True)
    test_()
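For comparison, here is a hypothetical minimal fork decorator in plain Python 3 (POSIX only). It is *not* Sage's implementation of `sage.parallel.decorate.fork`; the name `forked` and the pipe-plus-`pickle` transport are assumptions chosen to keep the sketch short. The child runs the function, pickles the result into a pipe and leaves via `os._exit`; the parent reads until EOF, reaps the child and raises if it did not exit cleanly.

```python
import functools
import os
import pickle


def forked(func):
    """Hypothetical decorator: run func in a forked child and return its result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute, send the pickled result, exit without cleanup.
            os.close(read_fd)
            status = 1
            try:
                payload = pickle.dumps(func(*args, **kwargs))
                with os.fdopen(write_fd, "wb") as out:
                    out.write(payload)
                status = 0
            finally:
                os._exit(status)
        # Parent: read everything first, then reap the child.
        os.close(write_fd)
        with os.fdopen(read_fd, "rb") as inp:
            data = inp.read()
        _, wait_status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0):
            raise RuntimeError("child failed, wait status %d" % wait_status)
        return pickle.loads(data)
    return wrapper
```

Reading the pipe before calling `os.waitpid` matters: if the result is larger than the pipe buffer, waiting first would deadlock, because the child would block on its write while the parent blocks on the wait.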

Even simpler test case:

def _fork_test_func():
    while True:
        m = matrix(QQ, 100, [randrange(-100,100) for i in range(100*100 ...

2 Answers


answered 11 years ago

Volker Braun

Matrix operations generally rely on instruction-set extensions of modern CPUs (SSE3/SSE4/SSE4.1/AVX). If you get a SIGILL in matrix code, that almost always means that the code was compiled for a more modern CPU than yours. The easiest solution is to compile Sage from source; then everything will be tailored to your CPU.
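To check this hypothesis on Linux, one can compare the CPU's advertised flags against the extensions listed above. A sketch in Python, assuming the Linux `/proc/cpuinfo` format (on Mac OS X, the asker's platform, the rough equivalent is `sysctl machdep.cpu.features`, which this sketch does not parse). The flag spellings in the table are the Linux ones: SSE3 is reported as `pni`, and "SSE4" is split into `sse4_1` and `sse4_2`.

```python
# Linux /proc/cpuinfo flag names for the extensions mentioned in the answer.
# Linux reports SSE3 as "pni" (Prescott New Instructions).
CPUINFO_FLAGS = {
    "SSE3": "pni",
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AVX": "avx",
}


def cpu_flags(cpuinfo_text):
    """Return the set of flags listed on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(cpuinfo_text, wanted=("SSE3", "SSE4.1", "SSE4.2", "AVX")):
    """List the wanted extensions that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if CPUINFO_FLAGS[name] not in flags]
```

On a Linux machine: `with open("/proc/cpuinfo") as f: print(missing_extensions(f.read()))`. An empty list only says the CPU advertises all of these extensions; it does not tell you which extensions a given binary was actually compiled for.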


Comments

It only happens in the forked process. Matrix multiplications work fine otherwise.

Albert Zeyer (11 years ago)

answered 11 years ago

tmonteil

updated 11 years ago

I tried, and I cannot reproduce it on GNU/Linux x86_64. I got (without any error):

sage: fork_test()
parent, child: 5167
child
2

(I replaced the "dummy matrix calculation" with `print 1+1`.)

Hence, there might be a problem with your build (more precisely, with the Python binary). Did you download the binary corresponding to your OS? Which one (architecture, OS, Sage binary)? Did you compile Sage yourself? Which version? Does it work if you run the first script from a Python shell on the same machine (type `python` on the command line, neither `sage` nor `sage -python`)?


Comments

I also tried pure Python functions; they do not crash for me either. But have a look at my `_fork_test_func` and try that one. I downloaded the Mac OS X binaries; that is Sage 5.8.

Albert Zeyer (11 years ago)

I tried all your functions: no bug at all (apart from the fact that your `_fork_test_func()` function never stops). I am not sure I understand: if you replace the "dummy matrix calculation" in your first example with `print 1+1`, do you get an error?

tmonteil (11 years ago)

No, I don't get an error with `print 1+1`. I think it only happens when I call other, more complicated functions from Sage. While playing around further, I just found that my other example doesn't crash in a fresh session either, so some internal state must be causing this. I'll try to come up with a test case that also reproduces the crash in a fresh Sage session.

Albert Zeyer (11 years ago)

It happens if `_fork_test_func()` was also called in the parent process before the fork. I slightly extended my last simple test case; with it, the crash also happens in a fresh Sage session.

Albert Zeyer (11 years ago)
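One POSIX fact that may be relevant to this observation: `fork()` copies the parent's whole memory but only the *calling* thread. Helper threads that a library started in the parent (a multithreaded BLAS, for instance) do not exist in the child, while the library's bookkeeping about them (locks, counters, queues) is copied as-is. Whether such stale state is what triggers the "BUG IN LIBDISPATCH: flawed group/semaphore logic" check here is an assumption, not something established in this thread. The following plain-Python 3 sketch (POSIX only; the function name `threads_after_fork` is hypothetical, and the helper thread is only a stand-in for a library's worker pool, not libdispatch itself) makes the thread part visible:

```python
import os
import threading


def threads_after_fork():
    """Return (Python threads alive in parent, Python threads alive in child)."""
    stop = threading.Event()
    # Helper thread started in the parent: a stand-in for a library worker pool.
    helper = threading.Thread(target=stop.wait, daemon=True)
    helper.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() survives here.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)

    # Parent: collect the child's count, reap the child, stop the helper.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    helper.join()
    return parent_count, child_count


if __name__ == "__main__":
    print(threads_after_fork())
```

Run as a standalone script, this typically prints `(2, 1)`. Recent CPython versions (3.12+) may additionally emit a `DeprecationWarning` when forking a multi-threaded process, for the same underlying reason.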

Still no problem on my side. If you type `print m.right_kernel()` instead of `m.right_kernel()`, how many outputs do you get?

tmonteil (11 years ago)


Stats

Asked: 11 years ago

Seen: 965 times

Last updated: Jun 02 '13