SIGILL in forked process

I am playing around with fork. I have a very simple test case which is basically like this:

def fork_test():
    import os
    pid = os.fork()
    if pid != 0:
        print "parent, child: %i" % pid
        os.waitpid(pid, 0)
    else:
        print "child"
        try:
            pass  # some dummy matrix calculation (see _fork_test_func() below)
        finally:
            os._exit(0)

And I'm getting:

------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug
in it and is not properly wrapped with sig_on(), sig_off(). You might
want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate.
------------------------------------------------------------------------

With this (incomplete) backtrace:

Crashed Thread:  0  Dispatch queue: com.apple.root.default-priority

Exception Type:  EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000001, 0x0000000000000000

Application Specific Information:
BUG IN LIBDISPATCH: flawed group/semaphore logic

Thread 0 Crashed:: Dispatch queue: com.apple.root.default-priority
0   libsystem_kernel.dylib          0x00007fff8c6d1d46 __kill + 10
1   libcsage.dylib                  0x0000000101717f33 sigdie + 124
2   libcsage.dylib                  0x0000000101717719 sage_signal_handler + 364
3   libsystem_c.dylib               0x00007fff86b1094a _sigtramp + 26
4   libdispatch.dylib               0x00007fff89a66c74 _dispatch_thread_semaphore_signal + 27
5   libdispatch.dylib               0x00007fff89a66f3e _dispatch_apply2 + 143
6   libdispatch.dylib               0x00007fff89a66e30 dispatch_apply_f + 440
7   libBLAS.dylib                   0x00007fff906ca435 APL_dtrsm + 1963
8   libBLAS.dylib                   0x00007fff906702b6 cblas_dtrsm + 882
9   matrix_modn_dense_double.so     0x0000000108612615 void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::delayed<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, unsigned long, unsigned long) + 2853
10  matrix_modn_dense_double.so     0x0000000108611daa void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::delayed<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, unsigned long, unsigned long) + 698
11  matrix_modn_dense_double.so     0x0000000108612ccf void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::operator()<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long) + 831
12  ???                             0x00007f99e481a028 0 + 140298940424232

Thread 1:
0   libsystem_kernel.dylib          0x00007fff8c6d26d6 __workq_kernreturn + 10
1   libsystem_c.dylib               0x00007fff86b24f4c _pthread_workq_return + 25
2   libsystem_c.dylib               0x00007fff86b24d13 _pthread_wqthread + 412
3   libsystem_c.dylib               0x00007fff86b0f1d1 start_wqthread + 13

Thread 2:
0   libsystem_kernel.dylib          0x00007fff8c6d26d6 __workq_kernreturn + 10
1   libsystem_c.dylib               0x00007fff86b24f4c _pthread_workq_return + 25
2   libsystem_c.dylib               0x00007fff86b24d13 _pthread_wqthread + 412
3   libsystem_c.dylib               0x00007fff86b0f1d1 start_wqthread + 13

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x00007fff5ec8e418  rcx: 0x00007fff5ec8df28  rdx: 0x0000000000000000
  rdi: 0x000000000000b8f7  rsi: 0x0000000000000004  rbp: 0x00007fff5ec8df40  rsp: 0x00007fff5ec8df28
   r8: 0x00007fff5ec8e418   r9: 0x0000000000000000  r10: 0x000000000000000a  r11: 0x0000000000000202
  r12: 0x00007f99ea500de0  r13: 0x0000000000000003  r14: 0x00007fff5ec8e860  r15: 0x00007fff906ca447
  rip: 0x00007fff8c6d1d46  rfl: 0x0000000000000202  cr2: 0x00007fff74a29848
Logical CPU: 0

Is there something special I need to do after a fork? I looked at Sage's fork decorator, and it appears to do essentially the same thing.
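
One thing that might be worth comparing against, offered as an untested assumption and not a confirmed fix: the backtrace shows the child crashing inside libdispatch and libBLAS, i.e. in library state copied from the parent. Starting the child through fork followed by exec gives it freshly initialised libraries instead. A minimal sketch using only the standard subprocess module (the helper name run_in_fresh_interpreter is made up for illustration, and the snippet runs plain Python, not Sage syntax):

```python
import subprocess
import sys


def run_in_fresh_interpreter(source):
    """Run Python source in a brand-new interpreter and return its stdout."""
    # subprocess forks and then immediately execs, so the child does not
    # keep running on a copy of the parent's already-initialised libraries.
    return subprocess.check_output([sys.executable, "-c", source])
```

Calling run_in_fresh_interpreter("print(6 * 7)") returns whatever the child wrote to stdout. Whether this avoids the crash in Sage is not something I have verified.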

The crash also happens with the fork decorator of Sage itself. Another test case:

def fork_test2():
    def test():
        pass  # do some stuff
    from sage.parallel.decorate import fork
    test_ = fork(test, verbose=True)
    test_()

An even simpler test case:

def _fork_test_func():
    while True:
        m = matrix(QQ, 100, [randrange(-100,100) for i in range(100*100)])
        m.right_kernel()

def fork_test():
    import os
    pid = os.fork()
    if pid != 0:
        print "parent, child: %i" % pid
        os.waitpid(pid, 0)
    else:
        print "child"
        try:
            _fork_test_func()
        finally:
            os._exit(0)

This results in a slightly different crash:

python(48672) malloc: *** error for object 0x11185f000: pointer being freed already on death-row
*** set a breakpoint in malloc_error_break to debug

With backtrace:

Crashed Thread:  1  Dispatch queue: com.apple.root.default-priority

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000

Application Specific Information:
*** error for object 0x11185f000: pointer being freed already on death-row


Thread 0:: Dispatch queue: com.apple.main-thread
0   matrix2.so                      0x0000000107fa403f __pyx_pw_4sage_6matrix_7matrix2_6Matrix_71right_kernel_matrix + 27551
1   ???                             0x000000000000000d 0 + 13

Thread 1 Crashed:: Dispatch queue: com.apple.root.default-priority
0   libsystem_kernel.dylib          0x00007fff8c6d239a __semwait_signal_nocancel + 10
1   libsystem_c.dylib               0x00007fff86b17e1b nanosleep$NOCANCEL + 138
2   libsystem_c.dylib               0x00007fff86b7b9a8 usleep$NOCANCEL + 54
3   libsystem_c.dylib               0x00007fff86b67eca __abort + 203
4   libsystem_c.dylib               0x00007fff86b67dff abort + 192
5   libsystem_c.dylib               0x00007fff86b43905 szone_error + 580
6   libsystem_c.dylib               0x00007fff86b43f7d free_large + 229
7   libsystem_c.dylib               0x00007fff86b3b8f8 free + 199
8   libBLAS.dylib                   0x00007fff906b0431 __APL_dgemm_block_invoke_0 + 132
9   libdispatch.dylib               0x00007fff89a65f01 _dispatch_call_block_and_release + 15
10  libdispatch.dylib               0x00007fff89a620b6 _dispatch_client_callout + 8
11  libdispatch.dylib               0x00007fff89a631fa _dispatch_worker_thread2 + 304
12  libsystem_c.dylib               0x00007fff86b24d0b _pthread_wqthread + 404
13  libsystem_c.dylib               0x00007fff86b0f1d1 start_wqthread + 13

The same also happens with this:

def fork_test2():
    from sage.parallel.decorate import fork
    test_ = fork(_fork_test_func, verbose=True)
    test_()

However, the crash only occurs if some matrix calculations have already been done in the session before the fork.


This test case also reproduces the crash in a fresh Sage session, because it runs a few matrix calculations before forking:

def _fork_test_func(iterator=None):
    # With no iterator given, loop forever.
    if iterator is None:
        import itertools
        iterator = itertools.count()
    for _ in iterator:
        m = matrix(QQ, 100, [randrange(-100,100) for j in range(100*100)])
        m.right_kernel()

def fork_test():
    _fork_test_func(range(10))
    import os
    pid = os.fork()
    if pid != 0:
        print "parent, child: %i" % pid
        os.waitpid(pid, 0)
    else:
        print "child"
        try:
            _fork_test_func()
        finally:
            os._exit(0)

I am using the Mac OS X 64-bit binaries of Sage 5.8.
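
The test cases above discard the status returned by os.waitpid(), so the parent never reports how the child died. Here is a minimal sketch of decoding that status, assuming nothing beyond the standard os module. The helper names describe_status and run_in_child are made up for illustration, and the code is written to run on both Python 2 and Python 3:

```python
import os


def describe_status(status):
    """Turn a raw os.waitpid() status into a readable string."""
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "unknown status %d" % status


def run_in_child(func):
    """Fork, run func() in the child, and describe how the child ended."""
    pid = os.fork()
    if pid == 0:
        # Child: always leave via os._exit() so it never falls back into
        # the parent's code path, even if func() raises.
        code = 1
        try:
            func()
            code = 0
        finally:
            os._exit(code)
    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return describe_status(status)
```

With this, run_in_child(_fork_test_func) in the parent would return a string saying whether the child exited normally or was terminated by a signal, which makes it easier to tell the SIGILL and SIGABRT cases apart without reading the crash report.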