Differences in Python Signal Handling Across Platforms

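The earlier article 《为何 Windows 下无法用 Ctrl+C 终止 Python 进程》 ("Why Ctrl+C Cannot Terminate a Python Process on Windows") explained why that happens, but it did not explain why the same code can be interrupted with Ctrl+C on Linux. The root cause is that, at the operating-system level, Linux and Windows handle SIGINT differently.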

How Python Implements It Under the Hood

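Python refers to the signal handlers provided by the operating system or the C standard library as low-level signal handlers; the built-in signal module is a wrapper on top of them. The PyOS_setsig() function can be found in Python/pylifecycle.c: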

/*
 * All of the code in this function must only use async-signal-safe functions,
 * listed at `man 7 signal` or
 * http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html.
 */
PyOS_sighandler_t
PyOS_setsig(int sig, PyOS_sighandler_t handler)
{
#ifdef HAVE_SIGACTION
    /* Some code in Modules/signalmodule.c depends on sigaction() being
     * used here if HAVE_SIGACTION is defined.  Fix that if this code
     * changes to invalidate that assumption.
     */
    struct sigaction context, ocontext;
    context.sa_handler = handler;
    sigemptyset(&context.sa_mask);
    /* Using SA_ONSTACK is friendlier to other C/C++/Golang-VM code that
     * extension module or embedding code may use where tiny thread stacks
     * are used.  https://bugs.python.org/issue43390 */
    context.sa_flags = SA_ONSTACK;
    if (sigaction(sig, &context, &ocontext) == -1)
        return SIG_ERR;
    return ocontext.sa_handler;
#else
    PyOS_sighandler_t oldhandler;
    oldhandler = signal(sig, handler);
#ifdef HAVE_SIGINTERRUPT
    siginterrupt(sig, 1);
#endif
    return oldhandler;
#endif
}

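If the HAVE_SIGACTION macro is defined at build time, Python registers the signal handler with the POSIX sigaction() function; otherwise it falls back to the signal() function from the ANSI C standard library. Because <signal.h> on Windows does not provide sigaction(), Python registers its signal handlers with signal() on Windows. Linux conforms to POSIX, so on Linux Python uses sigaction(). The code that wraps signal handling further can be found in Modules/signalmodule.c.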
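Seen from Python code, all of this is hidden behind signal.signal(). The following is a minimal sketch of registering a Python-level handler on top of the low-level one; the names `received` and `on_sigint` are made up for this illustration, and it assumes Python 3.8+ (for signal.raise_signal()) and that it runs in the main thread.

```python
import signal

received = []  # signal numbers seen by the Python-level handler


def on_sigint(signum, frame):
    """Python-level handler; CPython calls it from the main thread."""
    received.append(signum)


# signal.signal() returns the previously installed Python-level handler
# (None if the previous handler was not installed from Python).
previous = signal.signal(signal.SIGINT, on_sigint)
try:
    # raise_signal() sends the signal to the calling process, so the
    # example does not depend on a terminal or on pressing Ctrl+C.
    signal.raise_signal(signal.SIGINT)
finally:
    # Restore whatever was installed before (normally default_int_handler).
    signal.signal(signal.SIGINT,
                  previous if previous is not None else signal.SIG_DFL)

print(received == [signal.SIGINT])
```

The same script should behave identically on Windows and Linux, even though the low-level registration underneath goes through signal() in one case and sigaction() in the other.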

How It Works on Windows

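First, let us look at how signal() is implemented on Windows. If the Windows SDK is installed, the implementation can be found locally at C:\Program Files (x86)\Windows Kits\10\Source\10.0.22621.0\ucrt\misc\signal.cpp (version 10.0.22621 is used here as an example; the installation path differs between versions):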

extern "C" __crt_signal_handler_t __cdecl signal(int signum, __crt_signal_handler_t sigact)
{
    // Check for signal actions that are supported on other platforms but not on
    // this one, and make sure the action is not SIG_DIE:
    if (is_unsupported_signal(signum, sigact))
        return signal_failed(signum);

    // First, handle the case where the signal does not correspond to an
    // exception in the host OS:
    if (signum == SIGINT         ||
        signum == SIGBREAK       ||
        signum == SIGABRT        ||
        signum == SIGABRT_COMPAT ||
        signum == SIGTERM)
    {
        bool set_console_ctrl_error = false;
        __crt_signal_handler_t old_action = nullptr;

        __acrt_lock(__acrt_signal_lock);
        __try
        {
            // If the signal is SIGINT or SIGBREAK make sure the handler is
            // installed to capture ^C and ^Break events:
            // C4127: conditional expression is constant
#pragma warning( suppress: 4127 )
            if (is_console_signal(signum) && !console_ctrl_handler_installed)
            {
                if (SetConsoleCtrlHandler(ctrlevent_capture, TRUE))
                {
                    console_ctrl_handler_installed = true;
                }
                else
                {
                    _doserrno = GetLastError();
                    set_console_ctrl_error = true;
                }
            }

            __crt_signal_handler_t* const action_pointer = get_global_action_nolock(signum);
            if (action_pointer != nullptr)
            {
                old_action = __crt_fast_decode_pointer(*action_pointer);
                if (sigact != SIG_GET)
                    *action_pointer = __crt_fast_encode_pointer(sigact);
            }
        }
        __finally
        {
            __acrt_unlock(__acrt_signal_lock);
        }

        if (set_console_ctrl_error)
            return signal_failed(signum);

        return old_action;
    }


    // If we reach here, signum is supposed to be one of the signals which
    // correspond to exceptions on the host OS.  If it's not one of these,
    // fail and return immediately:
    if (signum != SIGFPE && signum != SIGILL && signum != SIGSEGV)
        return signal_failed(signum);

    __acrt_ptd* const ptd = __acrt_getptd_noexit();
    if (ptd == nullptr)
        return signal_failed(signum);

    // Check that there is a per-thread instance of the exception-action table
    // for this thread.  If there isn't, create one:
    if (ptd->_pxcptacttab == __acrt_exception_action_table)
    {
        // Allocate space for an exception-action table:
        ptd->_pxcptacttab = static_cast<__crt_signal_action_t*>(_malloc_crt(__acrt_signal_action_table_size));
        if (ptd->_pxcptacttab == nullptr)
            return signal_failed(signum);

        // Initialize the table by copying the contents of __acrt_exception_action_table:
        memcpy(ptd->_pxcptacttab, __acrt_exception_action_table, __acrt_signal_action_table_size);
    }

    // Look up the proper entry in the exception-action table. Note that if
    // several exceptions are mapped to the same signal, this returns the
    // pointer to first such entry in the exception action table. It is assumed
    // that the other entries immediately follow this one.
    __crt_signal_action_t* const xcpt_action = siglookup(signum, ptd->_pxcptacttab);
    if (xcpt_action == nullptr)
        return signal_failed(signum);

    // SIGSEGV, SIGILL and SIGFPE all have more than one exception mapped to
    // them.  The code below depends on the exceptions corresponding to the same
    // signal being grouped together in the exception-action table.

    __crt_signal_handler_t const old_action = xcpt_action->_action;

    // If we are not just getting the currently installed action, loop through
    // all the entries corresponding to the given signal and update them as
    // appropriate:
    if (sigact != SIG_GET)
    {
        __crt_signal_action_t* const last = ptd->_pxcptacttab + __acrt_signal_action_table_count;

        // Iterate until we reach the end of the table or we reach the end of
        // the range of actions for this signal, whichever comes first:
        for (__crt_signal_action_t* p = xcpt_action; p != last && p->_signal_number == signum; ++p)
        {
            p->_action = sigact;
        }
    }

    return old_action;
}

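The code shows that, on Windows, the signum parameter of signal() accepts only the following signals: SIGINT, SIGBREAK, SIGABRT, SIGABRT_COMPAT, SIGTERM, SIGFPE, SIGILL, and SIGSEGV.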
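The set of signals that Python itself accepts on the current platform can be inspected from Python. This is a small sketch, assuming Python 3.8+ for signal.valid_signals(); the helper name `named_signals` is made up for this example.

```python
import signal


def named_signals():
    """Return the names of the signals Python accepts on this platform."""
    # valid_signals() can also contain plain ints for signals without a
    # symbolic name, so keep only signal.Signals members.
    return sorted(s.name for s in signal.valid_signals()
                  if isinstance(s, signal.Signals))


names = named_signals()
print(names)
```

On Windows the printed list should be short and correspond to the CRT signals discussed here, while on Linux it is considerably longer. The exact contents depend on the platform and the Python version, so no expected output is shown.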

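Because the Windows kernel has no native notion of signals, the CRT emulates them, and different signals are emulated in different ways. The actions for SIGINT, SIGBREAK, SIGABRT, SIGABRT_COMPAT, and SIGTERM are stored in a process-wide table, and for the two console signals, SIGINT and SIGBREAK, the CRT additionally calls the Windows API SetConsoleCtrlHandler() to capture Ctrl+C and Ctrl+Break events. SIGFPE, SIGILL, and SIGSEGV correspond to OS exceptions and are tracked by the CRT itself in a per-thread exception-action table. Once a callback of type HandlerRoutine has been registered with SetConsoleCtrlHandler(), the system creates a new thread in the process to run the callback whenever the console receives such an event (an unverified guess: this step may be carried out by ConHost or OpenConsole). When the default callback receives a Ctrl+C event, it simply calls ExitProcess() to terminate the current process, which is roughly equivalent to the following code: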

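#include <windows.h>
#include <stdio.h>

/* Console control handler; the system calls it on a new thread. */
BOOL WINAPI CtrlHandler(DWORD fdwCtrlType)
{
    if (fdwCtrlType == CTRL_C_EVENT) {
        ExitProcess(1);  /* exit code chosen arbitrarily for this example */
    }
    /* Not handled here: let the next handler in the list deal with it. */
    return FALSE;
}

int main(void)
{
    SetConsoleCtrlHandler(CtrlHandler, TRUE);
    while (1) {}  /* keep the main thread busy until Ctrl+C arrives */
    return 0;
}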

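Because CtrlHandler() is called on a new thread, the process can exit immediately even when the main thread is blocked. Python on Windows overrides this behavior, however: the low-level signal handler it registers only sets a flag in the VM to indicate that a signal is pending. A signal handler registered from Python code is not called until the main thread returns to the VM. If the main thread is blocked in code outside the VM at that moment, such as a Windows API call or other native code, it cannot respond to the signal. This is why, in some situations, a Python program on Windows does not respond to Ctrl+C.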
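The deferred handling described above can be illustrated with a toy model. This is only a sketch of the idea and not CPython's actual code: `ToyVM`, `low_level_handler`, and the step functions are all hypothetical names invented for this example.

```python
class ToyVM:
    """Toy model of deferred signal handling; not CPython's real code."""

    def __init__(self, python_handler):
        self.python_handler = python_handler
        self.pending = False  # the "signal pending" flag
        self.log = []

    def low_level_handler(self):
        # Stands in for the C-level handler: it only records the signal.
        self.pending = True

    def run(self, steps):
        for step in steps:
            step(self)  # a step may "block" outside the VM
            self.log.append(step.__name__)
            if self.pending:  # the flag is checked only between steps
                self.pending = False
                self.python_handler(self)


def native_call_with_ctrl_c(vm):
    # Ctrl+C arrives while the main thread is inside native code...
    vm.low_level_handler()
    # ...but the native code keeps running until it returns on its own.
    vm.log.append("native code still running")


def next_bytecode(vm):
    pass


def on_sigint(vm):
    vm.log.append("python handler runs")


vm = ToyVM(on_sigint)
vm.run([native_call_with_ctrl_c, next_bytecode])
print(vm.log)
```

In this model the Python-level handler runs only after the native call has returned, which mirrors why a main thread stuck in native code cannot react to Ctrl+C.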

Special Handling of time.sleep()

You may wonder why the following code works perfectly well on Windows and responds to SIGINT immediately:

import time

try:
    while True:
        time.sleep(100000)
except KeyboardInterrupt:
    print('keyboard interrupt received')

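That is because Python special-cases time.sleep() on Windows: except for time.sleep(0), it does not put the thread to sleep by calling the Windows API Sleep(). The Windows implementation of time.sleep() can be found in Modules/timemodule.c: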

#else  // MS_WINDOWS
    _PyTime_t timeout_100ns = _PyTime_As100Nanoseconds(timeout,
                                                       _PyTime_ROUND_CEILING);

    // Maintain Windows Sleep() semantics for time.sleep(0)
    if (timeout_100ns == 0) {
        Py_BEGIN_ALLOW_THREADS
        // A value of zero causes the thread to relinquish the remainder of its
        // time slice to any other thread that is ready to run. If there are no
        // other threads ready to run, the function returns immediately, and
        // the thread continues execution.
        Sleep(0);
        Py_END_ALLOW_THREADS
        return 0;
    }

    LARGE_INTEGER relative_timeout;
    // No need to check for integer overflow, both types are signed
    assert(sizeof(relative_timeout) == sizeof(timeout_100ns));
    // SetWaitableTimer(): a negative due time indicates relative time
    relative_timeout.QuadPart = -timeout_100ns;

    HANDLE timer = CreateWaitableTimerExW(NULL, NULL, timer_flags,
                                          TIMER_ALL_ACCESS);
    if (timer == NULL) {
        PyErr_SetFromWindowsErr(0);
        return -1;
    }

    if (!SetWaitableTimerEx(timer, &relative_timeout,
                            0, // no period; the timer is signaled once
                            NULL, NULL, // no completion routine
                            NULL,  // no wake context; do not resume from suspend
                            0)) // no tolerable delay for timer coalescing
    {
        PyErr_SetFromWindowsErr(0);
        goto error;
    }

    // Only the main thread can be interrupted by SIGINT.
    // Signal handlers are only executed in the main thread.
    if (_PyOS_IsMainThread()) {
        HANDLE sigint_event = _PyOS_SigintEvent();

        while (1) {
            // Check for pending SIGINT signal before resetting the event
            if (PyErr_CheckSignals()) {
                goto error;
            }
            ResetEvent(sigint_event);

            HANDLE events[] = {timer, sigint_event};
            DWORD rc;

            Py_BEGIN_ALLOW_THREADS
            rc = WaitForMultipleObjects(Py_ARRAY_LENGTH(events), events,
                                        // bWaitAll
                                        FALSE,
                                        // No wait timeout
                                        INFINITE);
            Py_END_ALLOW_THREADS

            if (rc == WAIT_FAILED) {
                PyErr_SetFromWindowsErr(0);
                goto error;
            }

            if (rc == WAIT_OBJECT_0) {
                // Timer signaled: we are done
                break;
            }

            assert(rc == (WAIT_OBJECT_0 + 1));
            // The sleep was interrupted by SIGINT: restart sleeping
        }
    }
    else {
        DWORD rc;

        Py_BEGIN_ALLOW_THREADS
        rc = WaitForSingleObject(timer, INFINITE);
        Py_END_ALLOW_THREADS

        if (rc == WAIT_FAILED) {
            PyErr_SetFromWindowsErr(0);
            goto error;
        }

        assert(rc == WAIT_OBJECT_0);
        // Timer signaled: we are done
    }

    CloseHandle(timer);
    return 0;

error:
    CloseHandle(timer);
    return -1;
#endif

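Python on Windows creates a waitable timer object with CreateWaitableTimerExW() and puts the thread to sleep by waiting on it with WaitForMultipleObjects(). In the main thread, the wait covers both the timer and a SIGINT event. In Modules/signalmodule.c, Python's low-level signal_handler() calls SetEvent() on that event when it receives SIGINT, which makes WaitForMultipleObjects() return. The loop then calls PyErr_CheckSignals(), which runs the Python-level handler; the default handler raises KeyboardInterrupt, so time.sleep() ends early: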

#ifdef MS_WINDOWS
    if (sig_num == SIGINT) {
        signal_state_t *state = &signal_global_state;
        SetEvent(state->sigint_event);
    }
#endif
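The "wait on a timer or an interrupt event, whichever comes first" pattern can be approximated in pure Python with threading.Event. This is a rough analogue and not what CPython does internally; `interruptible_sleep` and `sigint_event` are names chosen for this sketch, and a threading.Timer plays the role of the low-level handler calling SetEvent().

```python
import threading
import time


def interruptible_sleep(seconds, interrupt_event):
    """Sleep up to `seconds`; return True if woken early by the event."""
    interrupt_event.clear()  # loosely mirrors ResetEvent(sigint_event)
    # wait() returns True if the event was set, False if the timeout expired.
    return interrupt_event.wait(seconds)


sigint_event = threading.Event()  # stands in for the SIGINT event
# Simulate the low-level handler signalling the event after 0.1 s.
threading.Timer(0.1, sigint_event.set).start()

start = time.monotonic()
interrupted = interruptible_sleep(5.0, sigint_event)
elapsed = time.monotonic() - start
print(interrupted, elapsed < 5.0)
```

The sleep returns as soon as the event is set instead of waiting out the full five seconds, which is the property that lets time.sleep() react to Ctrl+C on Windows.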

How It Works on Linux

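On Linux, Python implements signals with the POSIX sigaction() function. Because POSIX signal support on Linux is provided by the kernel, the mechanism is completely different from the one on Windows. The essential difference is that a Linux signal is a genuine software interrupt whose handling is triggered by the kernel. In principle it resembles an asynchronous procedure call (APC) on Windows, with one difference: a Linux signal handler is invoked asynchronously as soon as the signal is delivered, whereas QueueUserAPC() on Windows only appends the function to the target thread's APC queue, and the function runs once that thread enters an alertable wait through specific API functions (for example SleepEx() or WaitForSingleObjectEx()). In that respect, APCs are closer to Python's own signal mechanism.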

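Signals on Linux are handled by the kernel. When a signal is generated, the kernel adds it to the target process's pending-signal queue and interrupts the target process so that it enters kernel mode. If the main thread of the target process is blocked in an I/O operation at that moment, which means it is effectively asleep, it is woken up to handle the signal. The kernel then copies the signal data onto the process's user-space stack and points the instruction pointer (EIP on 32-bit x86, RIP on x86-64) at the signal handler. Execution then returns to user mode and runs the handler. Because PyOS_setsig() does not set SA_RESTART, the interrupted system call typically fails with EINTR instead of being restarted, so the main thread gets back into the VM, where the Python-level handler runs. This is why a Python process on Linux responds to SIGINT even while its main thread is blocked in a system call.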
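The wake-up behavior can be observed with a short POSIX-only experiment. This is a sketch under several assumptions: it must run in the main thread, it relies on signal.pthread_kill() (not available on Windows), and the function name `blocked_read_interrupted_by_sigint` is made up for this example. A helper thread sends SIGINT to the main thread while the latter is blocked in a read on an empty pipe.

```python
import os
import signal
import threading
import time


def blocked_read_interrupted_by_sigint(delay=0.2):
    """Block the main thread in read(2) and interrupt it with SIGINT."""
    read_fd, write_fd = os.pipe()  # nothing is ever written, so read() blocks
    main_ident = threading.main_thread().ident
    # Make sure the default Python-level handler (KeyboardInterrupt) is active.
    previous = signal.signal(signal.SIGINT, signal.default_int_handler)
    # A helper thread delivers SIGINT to the main thread after `delay` seconds.
    timer = threading.Timer(delay, signal.pthread_kill,
                            args=(main_ident, signal.SIGINT))
    start = time.monotonic()
    try:
        timer.start()
        os.read(read_fd, 1)  # would block forever without the signal
        interrupted = False
    except KeyboardInterrupt:
        interrupted = True
    finally:
        timer.cancel()
        timer.join()
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
        os.close(read_fd)
        os.close(write_fd)
    return interrupted, time.monotonic() - start


interrupted, elapsed = blocked_read_interrupted_by_sigint()
print(interrupted)
```

The blocking read is cut short by the signal and KeyboardInterrupt is raised in the main thread shortly after the helper thread fires, rather than the call hanging indefinitely.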