CVE-2022-20421 Vulnerability Reproduction

CVE ID: CVE-2022-20421
Google bulletin: https://source.android.com/docs/security/bulletin/2022-10-01?hl=zh-cn
Related patch commit: https://android.googlesource.com/kernel/common/+/19bb609b45fb

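Environment Setup

AOSP build error: libncurses.so.5 not found
The Debian release is too new, so download an older version of the library and put it in /lib/x86-…/

For android-kernel, pick an older branch: android-gs-raviole-5.10-android12-d1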

repo init -u https://mirrors.tuna.tsinghua.edu.cn/git/AOSP/kernel/manifest -b android-gs-raviole-5.10-android12-d1
repo init -m default.xml # you may need to point default.xml at the TUNA mirror
repo sync
# use `repo sync -c --no-tags` to save network traffic

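Then build. Android 12 and earlier can only be built with build.sh; for the Pixel 6, just run ./build_slider.sh.

Run the target Android kernel with the Android SDK emulator:
The kernel format seemed to be wrong; according to GPT, the emulator version I was using was too old...
So I switched to SDK 31 and used the AVD Manager to download a prebuilt AVD emulator image for API 31.
Since the AVD's kernel version (5.10.136) has to match the custom kernel, I re-selected the matching branch: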

repo init -u https://android.googlesource.com/kernel/manifest -b common-android12-5.10-2022-08
repo sync
cd common
git checkout android12-5.10.136_r00
# ubuntu 20.04
BUILD_CONFIG=common/build.config.gki.x86_64 build/build.sh

After the build finishes, boot it with the emulator and connect with adb shell to interact with it; running adb root first gives a root shell.
The emulator launch script:

#!/bin/bash

SYS_DIR=~/Android/Sdk/system-images/android-31/default/x86_64/
DATA_DIR=~/.android/avd/Pixel_6.avd/

export ANDROID_PRODUCT_OUT=$SYS_DIR
export ANDROID_BUILD_TOP=$SYS_DIR

~/Android/Sdk/emulator/emulator \
  -avd Pixel_6 \
  -kernel $SYS_DIR/bzImage \
  -system $SYS_DIR/system.img \
  -data $DATA_DIR/userdata.img \
  -cache $SYS_DIR/cache.img \
  -sysdir $SYS_DIR \
  -vendor $SYS_DIR/vendor.img \
  -writable-system \
  -no-snapshot-save -no-snapshot-load -no-snapstorage -no-snapshot \
  -show-kernel \
  -no-window \
  -no-audio \
  -no-cache \
  -wipe-data \
  -accel on -netdelay none -no-sim \
  -verbose \
  -qemu -m 8192 -smp 8 -s -no-reboot

It turned out the bug is already fixed in this version, so I simply reverted the patch to remove the fix. The original fix:

index 6fee6d5..0a4f0e7 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -1519,6 +1519,18 @@ static int binder_inc_ref_for_node(struct binder_proc *proc,
 	}
 	ret = binder_inc_ref_olocked(ref, strong, target_list);
 	*rdata = ref->data;
+	if (ret && ref == new_ref) {
+		/*
+		 * Cleanup the failed reference here as the target
+		 * could now be dead and have already released its
+		 * references by now. Calling on the new reference
+		 * with strong=0 and a tmp_refs will not decrement
+		 * the node. The new_ref gets kfree'd below.
+		 */
+		binder_cleanup_ref_olocked(new_ref);
+		ref = NULL;
+	}
+
 	binder_proc_unlock(proc);
 	if (new_ref && ref != new_ref)
 		/*

Rebuilding then yields a vulnerable kernel.

Vulnerability Analysis

A UAF based on binder transactions

Binder Transactions

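In Binder, processes exchange information and share data through Binder IPC transactions.
The kernel driver translates the data objects being exchanged.
For a binder handle object, a process can share a handle with another process through a transaction:

A transaction sends a binder handle object to process B

The kernel driver then translates the handle object into a new reference in process B to the node owned by process C.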

Translating Handles

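The kernel driver translates the handle as follows:
First, it creates a new binder ref in the target process.

Then it increments that ref's reference count.

The problem is that if the second step fails, the ref is not cleaned up.
So if process B exits right then, the second step fails. B releases all of its refs, yet process A, which is executing the transaction, still carries on with the translation.

Process A therefore leaves behind the new ref it created for process B. By this point B has already released its refs and its binder_proc has been freed, yet process C's binder_node still holds this ref. Later, when C exits, binder_deferred_release accesses the binder_proc object that B already freed.

The exact logic is in binder_node_release, which uses binder_inner_proc_lock and binder_inner_proc_unlock to operate on the freed object ref->proc.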
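To make the two-step translation and the missing cleanup concrete, here is a heavily simplified userspace model. It is an assumption-laden sketch, not the kernel code. The model_* names are hypothetical, and state is tracked with flags instead of real memory. Step 2 is modeled as failing whenever the target is dead; this follows the description above rather than the exact condition checked inside binder_inc_ref_olocked(). The with_fix flag mirrors the patch quoted earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of handle translation (no locking, no memory). */
struct model_proc {
    bool dead;               /* target already went through binder_deferred_release() */
};

struct model_ref {
    struct model_proc *proc; /* owner of the ref (process B) */
    bool on_node_list;       /* still linked on the refs list of C's binder_node */
};

/* Step 2 analogue of binder_inc_ref_olocked(): modeled as failing once B is dead. */
static int model_inc_ref(struct model_ref *ref)
{
    return ref->proc->dead ? -1 : 0;
}

/* Translate a handle for `target` (B). with_fix mirrors commit 19bb609b45fb. */
static int model_translate(struct model_proc *target, struct model_ref *new_ref,
                           bool with_fix)
{
    /* Step 1: create a new ref in the target and link it to the node. */
    new_ref->proc = target;
    new_ref->on_node_list = true;

    int ret = model_inc_ref(new_ref);  /* Step 2 */
    if (ret && with_fix)
        new_ref->on_node_list = false; /* binder_cleanup_ref_olocked() */
    return ret;
}

/* The dangerous state: C's node still links a ref whose owning proc is gone. */
static bool model_dangling(const struct model_ref *ref)
{
    return ref->on_node_list && ref->proc->dead;
}
```

Without the fix, the failed translation leaves a ref on C's node that points at B's dead proc. That is the ref binder_node_release later walks when C exits.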

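Exploitation

Set up three processes A, B and C. A holds a binder_ref to C and shares it with B. B exits midway, which frees its binder_proc. Because the binder_ref has already been created and there is no cleanup logic, it is still reachable through the refs of C's binder_node, which gives a UAF.
When process C exits, it calls binder_deferred_release->binder_node_release, which releases all binder_refs of its binder_node (including the one belonging to the freed B). Along the way it calls binder_inner_proc_lock to take the spinlock on binder_ref->proc, i.e. it writes to binder_proc->inner_lock inside the freed object.

For Binder basics, see:
binder internals
Attacking Android Binder: Analysis and Exploitation of CVE-2023-20938
binder_internals (my simplified version)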

spinlock

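A spinlock is a lock that keeps the CPU busy-waiting (spinning) instead of putting it to sleep. It is generally used when the contention window is short and the lock is held only briefly.
binder_inner_proc_lock calls spin_lock to lock binder_proc->inner_lock: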

#define binder_inner_proc_lock(proc) _binder_inner_proc_lock(proc, __LINE__)
static void
_binder_inner_proc_lock(struct binder_proc *proc, int line)
	__acquires(&proc->inner_lock)
{
	binder_debug(BINDER_DEBUG_SPINLOCKS,
		     "%s: line=%d\n", __func__, line);
	spin_lock(&proc->inner_lock);
}

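Its spinlock_t argument is defined as follows (arch_spinlock_t is the qspinlock type from include/asm-generic/qspinlock_types.h):

typedef struct qspinlock {
	union {
		atomic_t val;

#ifdef __LITTLE_ENDIAN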
		struct {
			u8	locked;
			u8	pending;
		};
		struct {
			u16	locked_pending;
			u16	tail;
		};
#else
		struct {
			u16	tail;
			u16	locked_pending;
		};
		struct {
			u8	reserved[2];
			u8	pending;
			u8	locked;
		};
#endif
	};
} arch_spinlock_t;
typedef struct raw_spinlock {
	arch_spinlock_t raw_lock;
#ifdef CONFIG_DEBUG_SPINLOCK
	unsigned int magic, owner_cpu;
	void *owner;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
	struct lockdep_map dep_map;
#endif
} raw_spinlock_t;
/* Non PREEMPT_RT kernels map spinlock to raw_spinlock */
typedef struct spinlock {
	union {
		struct raw_spinlock rlock;

#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define LOCK_PADSIZE (offsetof(struct raw_spinlock, dep_map))
		struct {
			u8 __padding[LOCK_PADSIZE];
			struct lockdep_map dep_map;
		};
#endif
	};
} spinlock_t;

As shown, the lock consists mainly of locked, pending (together locked_pending) and tail. Because these are members of a union, the whole lock is 4 bytes long. Its memory layout:
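The layout can be checked in userspace with a mirror of the little-endian branch above. This sketch assumes an 8-bit pending field, which is what the kernel uses when NR_CPUS < 16K. The names qspinlock_model_t and lock_* are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the little-endian arch_spinlock_t layout (assumed 8-bit pending). */
typedef union {
    uint32_t val;
    struct {
        uint8_t locked;           /* byte 0 */
        uint8_t pending;          /* byte 1 */
    };
    struct {
        uint16_t locked_pending;  /* bytes 0-1 */
        uint16_t tail;            /* bytes 2-3: encoded CPU + MCS node index */
    };
} qspinlock_model_t;

/* Field extraction from the numeric 32-bit value (independent of host byte order). */
static inline unsigned lock_locked(uint32_t v)  { return v & 0xffu; }
static inline unsigned lock_pending(uint32_t v) { return (v >> 8) & 0xffu; }
static inline unsigned lock_tail(uint32_t v)    { return v >> 16; }
```

For example, a word value of 0x00010041 decodes to locked = 0x41, pending = 0 and tail = 1.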

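Simplified pseudocode of spin_lock and spin_unlock:

The path we abuse in this vulnerability works like this. The u32 lock word starts out nonzero, and the waiter has set pending (lock->tail != 0 || lock->pending != 0). Once the UAF makes lock->locked == 0, the waiter clears pending and sets locked, and the following spin_unlock clears locked. As a result, both lock->pending and lock->locked end up as 0.
Assume the following flow:
1. After process B exits, we reclaim its object (obj1) and make the value at the inner_lock offset nonzero.
2. On CPU 0, we make process C exit. Its release path ends up waiting in spin_lock.
3. From another CPU, we free obj1. The lock word does not change, so C keeps waiting.
4. We allocate obj2 in its place. If obj2 happens to have an LSB of 0 at that offset, C proceeds and also sets the second byte (the pending byte) to 0.

This leaves a pointer in obj2 shifted to the wrong address.
Note that Android enables CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y by default, which zeroes an object's memory when it is allocated. Step 5 therefore needs a race: CPU 0 must be slowed down so that it only modifies the pending byte after obj2 has been allocated and filled in. The approach follows Racing against the clock – hitting a tiny kernel race window.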
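The byte-clearing primitive and the race can be simulated on a plain byte array. This is a single-threaded sketch. It assumes the waiter takes the pending path of the generic qspinlock slow path used on arm64, which requires that only `locked` is set when the waiter arrives. The other CPU's actions are simulated by calling the steps in order. All names are hypothetical, and the pointer values are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Byte-level model of the lock word inside the freed object (little-endian):
 * w[0] = locked, w[1] = pending, w[2..3] = tail. */

/* Waiter arrives while only `locked` is set: it takes the pending path and
 * sets the pending byte (queued_fetch_set_pending_acquire in the kernel). */
static bool waiter_set_pending(uint8_t *w)
{
    if (w[0] == 0 || w[1] || w[2] || w[3])
        return false;           /* not the case this model covers */
    w[1] = 1;
    return true;
}

/* Waiter re-reads the word: once `locked` is 0, clear_pending_set_locked()
 * (one 16-bit store in the kernel, two byte stores here) and then
 * spin_unlock() (locked = 0). Returns false while it keeps spinning. */
static bool waiter_poll(uint8_t *w)
{
    if (w[0] != 0)
        return false;
    w[0] = 1;                   /* locked = 1 */
    w[1] = 0;                   /* pending = 0 */
    w[0] = 0;                   /* spin_unlock() */
    return true;
}

static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* obj1 puts 0x41 in the lock LSB and C starts waiting. obj1 is freed and an
 * fd array (obj2) reclaims the slot: init_on_alloc zeroes it, then a file
 * pointer is stored. race_won: the waiter was stalled past the zeroed window. */
static uint64_t simulate(uint64_t filp, bool race_won)
{
    uint8_t slot[8] = { 0x41 };
    bool done = false;

    waiter_set_pending(slot);    /* C enters the pending path */
    waiter_poll(slot);           /* locked == 0x41: still spinning */

    memset(slot, 0, sizeof(slot));  /* obj2 allocated and zeroed */
    if (!race_won)
        done = waiter_poll(slot);   /* waiter runs during the zeroed window */
    store_le64(slot, filp);         /* fd array entry written */
    if (!done)
        waiter_poll(slot);          /* proceeds only if filp's LSB is 0 */
    return load_le64(slot);
}
```

Calling simulate(0xffffff8012345600, true) returns 0xffffff8012340000, because clearing the pending byte shifts the pointer down by 0x5600. With race_won = false, the waiter finishes while the slot is still zeroed. The later pointer store then overwrites its writes, which is why the timing has to be stretched.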

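Next, find suitable objects (kmalloc-1k):

obj1: the LSB at the inner_lock offset is != 0 (tty_struct).
obj2: the LSB at the inner_lock offset is == 0, and the value there is a kernel pointer (fd_table). fd_table stores an array of pointers to struct file; its memory layout is shown below. With the primitive, for an entry whose LSB == 0, we can also clear its pending byte (the second byte). The memory layout has to be arranged as follows.

If the corruption succeeds, the filp pointer now points at the address of a sprayed object (tty_struct). The target object (tty_struct) can then be freed with close(vuln_fd), which gives a tty UAF. pipe_buffer is then used to obtain arbitrary kernel read/write.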

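Reproduction

The ANDROID_NDK_HOME environment variable is required; download the NDK in Android Studio.
Download the badspin source. It builds for aarch64 by default, while our emulator only offers x86_64, so the build configuration has to be modified.

Note that libsepol in the git repo is prebuilt, so it is best to run make clean first.
You can set the VERBOSE=1 environment variable to get more debug output.

There is a preliminary check. Because we patched the kernel ourselves, its version number is necessarily that of a fixed kernel, so this part of the code has to be commented out:

Then add our device's information to ./src/dev_config.h:

It seems to get stuck at the step that triggers the UAF. However, the emulator's log output shows that a kernel BUG was triggered and the kernel hung, so it apparently has some effect.
Next comes debugging the vulnerability and analyzing it against the vulnerability mechanism.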

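Ugh. The privilege escalation stage of this exploit targets arm64, but the Android Studio emulator no longer supports emulating an AVD whose architecture differs from the host's.
Following this link you can switch to an older emulator. However, Android 12 corresponds to API 31, and cross-architecture emulation requires an API level below 28, so this route is basically dead. To go further, it looks like I need either a real device or some way of doing it on x86.

~~Continuing to try reproducing on x86_64...~~ spin_lock is implemented differently on different architectures, so it really has to be done on aarch64...
~~In theory, apart from bypassing addr_limit and UAO during the later privilege escalation, which may differ, everything else should be fine.~~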

Exploit Walkthrough

My understanding of kol's exploit code:

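1. Obtain the inner_lock offset.

The exploit initializes various variables and lock variables.
It then creates three child processes a, b and c through do_client; each calls its own do_client{a,b,c} function.
Process a initializes a futex-based lock variable lc_watch (light_cond_init_shared), which is used to monitor the death notification from process c. It then registers for process b's death notification and calls binder_enter_looper to enter the looper and start accepting binder requests from processes b and c.
Process a waits for a strong handle to process c's binder_node.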
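Process c enters the looper to accept requests. It also creates a sending thread, which sends a transaction to a containing a binder_node with cookie=0xcccccccc and then waits on the lock variable lc_c_thread_exit_post.
Process a receives c's transaction and issues binder_free_transaction_buffer to free it. It then broadcasts lc_c_thread_exit_post and waits for lc_c_thread_exit_pend.
The thread created by c exits and broadcasts lc_c_thread_exit_pend.
Process a registers for process c's death notification (according to GPT, registering also increments the node's reference count and so affects when the node is released).
Process a creates a monitor thread pinned to CPU 3, which waits for lc_watch.
Process a sends process b the ref to c's binder_node obtained earlier. Note that the transaction carrying the ref uses the fd_list memory that the exploit relies on (presumably an int array of file descriptors). lc_watch is broadcast right before sending.
After receiving lc_watch, a's monitor thread starts waiting for b's death notification (0x5858585858585858).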
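Process b allocates a buffer the size of the ptmx objects to be sprayed (16 of them) and writes 0x41 into the LSB of every 4-byte word. Because it runs in detect_mode, it creates a ptmx (echo + blocking mode) and a blocker_thread. The blocker_thread is pinned to CPU 0 and set up to block on write, and then waits for lc_spray_tty_post.
Process b continues: it calls enter_looper to accept transactions from other processes and pins itself to CPU 3. It then spins in a while loop until binder.vmstart becomes AB_MAGIC; AB_MAGIC is written into the transaction data after process a sends weak_refs. In other words, b starts the destroy operation as soon as it sees that a has started sending, which makes the vulnerability much more likely to trigger.
Process b calls binder_client_destroy to unmap its binder memory, and the kernel then releases its binder_proc. b then pins itself to CPU 0 and waits for lc_spray_tty_post.
a's monitor thread receives b's death notification.
After sending weak_refs, process a checks whether the transaction returned BR_FAILED_REPLY, which is presumably the result of b's death.
Process a releases strong_handle, broadcasts lc_wakeup_c and waits for c's death notification.
Process c resumes, broadcasts lc_spray_tty_post and waits for lc_spray_tty_pend.
Process b resumes. Since it is in detect mode, blocker_thread runs first: it writes the previously prepared 0x41 data to /dev/ptmx, broadcasts lc_spray_tty_pend and sleeps for 0.5 s, waiting for c to hit the UAF.
Process c continues, pins itself to CPU 4 and tries to trigger the UAF by running the destroy operation.
Process b checks whether process c has died and returns -3 if so. Otherwise it reads from ptmx to see whether any offset now shows data with LSB != 0.
Process a detects c's death notification and marks c as dead in ctx->sync_var_c_died. By this point C has already left the spinlock.
Processes a, b and c exit.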
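The steps above are coordinated through futex-based shared conditions (light_cond_*) and explicit CPU pinning. Below is a minimal sketch of those two building blocks. It assumes one-shot broadcast semantics, with a word in MAP_SHARED memory so that forked processes see it. The names light_event and pin_to_cpu are hypothetical; this is not the exploit's code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical one-shot event: state 0 = not signalled, 1 = signalled. */
typedef struct { _Atomic uint32_t state; } light_event;

static long futex_op(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Shared anonymous mapping, so the word survives fork() and is shared. */
static light_event *light_event_create_shared(void)
{
    void *p = mmap(NULL, sizeof(light_event), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    light_event *ev = p;
    atomic_init(&ev->state, 0);
    return ev;
}

/* Non-PRIVATE FUTEX_WAIT so waiters in other processes work; it only sleeps
 * while the word still reads 0, and the loop absorbs spurious wakeups. */
static void light_event_wait(light_event *ev)
{
    while (atomic_load(&ev->state) == 0)
        futex_op(&ev->state, FUTEX_WAIT, 0);
}

static void light_event_broadcast(light_event *ev)
{
    atomic_store(&ev->state, 1);
    futex_op(&ev->state, FUTEX_WAKE, INT_MAX);  /* wake every waiter */
}

/* Pin the calling thread to one CPU, as in "bind to CPU 3/0/4". */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

In the exploit's flow, one process would call light_event_wait() while another calls light_event_broadcast() after fork(). Because the event is one-shot, a late waiter does not block once the event has been signalled.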

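2. Privilege escalation

First it takes the offset obtained in the steps above; on the test kernel the inner_lock offset is 576 (it must be >= 520). It also checks whether the number of CPU cores exceeds 8.
It creates a graveyard process that receives fds sent by other processes. This keeps their reference counts alive and prevents them from being released early, in preparation for the later stages.
It creates pipe processes, which wait to spray pipes later.
It calls do_exploit to start the actual privilege escalation.
It creates fd_master_process, which then waits for lc_finish_shaping.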
fd_master creates a dup process spawner child, which waits for the commands SPAWNER_CMD_NEW_DUP_PROC and SPAWNER_CMD_NEW_SHAPER_PROC.
fd_master then does three allocations. It opens NR_PTMXS=4000 /dev/ptmx files, allocates a file descriptor list of 128 * NR_OBJECTS_FILP * sizeof(int) bytes, and allocates NR_PIPE_PROCS * NR_PIPES * 2*sizeof(int) bytes for the pipes.
fd_master sends SPAWNER_CMD_NEW_DUP_PROC to the spawner child. On receipt, the spawner creates a dup process pinned to CPU 0, which dups a large number of fds.
fd_master sends the pipe process the sockfd used later to communicate with each pipe_proc.
fd_master shapes memory. For NR_SHAPERS=5 it sends SPAWNER_CMD_NEW_SHAPER_PROC to the spawner, which creates a shaper_process child pinned to CPU 4. Each such process opens NR_FDS=30000 copies of /dev/hwbinder or of libc.so, depending on whether idx == NR_SHAPERS-1 (i.e. whether it is the last shaper proc). It then increments ctx->sync_var_shapers and waits for lc_shaper_done.
fd_master sends a temporary fd to the dup process so it can proceed, and broadcasts lc_finish_shaping when done.
fd_master waits for all dup processes to finish; each dup process in turn waits for lc_start_dup.
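The main do_exploit function continues. It spawns timer_master_process and itself proceeds into the vuln function, which is similar to the flow in part 1.
timer_master_process pins itself to CPU 0, creates a timer_thread and waits for c to enter the spinlock. This is the lc_spray_tty_pend signal from part 1, except that here the signal is controlled by timer_thread.
timer_thread pins itself to CPU 4 and creates a timer object with timerfd_create. It also creates an epoll instance with epoll_create1 that monitors the timer, and then waits for lc_timer_proc.
At the end of the vuln function, process b writes data to ptmx, then frees the ptmx write buf and zeroes it entirely. It broadcasts lc_timer_proc and returns to do_exploit.
do_exploit waits for fd_master to finish and return the exploit result, together with the corrupted tty_struct and pipe_buffer.
timer_thread arms the timer and broadcasts lc_spray_tty_pend, which triggers process c's exit. timer_master continues as well: it first calls usleep(USEC_SKEW_WAIT_FOR_USE), then broadcasts lc_start_dup and waits for the timer_threads to finish. This is the tiny race. It slows c down so that the dup process finishes allocating and initializing before c modifies the pending byte.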
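The dup process now dups fds to allocate fd_list objects. When done, it broadcasts lc_dup_process_close_post and checks whether any of its fds has been successfully corrupted. On success it returns dup_cmd_prepare_tty. It then accepts dup_cmd_done from fd_master to close the corrupted fd, and returns dup_cmd_done once all fds are handled.
fd_master continues and receives the dup process's response. dup_cmd_prepare_tty means success: fd_master sends dup_cmd_done to the dup process to trigger the release of the tty_struct, then starts spraying pipe_buffer.
On receipt, the pipe process calls the pipe syscall to create pipe_buffers and returns the pipe fds to fd_master.
After receiving the pipe fds, fd_master cleans up the various objects allocated earlier.
It then iterates over the ptmx and pipe fds to find the UAF tty_struct and pipe_buffer objects and closes and frees the rest. On success it returns the ptmx and pipe fds to the main function.
The do_exploit function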
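timer_thread uses the timerfd/epoll trick from "Racing against the clock" to stretch the timer interrupt. Below is a minimal sketch of that idea. It assumes that many epoll registrations on one timerfd make the timer's expiry wakeup walk a correspondingly long wait queue. The helper timer_stall_once and its limits are hypothetical and not taken from badspin. Unlike the exploit, it simply blocks until the timer fires and then cleans up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

#define MAX_EPOLLS 64
#define MAX_DUPS   64

/* Add nr_dups duplicates of one timerfd to each of nr_epolls epoll instances,
 * arm a one-shot timer for delay_ns, wait for it to fire, then tear down.
 * Returns the number of epoll registrations, or -1 on error. */
static int timer_stall_once(int nr_epolls, int nr_dups, long delay_ns)
{
    int epfds[MAX_EPOLLS], dups[MAX_DUPS];
    int ne = 0, nd = 0, registered = 0;
    uint64_t expirations;

    if (nr_epolls < 0 || nr_epolls > MAX_EPOLLS || nr_dups < 0 ||
        nr_dups > MAX_DUPS || delay_ns <= 0 || delay_ns >= 1000000000L)
        return -1;

    int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (tfd < 0)
        return -1;

    /* dup()s of the same open file are distinct epoll items, so all of
     * them can be added to the same epoll instance. */
    for (; nd < nr_dups; nd++)
        if ((dups[nd] = dup(tfd)) < 0)
            break;
    for (; ne < nr_epolls; ne++) {
        if ((epfds[ne] = epoll_create1(0)) < 0)
            break;
        for (int d = 0; d < nd; d++) {
            struct epoll_event ev = { .events = EPOLLIN };
            if (epoll_ctl(epfds[ne], EPOLL_CTL_ADD, dups[d], &ev) == 0)
                registered++;
        }
    }

    /* One-shot expiry (it_interval left at 0). On expiry the kernel wakes
     * every wait-queue entry hung off the timerfd, one per registration. */
    struct itimerspec its = { .it_value = { .tv_sec = 0, .tv_nsec = delay_ns } };
    if (timerfd_settime(tfd, 0, &its, NULL) != 0 ||
        read(tfd, &expirations, sizeof(expirations)) != sizeof(expirations))
        registered = -1;

    for (int e = 0; e < ne; e++)
        close(epfds[e]);
    for (int d = 0; d < nd; d++)
        close(dups[d]);
    close(tfd);
    return registered;
}
```

The number of registrations is the knob that controls how long the expiry takes. The exploit instead keeps the fds open and times the expiry against c's spin_lock wait.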

Debugging

The original repo targets the Pixel 6 (Android 12/13) and the Samsung Galaxy S22/S21 Ultra, so running it directly does not work. Supported models:

$ make list
0: Samsung Galaxy S22, Android 12 (6/2022), kernel 5.10.81
1: Samsung Galaxy S21 Ultra, Android 12 (3/2022), kernel 5.4.129
2: Google Pixel 6, Android 12 (5/2022), kernel 5.10.66
3: Google Pixel 6, Android 13 (9/2022), kernel 5.10.107

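leak_inner_lock_offset returns -2

Set a breakpoint directly where c does the release, at binder_node_release. It is best to also disable KASLR at boot to make debugging easier.

echo 0 > /proc/sys/kernel/kptr_restrict
cat /proc/kallsyms | grep binder_node_release

This indicates that /dev/ptmx did not hit.
Debugging shows that a tty_struct did reclaim the freed binder_proc, and that process c is stuck in the spin_lock function.
The IDA view agrees with the dynamic debugging: it only checks the lock byte, with no pending logic at all.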

Looking at the Android kernel source, the problem is mainly in the virt_spin_lock function:

void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
{
	struct mcs_spinlock *prev, *next, *node;
	u32 old, tail;
	int idx;

	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));

	if (pv_enabled())
		goto pv_queue;

	if (virt_spin_lock(lock))
		return;

	/*
	 * Wait for in-progress pending->locked hand-overs with a bounded
	 * number of spins so that we guarantee forward progress.
	 *
	 * 0,1,0 -> 0,0,1
	 */
	if (val == _Q_PENDING_VAL) {
		int cnt = _Q_PENDING_LOOPS;
		val = atomic_cond_read_relaxed(&lock->val,
					       (VAL != _Q_PENDING_VAL) || !cnt--);
	}

	/*
	 * If we observe any contention; queue.
	 */
	if (val & ~_Q_LOCKED_MASK)
		goto queue;

	...

virt_spin_lock is defined as follows. Its comments show that on hypervisors, spin_lock falls back to a TaS (test-and-set) lock. The corresponding config option is CONFIG_PARAVIRT, which is y by default.

#ifdef CONFIG_PARAVIRT
/*
 * virt_spin_lock_key - disables by default the virt_spin_lock() hijack.
 *
 * Native (and PV wanting native due to vCPU pinning) should keep this key
 * disabled. Native does not touch the key.
 *
 * When in a guest then native_pv_lock_init() enables the key first and
 * KVM/XEN might conditionally disable it later in the boot process again.
 */
DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
/*
 * Shortcut for the queued_spin_lock_slowpath() function that allows
 * virt to hijack it.
 *
 * Returns:
 *   true - lock has been negotiated, all done;
 *   false - queued_spin_lock_slowpath() will do its thing.
 */
#define virt_spin_lock virt_spin_lock
static inline bool virt_spin_lock(struct qspinlock *lock)
{
	int val;

	if (!static_branch_likely(&virt_spin_lock_key))
		return false;

	/*
	 * On hypervisors without PARAVIRT_SPINLOCKS support we fall
	 * back to a Test-and-Set spinlock, because fair locks have
	 * horrible lock 'holder' preemption issues.
	 */

 __retry:
	val = atomic_read(&lock->val);

	if (val || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
		cpu_relax();
		goto __retry;
	}

	return true;
}

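This setting only exists on x86. On x86_64 the spin_lock slow path therefore just spins until the lock word reads 0 (in the code above, the whole 32-bit val) and then claims it with a cmpxchg. It never goes through the pending-byte handover, so the reproduction really has to be done on arm64. Changing CONFIG_PARAVIRT might also make it reproducible, but I did not try that.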
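The difference can be reproduced with a small userspace model of the virt_spin_lock() loop above, bounded so that it terminates. It is a sketch: the tas_* names are hypothetical. spin_unlock() is modeled as clearing the low byte of the word, which matches the kernel's byte store on little-endian. The model shows why C hangs on x86_64. The sprayed word 0x41 never reaches 0, and a reclaimed value whose LSB is 0 but whose upper bytes are nonzero does not reach 0 either, so byte 1 is never cleared.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define Q_LOCKED_VAL 1u

/* One iteration of the TaS loop: only succeeds if the whole word is 0. */
static bool tas_try_once(_Atomic uint32_t *val)
{
    uint32_t expected = atomic_load(val);
    if (expected != 0)
        return false;   /* kernel: cpu_relax(); goto __retry; */
    return atomic_compare_exchange_strong(val, &expected, Q_LOCKED_VAL);
}

/* Bounded version of the retry loop; returns true if the lock was taken. */
static bool tas_lock_bounded(_Atomic uint32_t *val, int spins)
{
    for (int i = 0; i < spins; i++)
        if (tas_try_once(val))
            return true;
    return false;
}

/* spin_unlock() stores 0 to the locked byte: the low 8 bits of the word. */
static void tas_unlock(_Atomic uint32_t *val)
{
    atomic_fetch_and(val, ~0xffu);
}
```

On a word whose LSB was cleared but whose upper bytes are still set, tas_lock_bounded() keeps failing and leaves the word untouched. This matches the stuck process c observed in the debugger.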
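I first built an arm64 kernel and looked at it in IDA; the logic is clearly different:

Continuing the attempt on a Mac.
The launcher command is the same as above, except that the -data argument has to be removed, for reasons I haven't figured out.
Then modify dev_config.h as above, and it runs.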

VFS: file-max limit 170065 reached

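When execution reaches SYSCHK(epoll_create1(0)), the kernel reports the file-max limit.
This is a memory shortage, and later crashes around epoll_add may be the same problem. Setting the QEMU memory to 8 GB fixes it.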

The exploit finally succeeded. In the screenshot below, the bottom right shows a root shell on the target: