"
Author: tjytimi [email protected]
Date: 2022/05/22
Revisor: Falcon [email protected]
Project: Anatomy of the RISC-V Linux Kernel"
Code analysis of process creation
Foreword
This article analyzes the code path of process creation and, through it, examines how the Linux kernel supports processes on RISC-V.
The analysis follows the order of code execution and describes only the implementation of key functions. The kernel version is Linux 5.17.
kernel_clone
In Linux, a new process is created through the fork() system call. After a user process enters kernel mode through fork(), it reaches the system call definition. The code is as follows:
// kernel/fork.c : 2620
SYSCALL_DEFINE0(fork)
{
    struct kernel_clone_args args = {
        .exit_signal = SIGCHLD,
    };
    return kernel_clone(&args);
}
This function simply sets up the args parameter and calls kernel_clone().
In fact, process creation is served by a family of system calls, including clone(), vfork() and others. All of them are ultimately implemented by calling kernel_clone(); they differ only in how the args parameter is set.
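The difference between these entry points can be sketched in user-space C. The struct and function names below are hypothetical stand-ins for struct kernel_clone_args and the system call bodies, not kernel code; the two flag values are the ones defined in include/uapi/linux/sched.h, and the vfork() setting reflects kernel/fork.c as far as I know.

```c
#include <assert.h>
#include <signal.h>   /* SIGCHLD */
#include <stdint.h>

/* Flag values as defined in include/uapi/linux/sched.h. */
#define DEMO_CLONE_VM    0x00000100ULL  /* share the address space */
#define DEMO_CLONE_VFORK 0x00004000ULL  /* parent waits until the child releases the mm */

/* Hypothetical, trimmed-down stand-in for struct kernel_clone_args. */
struct demo_clone_args {
    uint64_t flags;
    int exit_signal;
};

/* fork(): no sharing flags; the parent is notified with SIGCHLD. */
struct demo_clone_args demo_fork_args(void)
{
    struct demo_clone_args args = { .exit_signal = SIGCHLD };
    return args;
}

/* vfork(): same call path, only the flags differ. */
struct demo_clone_args demo_vfork_args(void)
{
    struct demo_clone_args args = {
        .flags = DEMO_CLONE_VFORK | DEMO_CLONE_VM,
        .exit_signal = SIGCHLD,
    };
    return args;
}
```

In both cases the resulting structure would be handed to the same creation routine; only the contents of the structure change.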
Threads created through the Pthread library commonly used in user programs are also ultimately created by kernel_clone().
Readers familiar with earlier kernel versions will notice that kernel_clone() is the function that used to be called _do_fork(). The simplified code is as follows:
// kernel/fork.c : 2524
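pid_t kernel_clone(struct kernel_clone_args *args)
{
    u64 clone_flags = args->flags;
    struct pid *pid;
    struct task_struct *p;
    int trace = 0;
    pid_t nr;

    /* Create the child's task_struct; copy or share the parent's resources */
    p = copy_process(NULL, trace, NUMA_NO_NODE, args);
    /* Look up the child's PID as seen from the current namespace */
    pid = get_task_pid(p, PIDTYPE_PID);
    nr = pid_vnr(pid);
    /* Put the child on a run queue */
    wake_up_new_task(p);
    return nr;
}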
kernel_clone() performs the following steps in sequence:
Copy the process descriptor
kernel_clone() calls copy_process() to create the child process and its process descriptor (task_struct).
Depending on the flags, copy_process() either copies the parent's data or shares it with the child. This is the foundation that allows the new process to exist and run.
The function is described in detail in the copy_process() section.
Get the PID of the new process
In the parent process, the fork() system call must return the PID of the child as seen in the parent's current namespace. User space uses this value for management tasks such as the parent reaping the child. So after copy_process() completes, kernel_clone() obtains the process ID in the current namespace from the child's PID structure.
Wake up the child process
kernel_clone() calls wake_up_new_task() to insert the scheduling entity in the process descriptor into the run queue, which completes the wake-up. Note that waking up the child does not mean that it starts executing at once: the scheduler picks it to run at an appropriate time according to the scheduling policy.
This is described in detail in the wake_up_new_task section.
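Seen from user space, the return-value contract described above looks roughly like the following sketch. The function name is hypothetical. The child exits with status 7 only on the branch where fork() returned 0, so a status of 7 observed by the parent shows that each side saw the return value it was supposed to see.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the child's exit status as seen by the parent, or -1 on error. */
int demo_fork_exit_status(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork() failed */

    if (pid == 0) {
        /* Child: fork() returned 0. */
        _exit(7);
    }

    /* Parent: fork() returned the child's PID, used here to reap it. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent uses the returned PID only for waitpid(), which is the kind of management work the PID return value exists for.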
copy_process()
As the name suggests, copy_process() is responsible for duplicating the parent process's data structures. The function is defined in kernel/fork.c. Its main flow is as follows; each step is analyzed in turn.
copy_process()
- dup_task_struct()
- sched_fork()
- copy_xxx(copy_mm,copy_fs,copy_files,copy_thread)
- initialize the process PID entities and process relationships
dup_task_struct()
dup_task_struct() makes a preliminary copy of the parent's process descriptor to produce the child's process descriptor. The function proceeds as follows:
- First, a process descriptor (struct task_struct *tsk) and a kernel stack (unsigned long *stack) are allocated through the SLAB system.
- The parent's process descriptor is copied to the newly created child descriptor by arch_dup_task_struct(). arch_dup_task_struct() is a processor-specific function; its code is shown below. fstate_save() saves the RISC-V floating-point unit registers into the fstate element of the thread field (struct thread_struct) of the parent's process descriptor. The definition of struct thread_struct is processor-specific; it holds processor-specific context used during process switching. See the copy_thread section for its definition. *dst = *src copies the parent's process descriptor directly to the child; later steps modify the elements that need to change.
// arch/riscv/kernel/process.c : 115
int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
{
    fstate_save(src, task_pt_regs(src));
    *dst = *src;
    return 0;
}
- Point the kernel stack pointer in the child's process descriptor to the kernel stack allocated in step 1.
- Call setup_thread_stack(), which is used to initialize the thread_info data. On RISC-V, thread_info does not share space with the kernel stack but is defined explicitly in the task_struct structure, so step 2 has already copied thread_info, and setup_thread_stack() does nothing and returns directly. On architectures that keep thread_info in the kernel stack space, such as MIPS and older X86 kernels, this function copies thread_info for the child process.
The processor accesses the thread_info of the current process frequently, so the kernel wants fast access through a register. The RISC-V architecture has a dedicated tp (thread pointer) register, which stores the address of thread_info. When CONFIG_THREAD_INFO_IN_TASK is enabled, thread_info is the first member of task_struct, so the tp register effectively points to task_struct as well.
struct task_struct {
#ifdef CONFIG_THREAD_INFO_IN_TASK
/*
* For reasons of header soup (see current_thread_info()), this
* must be the first element of task_struct.
*/
struct thread_info thread_info;
#endif
...
Similarly, the MIPS architecture uses the gp register to store the address of thread_info. X86 kernels that keep thread_info in the kernel stack do not dedicate a register to it, but thread_info can still be located quickly through the stack pointer. This shows that learning the kernel through the RISC-V architecture involves far fewer processor-specific subtleties than X86.
- Initialize some elements of the child's structure. For example, clear_tsk_need_resched() clears the need-reschedule flag in thread_info: the new process does not need to be rescheduled, but it may have inherited this flag when the descriptor was copied.
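The steps above can be sketched in user-space C as follows. Every name here (demo_task, demo_dup_task, DEMO_NEED_RESCHED and so on) is hypothetical, malloc() stands in for the kernel allocators, and the structures are heavily trimmed. The sketch also shows why placing thread_info first matters: a pointer to it, such as the one tp would hold, is also a pointer to the enclosing task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_STACK_SIZE   4096   /* hypothetical kernel stack size */
#define DEMO_NEED_RESCHED 0x1UL  /* hypothetical flag bit */

struct demo_thread_info { unsigned long flags; };

struct demo_task {
    struct demo_thread_info thread_info; /* first member, as with CONFIG_THREAD_INFO_IN_TASK */
    void *stack;
    int pid;
};

/* With thread_info at offset 0, its address is also the task's address. */
struct demo_task *demo_task_from_tp(struct demo_thread_info *tp)
{
    return (struct demo_task *)tp;
}

struct demo_task *demo_dup_task(const struct demo_task *src)
{
    struct demo_task *dst = malloc(sizeof(*dst));  /* step 1: descriptor */
    void *stack = malloc(DEMO_STACK_SIZE);         /* step 1: stack */

    if (!dst || !stack) {
        free(dst);
        free(stack);
        return NULL;
    }
    *dst = *src;                                   /* step 2: whole-struct copy */
    dst->stack = stack;                            /* step 3: use the new stack */
    dst->thread_info.flags &= ~DEMO_NEED_RESCHED;  /* step 5: drop inherited flag */
    return dst;
}

void demo_free_task(struct demo_task *t)
{
    if (t) {
        free(t->stack);
        free(t);
    }
}
```

Step 4 is absent on purpose: since thread_info is embedded in the descriptor, the struct assignment in step 2 has already copied it.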
sched_fork
sched_fork() is responsible for initializing scheduling-related fields, such as the priority of the process, its virtual runtime, and its scheduling class. This part of the code is straightforward and is not analyzed in detail here.
copy_xxx
This step covers a series of copy_xxx functions. As their names suggest, each one copies or shares the data of a specific structure in the process descriptor. Whether the data is copied or shared depends on the flags passed to copy_process().
The analysis here focuses on copy_mm() and copy_thread().
copy_mm
copy_mm() copies the memory descriptor of the process. The main flow of the function is as follows. For memory management details, please refer to the other analysis articles in this project.
copy_mm()
- dup_mm
- mm_init
- mm_alloc_pgd
- pgd_alloc
- init_new_context
- dup_mmap
copy_mm() first calls dup_mm() to copy the memory descriptor; the main work of dup_mm() is done by mm_init() and init_new_context().
mm_init() initializes the process memory descriptor and finally calls the processor-specific pgd_alloc() function to allocate the page global directory (pgd) and copy all kernel page table entries from process 0 into this process.
// arch/riscv/include/asm/pgalloc.h : 80
static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
    pgd_t *pgd;

    pgd = (pgd_t *)__get_free_page(GFP_KERNEL);
    if (likely(pgd != NULL)) {
        memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
        /* Copy kernel mappings */
        memcpy(pgd + USER_PTRS_PER_PGD,
            init_mm.pgd + USER_PTRS_PER_PGD,
            (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
    }
    return pgd;
}
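The split between the zeroed user half and the copied kernel half can be tried out in user space with a plain array. This is only an illustration: demo_pgd_t, the two table sizes, and the use of malloc() instead of a page allocator are assumptions made for the sketch, not the values used on any particular RISC-V configuration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long demo_pgd_t;

#define DEMO_PTRS_PER_PGD      512  /* hypothetical total number of entries */
#define DEMO_USER_PTRS_PER_PGD 256  /* hypothetical number of user entries */

/* Build a new top-level table: empty user half, kernel half shared with init. */
demo_pgd_t *demo_pgd_alloc(const demo_pgd_t *init_pgd)
{
    demo_pgd_t *pgd = malloc(DEMO_PTRS_PER_PGD * sizeof(demo_pgd_t));

    if (pgd != NULL) {
        /* User mappings start out empty. */
        memset(pgd, 0, DEMO_USER_PTRS_PER_PGD * sizeof(demo_pgd_t));
        /* Kernel mappings are identical in every process. */
        memcpy(pgd + DEMO_USER_PTRS_PER_PGD,
               init_pgd + DEMO_USER_PTRS_PER_PGD,
               (DEMO_PTRS_PER_PGD - DEMO_USER_PTRS_PER_PGD) * sizeof(demo_pgd_t));
    }
    return pgd;
}
```

Because only top-level entries are copied, the lower-level kernel page tables they point to are shared by all processes rather than duplicated.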
init_new_context() sets the context ID in the memory descriptor to 0, which marks the ASID as invalid. As a result, when the process is first switched to, an ASID version number and hardware ASID are allocated for the memory descriptor.
// arch/riscv/include/asm/mmu_context.h : 26
#define init_new_context init_new_context
static inline int init_new_context(struct task_struct *tsk,
            struct mm_struct *mm)
{
#ifdef CONFIG_MMU
    atomic_long_set(&mm->context.id, 0);
#endif
    return 0;
}
During process scheduling, switch_mm() is called to switch to the page global directory (pgd) of the newly created process. This function uses the csr_write macro to write the directory address and ASID information to the CSR_SATP register. Please refer to the switch_mm() section of the context_switch() analysis in this project.
copy_mm() then calls dup_mmap() to copy the parent's process address space to the child. The page tables are copied level by level. When the last-level page table entries are copied, pte_wrprotect is used to write-protect the pages. This is the basis of copy-on-write (COW).
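The effect of copy-on-write is visible from user space, even though the page-table mechanism itself is not. In the hedged sketch below (the function name is hypothetical), parent and child start out sharing the page that holds value; the write in the child lands in a private copy, so the parent still reads the old value afterwards. It demonstrates the observable semantics only and does not show how the kernel handles the write fault.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the parent's view of `value` after the child has modified its
 * own copy, or -1 on error.
 */
int demo_parent_value_after_child_write(void)
{
    volatile int value = 1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this write goes to the child's private copy of the page. */
        value = 2;
        _exit(value == 2 ? 0 : 1);
    }

    /* Parent: wait for the child, then read the variable again. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return value;
}
```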
copy_thread
copy_thread is a key architecture-specific step in process creation. It creates and initializes the thread context descriptor thread, which stores CPU-specific state; the corresponding structure definition is processor-specific. On RISC-V, thread_struct is defined as follows:
// arch/riscv/include/asm : 31
struct thread_struct {
    /* Callee-saved registers */
    unsigned long ra;
    unsigned long sp;       /* Kernel mode stack */
    unsigned long s[12];    /* s[0]: frame pointer */
    struct __riscv_d_ext_state fstate;
    unsigned long bad_cause;
};
This structure stores the values of the callee-saved registers. According to the register description in the RISC-V manual:
- ra is the return address register; it holds the address at which execution resumes after the ret instruction.
- sp is the kernel stack pointer register.
- s0 - s11 are the saved registers.
- fstate holds the registers associated with floating-point operations (fstate is described in the dup_task_struct() section).
The copy_thread function is as follows:
// arch/riscv/kernel/process.c : 122
int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
        struct task_struct *p, unsigned long tls)
{
    struct pt_regs *childregs = task_pt_regs(p);

    /* p->thread holds context to be restored by __switch_to() */
    if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
        /* Kernel thread */
        memset(childregs, 0, sizeof(struct pt_regs));
        childregs->gp = gp_in_global;
        /* Supervisor/Machine, irqs on: */
        childregs->status = SR_PP | SR_PIE;
        p->thread.ra = (unsigned long)ret_from_kernel_thread;
        p->thread.s[0] = usp; /* fn */
        p->thread.s[1] = arg;
    } else {
        *childregs = *(current_pt_regs());
        if (usp) /* User fork */
            childregs->sp = usp;
        if (clone_flags & CLONE_SETTLS)
            childregs->tp = tls;
        childregs->a0 = 0; /* Return value of fork() */
        p->thread.ra = (unsigned long)ret_from_fork;
    }
    p->thread.sp = (unsigned long)childregs; /* kernel sp */
    return 0;
}
copy_thread first initializes the kernel stack: task_pt_regs(p) casts the top of the kernel stack space to a register context structure, pt_regs, which saves register values on exceptions and system calls. On RISC-V, this structure is defined as follows:
struct pt_regs {
    unsigned long epc;
    unsigned long ra;
    unsigned long sp;
    unsigned long gp;
    unsigned long tp;
    ...
    /* Supervisor/Machine CSRs */
    unsigned long status;
    unsigned long badaddr;
    unsigned long cause;
    /* a0 value before the syscall */
    unsigned long orig_a0;
};
Then one of two branches is taken, depending on whether the new process is a kernel thread or a user process, as described below.
- Kernel thread
If the new process is a kernel thread, ra in the thread context (thread) is set to the address of ret_from_kernel_thread, s[0] is set to the address of fn (the function passed in when the kernel thread was created), and s[1] is set to arg (the argument to be passed to fn).
As a result, when the child starts to run, it executes ret_from_kernel_thread first. This is RISC-V assembly code that calls the function held in register s0, passing register s1 as its argument.
- User process
- Call current_pt_regs() to get the register context in the kernel stack of the current (parent) process and copy it to the register context in the child's kernel stack, childregs.
- If usp is non-zero, assign usp to the stack register sp of the register context in the child's kernel stack (that is, the user stack after returning to user mode).
- If CLONE_SETTLS is set in clone_flags, assign tls to tp. See the CLONE_SETTLS description of the clone system call for details.
- On RISC-V, a0 is the return value register after a system call returns to user mode, so childregs->a0 is set to 0. When the child returns from the fork() system call to user mode, the value of a0 is restored from the kernel stack to the a0 register, so the child's return value is 0.
- Set thread.ra in the child's thread context to the address of the ret_from_fork assembly routine, and set thread.sp to the address of childregs prepared above. After the user process completes the above steps, the process descriptor is as shown in the following figure.
When the scheduler switches to the child process, the kernel restores the callee-saved registers from the thread context: the kernel stack pointer becomes thread.sp as set above, and the ra register becomes thread.ra (the address of ret_from_fork), so the first thing the child executes after being scheduled is ret_from_fork.
After ret_from_fork completes, the register context stored in the kernel stack is restored to the RISC-V registers. This includes writing usp to the stack register, which completes the switch from the kernel stack to the user stack. Finally, mret or sret is executed to return to user mode. The section "Execution of child process after process creation" describes this in detail.
The discussion above involves the kernel stack, the user stack, the register context in the kernel stack, and the thread context in the process descriptor. These concepts are easily confused and should be distinguished carefully.
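To keep the two contexts apart, the following user-space sketch models both branches of copy_thread with trimmed structures. All names are hypothetical: demo_pt_regs stands in for the register context on the kernel stack, demo_thread for the thread context in the descriptor, and the two empty functions exist only so that their addresses can play the role of ret_from_fork and ret_from_kernel_thread. Storing function pointers in integer fields relies on the usual behaviour of POSIX platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct demo_pt_regs { unsigned long sp, tp, a0; };  /* trimmed register context */
struct demo_thread  { uintptr_t ra, sp, s[12]; };   /* trimmed thread context */

struct demo_task {
    struct demo_thread thread;
    struct demo_pt_regs regs;  /* stands in for pt_regs on the kernel stack */
};

typedef int (*demo_fn_t)(void *);

/* Placeholder entry points; only their addresses matter in this sketch. */
void demo_ret_from_fork(void) {}
void demo_ret_from_kernel_thread(void) {}

/* User branch: inherit the parent's user registers, then patch sp and a0. */
void demo_copy_user_thread(struct demo_task *child,
                           const struct demo_pt_regs *parent_regs,
                           unsigned long usp)
{
    child->regs = *parent_regs;
    if (usp)
        child->regs.sp = usp;
    child->regs.a0 = 0;                             /* child's fork() result */
    child->thread.ra = (uintptr_t)demo_ret_from_fork;
    child->thread.sp = (uintptr_t)&child->regs;     /* kernel sp */
}

/* Kernel-thread branch: stash fn and arg in s0 and s1. */
void demo_copy_kernel_thread(struct demo_task *child, demo_fn_t fn, void *arg)
{
    memset(&child->regs, 0, sizeof(child->regs));
    child->thread.ra = (uintptr_t)demo_ret_from_kernel_thread;
    child->thread.s[0] = (uintptr_t)fn;
    child->thread.s[1] = (uintptr_t)arg;
    child->thread.sp = (uintptr_t)&child->regs;
}

/* What the kernel-thread trampoline amounts to: call s0 with s1. */
int demo_run_kernel_thread(const struct demo_task *t)
{
    demo_fn_t fn = (demo_fn_t)t->thread.s[0];
    return fn((void *)t->thread.s[1]);
}

/* Example thread function for the usage below. */
int demo_double(void *arg) { return 2 * *(int *)arg; }
```

The register context (regs) describes what user mode will see after the return, while the thread context (thread) describes where the kernel resumes when the task is first switched in.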
wake_up_new_task
After the child's process descriptor has been set up, kernel_clone() calls wake_up_new_task() to wake up the child: it is added to the run queue, where it waits to be picked by the scheduler. The main work is done by the activate_task() function. The main flow is as follows:
activate_task
- enqueue_task
- p->sched_class->enqueue_task(rq, p, flags)
- p->on_rq = TASK_ON_RQ_QUEUED
activate_task() first calls enqueue_task(), which invokes the enqueue function of the process's scheduling class (sched_class); the class was set in sched_fork(). After the insertion, the on_rq field of the process is set to TASK_ON_RQ_QUEUED, indicating that the process is on the run queue.
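The dispatch through the scheduling class is an ordinary table of function pointers. The sketch below is a minimal model with hypothetical names; the run queue is reduced to a counter, and the value chosen for DEMO_TASK_ON_RQ_QUEUED is illustrative only.

```c
#include <assert.h>

#define DEMO_TASK_ON_RQ_QUEUED 1  /* illustrative value */

struct demo_rq { unsigned int nr_running; };
struct demo_task;

/* Per-policy operations, selected once when the task is set up. */
struct demo_sched_class {
    void (*enqueue_task)(struct demo_rq *rq, struct demo_task *p, int flags);
};

struct demo_task {
    const struct demo_sched_class *sched_class;
    int on_rq;
};

/* One possible policy: just count the task as runnable. */
static void demo_enqueue_task_fair(struct demo_rq *rq, struct demo_task *p, int flags)
{
    (void)p;
    (void)flags;
    rq->nr_running++;
}

const struct demo_sched_class demo_fair_sched_class = {
    .enqueue_task = demo_enqueue_task_fair,
};

/* Enqueue through the task's class, then mark the task as queued. */
void demo_activate_task(struct demo_rq *rq, struct demo_task *p, int flags)
{
    p->sched_class->enqueue_task(rq, p, flags);
    p->on_rq = DEMO_TASK_ON_RQ_QUEUED;
}
```

The caller never needs to know which policy the task uses; the class pointer stored in the task decides which enqueue function runs.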
Execution of child process after process creation
Once created, the child process has only been added to the run queue; it is not yet running. The child actually runs only when the scheduler selects it. For details on the scheduler, please refer to other materials.
When the process is scheduled, the switch_to assembly routine called from the process switching function context_switch() restores the selected next process to the state it had before it was switched out. See the article "RISC-V Linux context switch analysis" for details.
// arch/riscv/kernel/entry.S : 512
ENTRY(__switch_to)
/* Save context into prev->thread */
li a4, TASK_THREAD_RA
add a3, a0, a4
add a4, a1, a4
REG_S ra, TASK_THREAD_RA_RA(a3)
REG_S sp, TASK_THREAD_SP_RA(a3)
REG_S s0, TASK_THREAD_S0_RA(a3)
....
REG_S s11, TASK_THREAD_S11_RA(a3)
/* Restore context from next->thread */
REG_L ra, TASK_THREAD_RA_RA(a4)
REG_L sp, TASK_THREAD_SP_RA(a4)
REG_L s0, TASK_THREAD_S0_RA(a4)
...
REG_L s11, TASK_THREAD_S11_RA(a4)
/* The offset of thread_info in task_struct is zero. */
move tp, a1
ret
ENDPROC(__switch_to)
According to the __switch_to code, if the next process has run before, the value of the ra register was recorded in thread.ra when it was last switched out. After switching back, the ra register is restored to the return address saved by the next process, so the final ret instruction of switch_to returns the process to context_switch(), which then executes its last function, finish_task_switch(), to perform the finishing work of process scheduling.
A newly created child process, however, has never been switched out. For it, copy_thread() sets thread.ra to ret_from_fork, so that to the scheduler the child appears to have been switched out at ret_from_fork.
So when the new process is scheduled, it does not return into context_switch() to execute finish_task_switch(); it executes ret_from_fork first.
The main flow of ret_from_fork is as follows. The assembly first calls schedule_tail to perform the finishing work of process scheduling; schedule_tail does the same kind of finishing work as finish_task_switch().
ret_from_fork
- schedule_tail
- ret_from_exception
- restore_all
Execution then continues at ret_from_exception, which handles the return from an exception or system call.
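The difference between resuming an old task and starting a new one comes down to what is stored in thread.ra. The toy model below uses hypothetical names throughout and replaces the assembly ret with an indirect call; it records which steps run in each case and is not a model of real stack switching.

```c
#include <assert.h>
#include <string.h>

static char demo_trace[128];

/* Append one step name to the trace. */
static void demo_log(const char *step)
{
    strcat(demo_trace, step);
    strcat(demo_trace, ";");
}

/* Trimmed thread context: only the saved return address. */
struct demo_thread { void (*ra)(void); };

/* Path of a task that was switched out inside context_switch(). */
static void demo_resume_in_context_switch(void)
{
    demo_log("finish_task_switch");
}

/* Path of a task that has never run before. */
static void demo_ret_from_fork(void)
{
    demo_log("schedule_tail");
    demo_log("ret_from_exception");
}

/* The final "ret" of the switch routine jumps to whatever ra holds. */
static void demo_switch_to(const struct demo_thread *next)
{
    next->ra();
}

/* Returns the sequence of steps executed when the task is switched in. */
const char *demo_first_steps(int is_new_task)
{
    struct demo_thread next;

    demo_trace[0] = '\0';
    next.ra = is_new_task ? demo_ret_from_fork : demo_resume_in_context_switch;
    demo_switch_to(&next);
    return demo_trace;
}
```

For example, demo_first_steps(1) yields the two steps of the ret_from_fork flow, while demo_first_steps(0) yields only the finishing step of an ordinary switch.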
The main part of ret_from_exception is restore_all, whose assembly code is shown below. It restores the state from before the system call by loading the register context that was saved in the kernel stack at system call entry into the RISC-V registers.
// arch/riscv/kernel/entry.S : 268
restore_all:
REG_L a0, PT_STATUS(sp)
REG_L a2, PT_EPC(sp)
REG_SC x0, a2, PT_EPC(sp)
csrw CSR_STATUS, a0
csrw CSR_EPC, a2
REG_L x1, PT_RA(sp)
REG_L x3, PT_GP(sp)
REG_L x4, PT_TP(sp)
...
REG_L x30, PT_T5(sp)
REG_L x31, PT_T6(sp)
REG_L x2, PT_SP(sp)
#ifdef CONFIG_RISCV_M_MODE
mret
#else
sret
#endif
- First restore the STATUS status register.
- Restore the EPC register (which holds the PC of the instruction following the system call). Because copy_thread() copied the parent's stack context with *childregs = *(current_pt_regs()), the EPC of the child and of the parent both point to the system call return address.
- Restore all registers except the stack register in turn.
- Restore the stack register x2, which completes the switch from the kernel stack to the user stack.
- Execute mret or sret to return to user mode and start executing user-mode code (that is, the code at the address held in the EPC register).
This completes the code analysis of process creation.
Summary
Following the code execution flow, this article analyzed process creation and the parts of it specific to the RISC-V processor architecture, showing the characteristics of the RISC-V processor and of its process implementation.