Before getting into the problem, some background: the Linux kernel is an operating system kernel, and an operating system is itself a program, just as a user-space application is. The title "How the Linux Kernel Launches Userspace Programs" may therefore seem a little misleading. It is harmless as long as we remember that, in the end, every program is just instructions that the CPU recognizes and executes.

I have written two earlier articles on this topic:
"How the linux kernel starts a user space process [01]"
"How the linux kernel starts a user space process [2]"
They also describe how the Linux kernel starts a user-space process, but they follow what the kernel does after its start_kernel() function. This article looks at the same question from a different angle: the shell's point of view.
From the user's point of view, there are several ways to launch an application: for example, running it from a shell or double-clicking its icon. Whichever way an application is started, the Linux kernel handles its startup.
In this article, we consider launching applications from the shell. The standard way to do this is to start a terminal emulator, type the application's name, and optionally pass it some arguments, for example:
./demo
Next, let's think about what happens when an application is launched from the shell. What does the shell do when we type the program's name, and what does the Linux kernel do?
My default shell is bash, so this article considers how bash starts programs. Like the programs we write in C, bash starts executing in its main() function. If we look at the bash source code, we find main() in the shell.c source file. The code is as follows (it is long, so it has been trimmed):
#if defined (NO_MAIN_ENV_ARG)
/* systems without third argument to main() */
int
main (argc, argv)
     int argc;
     char **argv;
#else /* !NO_MAIN_ENV_ARG */
int
main (argc, argv, env)
     int argc;
     char **argv, **env;
#endif /* !NO_MAIN_ENV_ARG */
{
  register int i;
  int code, old_errexit_flag;
#if defined (RESTRICTED_SHELL)
  int saverst;
#endif
  volatile int locally_skip_execution;
  volatile int arg_index, top_level_arg_index;
#ifdef __OPENNT
  char **env;

  env = environ;
#endif /* __OPENNT */

  USE_VAR(argc);
  USE_VAR(argv);
  USE_VAR(env);
  USE_VAR(code);
  USE_VAR(old_errexit_flag);
#if defined (RESTRICTED_SHELL)
  USE_VAR(saverst);
#endif

  /* ... a large amount of code omitted ... */

  /* Do the things that should be done only for interactive shells. */
  if (interactive_shell)
    {
      /* Set up for checking for presence of mail. */
      reset_mail_timer ();
      init_mail_dates ();

#if defined (HISTORY)
      /* Initialize the interactive history stuff. */
      bash_initialize_history ();
      /* Don't load the history from the history file if we've already
         saved some lines in this session (e.g., by putting `history -s xx'
         into one of the startup files). */
      if (shell_initialized == 0 && history_lines_this_session == 0)
        load_history ();
#endif /* HISTORY */

      /* Initialize terminal state for interactive shells after the
         .bash_profile and .bashrc are interpreted. */
      get_tty_state ();
    }

#if !defined (ONESHOT)
 read_and_execute:
#endif /* !ONESHOT */

  shell_initialized = 1;

  if (pretty_print_mode && interactive_shell)
    {
      internal_warning (_("pretty-printing mode ignored in interactive shells"));
      pretty_print_mode = 0;
    }

  if (pretty_print_mode)
    exit_shell (pretty_print_loop ());

  /* Read commands until exit condition. */
  reader_loop ();
  exit_shell (last_command_exit_value);
}
Before bash's main loop starts, main() does a number of things, including:
(1) Checking for and trying to open /dev/tty.
(2) Checking whether the shell is running in debug mode.
(3) Parsing the arguments passed on the command line.
(4) Reading the shell environment.
(5) Loading .bashrc, .profile, and other configuration files.
After these steps, we reach the call to reader_loop(). This function is defined in the eval.c source file and implements the main loop: in short, it reads commands and executes them. The code is as follows:
int reader_loop ()
{
  int our_indirection_level;
  COMMAND * volatile current_command;

  USE_VAR(current_command);

  current_command = (COMMAND *)NULL;
  our_indirection_level = ++indirection_level;

  if (just_one_command)
    reset_readahead_token ();

  while (EOF_Reached == 0)
    {
      int code;

      code = setjmp_nosigs (top_level);

#if defined (PROCESS_SUBSTITUTION)
      unlink_fifo_list ();
#endif /* PROCESS_SUBSTITUTION */

      /* XXX - why do we set this every time through the loop? And why do
         it if SIGINT is trapped in an interactive shell? */
      if (interactive_shell && signal_is_ignored (SIGINT) == 0 && signal_is_trapped (SIGINT) == 0)
        set_signal_handler (SIGINT, sigint_sighandler);

      if (code != NOT_JUMPED)
        {
          indirection_level = our_indirection_level;

          switch (code)
            {
              /* Some kind of throw to top_level has occurred. */
            case FORCE_EOF:
            case ERREXIT:
            case EXITPROG:
              current_command = (COMMAND *)NULL;
              if (exit_immediately_on_error)
                variable_context = 0; /* not in a function */
              EOF_Reached = EOF;
              goto exec_done;

            case DISCARD:
              /* Make sure the exit status is reset to a non-zero value, but
                 leave existing non-zero values (e.g., > 128 on signal)
                 alone. */
              if (last_command_exit_value == 0)
                set_exit_status (EXECUTION_FAILURE);
              if (subshell_environment)
                {
                  current_command = (COMMAND *)NULL;
                  EOF_Reached = EOF;
                  goto exec_done;
                }
              /* Obstack free command elements, etc. */
              if (current_command)
                {
                  dispose_command (current_command);
                  current_command = (COMMAND *)NULL;
                }
              restore_sigmask ();
              break;

            default:
              command_error ("reader_loop", CMDERR_BADJUMP, code, 0);
            }
        }

      executing = 0;
      if (temporary_env)
        dispose_used_env_vars ();

#if (defined (ultrix) && defined (mips)) || defined (C_ALLOCA)
      /* Attempt to reclaim memory allocated with alloca (). */
      (void) alloca (0);
#endif

      if (read_command () == 0)
        {
          if (interactive_shell == 0 && read_but_dont_execute)
            {
              set_exit_status (EXECUTION_SUCCESS);
              dispose_command (global_command);
              global_command = (COMMAND *)NULL;
            }
          else if (current_command = global_command)
            {
              global_command = (COMMAND *)NULL;

              if (interactive && ps0_prompt)
                {
                  char *ps0_string;

                  ps0_string = decode_prompt_string (ps0_prompt);
                  if (ps0_string && *ps0_string)
                    {
                      fprintf (stderr, "%s", ps0_string);
                      fflush (stderr);
                    }
                  free (ps0_string);
                }

              current_command_number++;

              executing = 1;
              stdin_redir = 0;

              execute_command (current_command);

            exec_done:
              QUIT;

              if (current_command)
                {
                  dispose_command (current_command);
                  current_command = (COMMAND *)NULL;
                }
            }
        }
      else
        {
          /* Parse error, maybe discard rest of stream if not interactive. */
          if (interactive == 0)
            EOF_Reached = EOF;
        }

      if (just_one_command)
        EOF_Reached = EOF;
    }

  indirection_level--;
  return (last_command_exit_value);
}
Once reader_loop() has read a command (the program name and its arguments), it calls execute_command(), defined in the execute_cmd.c source file. The call chain starting from execute_command() is as follows:
execute_command
--> execute_command_internal
----> execute_simple_command
------> execute_disk_command
--------> shell_execve
At the end of this chain, shell_execve() invokes the execve() system call:
execve (command, args, env);
The prototype of execve is:
int execve(const char *filename, char *const argv[], char *const envp[]);
It executes the program named by filename with the given arguments and environment variables. For the newly launched program, execve is the very first system call: running a command under strace, for example, shows an execve(...) line at the top of the trace.
So when a user application (bash, in this article) launches a program, the work ends in a Linux system call. From there on the kernel takes over, and the entry point for that work is the execve() system call. At its core, execve() calls the do_execveat_common() function (for details, see the article "How the linux kernel starts a user space process [2]").
In other words, we have arrived back at do_execveat_common(). Interesting!