In the previous column, I explained Linux’s executable file format, ELF (Executable and Linkable Format). In this installment, I’d like to present ways of analyzing an ELF executable file.
Analyzing an ELF file tells you the libraries and functions used by the file.
Finding out the libraries used
There are several ways to find out which libraries an ELF executable file links to dynamically.
ldd
Lists the shared libraries required by the file specified on the command line.
```
$ ldd /bin/ls
linux-vdso.so.1 (0x00007ffff52fb000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f6c65b30000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6c65730000)
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f6c654b0000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f6c652a0000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6c66000000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6c65080000)
```
If a library can’t be found, "not found" is displayed in place of its path, as in the result below.
```
$ ldd bin/ls
linux-vdso.so.1 (0x00007ffd789d4000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007ff9de4a4000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff9de0b3000)
libpcre.so.3 => not found
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff9ddeaf000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff9de8ee000)
```
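To check for missing libraries from a script, the "not found" lines can be filtered out of the ldd output. The following is a minimal sketch, not a definitive tool: the function name missing_libs is hypothetical, and it assumes ldd prints the "name => not found" form shown above.

```shell
# missing_libs (hypothetical helper): read ldd output on stdin and print
# the names of the libraries that ldd reported as "not found".
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Usage (the path is only an example):
#   ldd /bin/ls | missing_libs
```

The function prints nothing when every library is resolved, so an empty result can be treated as success.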
pldd
This command takes a process ID and lists the shared libraries loaded into that running process. The following example shows the libraries used by the current bash shell ($$ expands to the shell’s own process ID).
```
$ pldd $$
2997: /usr/bin/bash
linux-vdso.so.1
/lib64/libtinfo.so.5
/lib64/libdl.so.2
/lib64/libc.so.6
/lib64/ld-linux-x86-64.so.2
/lib64/libnss_files.so.2
```
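On Linux, similar information can also be read from the process's memory map in /proc. The sketch below is a rough alternative under stated assumptions, not how pldd itself works: the function name mapped_libs is hypothetical, and it assumes the usual /proc/PID/maps layout in which the sixth field is the mapped file's path (paths containing spaces are not handled).

```shell
# mapped_libs (hypothetical helper): read the contents of a /proc/PID/maps
# file on stdin and print each mapped shared-library path once.
mapped_libs() {
    # Field 6 is the path; keep names ending in ".so" or containing ".so."
    awk '$6 ~ /\.so(\.|$)/ { print $6 }' | sort -u
}

# Usage (current shell):
#   mapped_libs < /proc/$$/maps
```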
Specifying libraries
By default, libraries are searched for in the directories listed in /etc/ld.so.conf.d/*.conf and in standard directories such as /lib and /lib64. To add a library path, write it in a .conf file under /etc/ld.so.conf.d/ and run ldconfig. To add a path only temporarily, you can instead pass the library’s directory name as an argument to ldconfig.
You can also change the library paths and the libraries to be used with environment variables.
– LD_LIBRARY_PATH
This environment variable specifies library paths in the same manner as PATH, using ":" to separate directory names. Because it requires no changes to system files, it allows non-root users to change the library paths temporarily.
– LD_PRELOAD
This environment variable specifies shared libraries to load first, separated with ":". Because a preloaded library is loaded before the libraries found through the library paths, its functions override standard functions of the same name. Although risky, this can be used to replace malloc() and other standard functions, for example for debugging. Replacement malloc implementations include dmalloc for debugging, and [tcmalloc](https://gperftools.github.io/gperftools/tcmalloc.html) and [jemalloc](http://jemalloc.net/), which are designed to be faster than the standard malloc.
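As a small illustration of setting these variables for a single command, here is a hedged sketch. The helper name prepend_lib_path and all directory, library, and program names are placeholders, not part of any real package.

```shell
# prepend_lib_path DIR (hypothetical helper): print a value for
# LD_LIBRARY_PATH with DIR in front, keeping any existing value.
prepend_lib_path() {
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        printf '%s:%s\n' "$1" "$LD_LIBRARY_PATH"
    else
        # Avoid a trailing ":" here; an empty entry in the list is
        # treated as the current working directory.
        printf '%s\n' "$1"
    fi
}

# Usage (all names below are placeholders):
#   LD_LIBRARY_PATH=$(prepend_lib_path /opt/mylib/lib) ./myprog
#   LD_PRELOAD=/opt/mylib/lib/libmymalloc.so ./myprog
```

Writing the assignment in front of the command limits the change to that one process, so the rest of the shell session is unaffected.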
Finding out about symbols included in an ELF executable file or library
In this section, "symbols" refers to the function names and debug information included in an ELF executable file or a library. Debug information is added only if the corresponding option is specified at compile time. Symbol information is also usually removed with the strip command from the ELF executable files and libraries contained in a package. You can tell whether symbol information has been stripped with the file command: "stripped" or "not stripped" appears at the end of its output.
```
$ file /bin/ls
/bin/ls: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, for GNU/Linux 3.2.0, BuildID[sha1]=9567f9a28e66f4d7ec4baf31cfbf68d0410f0ae6, stripped
$ file /usr/bin/docker
/usr/bin/docker: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, for GNU/Linux 3.2.0, BuildID[sha1]=cc7bf5bb273b7fdca63e88082b3fc5248e8373e3, with debug_info, not stripped
```
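When checking many files, the tail of the file output can be classified in a script. This is a minimal sketch that assumes the output wording shown above; the function name strip_status is hypothetical.

```shell
# strip_status (hypothetical helper): read `file` output on stdin and,
# for each line, print "not stripped", "stripped", or "unknown".
strip_status() {
    while IFS= read -r line; do
        case $line in
            # Test "not stripped" first, because it also contains "stripped".
            *"not stripped"*) echo "not stripped" ;;
            *", stripped"*)   echo "stripped" ;;
            *)                echo "unknown" ;;
        esac
    done
}

# Usage:
#   file /bin/ls /usr/bin/docker | strip_status
```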
nm
Running nm with a file name displays information about the symbols included in that file. The following shows a portion of the symbol information from tmux, which I compiled myself.
```
$ nm /usr/local/bin/tmux
…
0000000000012aa0 T cfg_print_causes
0000000000012b20 T cfg_show_causes
00000000000120f0 t cfg_show_causes.part.2
U cfgetispeed@@GLIBC_2.2.5
U cfgetospeed@@GLIBC_2.2.5
U cfmakeraw@@GLIBC_2.2.5
U cfsetispeed@@GLIBC_2.2.5
U cfsetospeed@@GLIBC_2.2.5
U chdir@@GLIBC_2.2.5
000000000003ce30 T check_window_name
0000000000054a80 t checkshell
U chmod@@GLIBC_2.2.5
0000000000297a84 b client_attached
0000000000013210 t client_dispatch
…
```
Each line shows "Value", "Class", and "Name", in that order. The suffix @@GLIBC_2.2.5 is a symbol version: it means that the symbol refers to the GLIBC_2.2.5 version of a glibc function.
The meaning of Class is as follows. In general, lowercase indicates a local symbol and uppercase indicates a global symbol.
| Class | Explanation |
|---|---|
| a/A | absolute symbol |
| b/B | bss (that is, uninitialized data space) symbol |
| C | common symbol |
| d/D | data (that is, initialized data space) symbol |
| I | indirect reference to another symbol |
| N | debugging symbol |
| r/R | read-only data symbol |
| t/T | text (code) symbol |
| U | undefined symbol |
| v/V | weak object symbol |
| w/W | weak symbol (not tagged as a weak object) |
| ? | unknown symbol |
Below is an example of output in sysv format. We see that the items Type, Size, Line, and Section are now added.
```
$ nm -f sysv /usr/local/bin/tmux
Symbols from /usr/local/bin/tmux:
Name | Value | Class | Type | Size | Line | Section |
cfsetispeed@@GLIBC_2.2.5 | | U | FUNC | | | *UND* |
cfsetospeed@@GLIBC_2.2.5 | | U | FUNC | | | *UND* |
chdir@@GLIBC_2.2.5 | | U | FUNC | | | *UND* |
check_window_name | 000000000003ce30 | T | FUNC | 0000000000000273 | | .text |
checkshell | 0000000000054a80 | t | FUNC | 0000000000000035 | | .text |
chmod@@GLIBC_2.2.5 | | U | FUNC | | | *UND* |
client_attached | 0000000000297a84 | b | OBJECT | 0000000000000004 | | .bss |
client_dispatch | 0000000000013210 | t | FUNC | 0000000000000334 | | .text |
```
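The undefined (class U) entries are the functions that the file expects a shared library to provide, so it is often handy to list just those. The following sketch filters the default nm output format shown earlier; the function name undefined_symbols is hypothetical.

```shell
# undefined_symbols (hypothetical helper): read default-format nm output
# on stdin and print the names of undefined (class U) symbols, with any
# version suffix such as @@GLIBC_2.2.5 removed.
undefined_symbols() {
    awk '$1 == "U" { sub(/@.*/, "", $2); print $2 }'
}

# Usage (the path is only an example):
#   nm /usr/local/bin/tmux | undefined_symbols
```

nm also has a -u option that restricts its output to undefined symbols. The filter here is mainly useful when you already have saved nm output or want the version suffix stripped.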
objdump
The nm command requires symbol information to be present in the file. With the objdump command, you can get information even from an ELF executable file whose symbol information has been stripped. This command is often used to find out which functions a file uses.
The command below uses the -j option to display only the ".plt" section, which contains the Procedure Linkage Table. The -S option displays the disassembled code, intermixed with source code where available. jmpq is the assembly jump instruction; each PLT entry uses it to jump to the address of an external function.
```
$ objdump -j .plt -S /bin/ls | grep '# '
3770: ff 35 ca c4 21 00 pushq 0x21c4ca(%rip) # 21fc40
3776: ff 25 cc c4 21 00 jmpq *0x21c4cc(%rip) # 21fc48
…
39f0: ff 25 92 c3 21 00 jmpq *0x21c392(%rip) # 21fd88
3a00: ff 25 8a c3 21 00 jmpq *0x21c38a(%rip) # 21fd90
3a10: ff 25 82 c3 21 00 jmpq *0x21c382(%rip) # 21fd98
3a20: ff 25 7a c3 21 00 jmpq *0x21c37a(%rip) # 21fda0
…
```
Understanding what happens when an ELF executable file is run
So far, we’ve looked at commands that analyze an ELF executable file statically. We can also observe what happens while an ELF file is running.
strace
The strace command displays the system calls made when a command is executed. You specify the command and its arguments after strace. The system calls, their arguments, and their return values are written to standard error.
The following is an example of strace running the ls command.
```
$ strace ls -a /var/tmp
execve("/bin/ls", ["ls", "-a", "/var/tmp"], 0x7ffff59275d0 /* 38 vars */) = 0
brk(NULL) = 0x7fffd3a1d000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=66248, ...}) = 0
mmap(NULL, 66248, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fca28571000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
…
stat("/var/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/var/tmp", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|S_ISVTX|0777, st_size=4096, ...}) = 0
getdents(3, /* 2 entries */, 32768) = 48
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
write(1, ". ..\n", 6. ..
) = 6
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
```
From the above, we see that processing took place in the following order:
1. execve() runs /bin/ls.
2. ld.so.cache and the necessary libraries are read.
3. The /var/tmp directory is opened by openat().
4. fstat() gets the status of the directory, and getdents() reads its entries.
5. The directory is closed by close().
6. write() outputs the results to file descriptor 1 (= stdout).
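For longer traces, it can help to reduce the output to the system call names before reading it in detail. The following is a minimal sketch that counts how often each call appears; the function name syscall_counts is hypothetical, and it assumes each call starts at the beginning of a line as in the output above.

```shell
# syscall_counts (hypothetical helper): read strace output on stdin and
# print each system call name with the number of times it appears.
syscall_counts() {
    awk '/^[a-z_][a-z0-9_]*\(/ {
             name = $0
             sub(/\(.*/, "", name)   # keep only the text before "("
             count[name]++
         }
         END { for (name in count) print name, count[name] }' | sort
}

# Usage: strace writes to standard error, so send that to the pipe and
# discard the traced command's own output.
#   strace ls -a /var/tmp 2>&1 >/dev/null | syscall_counts
```

strace itself can print a summary table with its -c option. This filter is only meant for traces you have already captured.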
Because an empty /var/tmp keeps the processing light and the output short, try strace out for yourself. You can find out what each system call does from its man page (for example, man 2 openat).
When you run strace -p followed by a process ID, the system calls of the specified running process are displayed. This method is useful when debugging.
ltrace
Just as strace displays system calls, ltrace displays the library functions called by the executed command, together with their arguments and results, on standard error. ltrace does not reveal which library each function is called from.
Below is an example of ltrace run on the same command as the strace example.
```
$ ltrace /bin/ls -a /var/tmp/
strrchr("/bin/ls", '/') = "/ls"
setlocale(LC_ALL, "") = "ja_JP.UTF-8"
bindtextdomain("coreutils", "/usr/share/locale") = "/usr/share/locale"
textdomain("coreutils") = "coreutils"
__cxa_atexit(0x7f4c0e40cca0, 0, 0x7f4c0e620008, 1) = 0
…
opendir("/var/tmp/") = 0x7fffcb78f640
readdir(0x7fffcb78f640) = 0x7fffcb78f670
__errno_location() = 0x7f4c0e3d0ed8
__ctype_get_mb_cur_max() = 6
strlen(".") = 1
strlen(".") = 1
memcpy(0x7fffcb78f5b0, ".\0", 2) = 0x7fffcb78f5b0
readdir(0x7fffcb78f640) = 0x7fffcb78f688
__errno_location() = 0x7f4c0e3d0ed8
__ctype_get_mb_cur_max() = 6
strlen("..") = 2
strlen("..") = 2
memcpy(0x7fffcb797680, "..\0", 3) = 0x7fffcb797680
readdir(0x7fffcb78f640) = 0
closedir(0x7fffcb78f640) = 0
_setjmp(0x7f4c0e620300, 0, 0x7fffcb78f5e0, 0x7fffcb78a910) = 0
__errno_location() = 0x7f4c0e3d0ed8
strcoll(".", "..") = -1
realloc(0, 96) = 0x7fffcb78f640
strlen(".") = 1
…
fflush(0x7f4c0ddbc680) = 0
fclose(0x7f4c0ddbc680) = 0
+++ exited (status 0) +++
```
As you can see, because ls is written in C, it calls many C standard library functions. You may also find it interesting to compare the results of strace and ltrace.
As with strace, when you run ltrace -p followed by a process ID, the function calls of the specified process are displayed.
In conclusion
These days, much software is developed in languages other than C and C++, and for such software I recommend debugging in an integrated development environment (IDE). But don’t forget that even now, many applications, including the GNU userland programs, are written and distributed in C. So it’s important to know how the ELF executable files in a distributed package work and what libraries they need. I hope this column will be helpful when you face this situation.