Using bash variables to remove or convert arrays

If You Use Linux, You Lose Out If You Can’t Do This #2 – Shell Variables

In the last column, I introduced the history function. In this column, I’d like to discuss converting shell variables.

Shell variables

Bash shell variables can hold arrays and associative arrays. The index of an array starts at 0; for an associative array, arbitrary strings serve as indexes.

Array

For an array, specifying NAME[*index*]=VALUE automatically creates a shell array named NAME whose element at position *index* is VALUE. You can also define an array with multiple values using NAME=(value1 value2…). If *index* is negative, it counts back from the end of the array (the last element is -1).

To look up a value set in an array, use ${NAME[*index*]}. If *index* does not exist, the result is an empty string. If `*` or `@` is specified for *index*, all the values in the array are expanded. Inside double quotes, `"${NAME[*]}"` joins them into a single word separated by the first character of the IFS (Internal Field Separator) variable (a space by default), while `"${NAME[@]}"` yields one word per element.
A list of the array’s indexes can be obtained with `${!NAME[*]}` or `${!NAME[@]}`. The number of elements in the array can be obtained with `${#NAME[@]}`. Note that `${#NAME}` without a subscript gives the length of element 0 instead.

To remove a specific element from an array, run `unset NAME[index]`. Running `unset NAME` deletes the entire array.

Below are usage examples.

```
# Set an element in the 0th position of an array and display it.
$ ARRAY[0]=ABC
$ echo ${ARRAY[0]}
ABC
$ ARRAY=(abc def ghi)
# Display the tail end of the array.
$ echo ${ARRAY[-1]}
ghi
# If ARRAY is referred to without an index, ARRAY[0] is displayed.
$ echo $ARRAY
abc
# If an index that does not exist is specified, an empty string is displayed.
$ echo ${ARRAY[3]}

# A value that includes whitespace is quoted.
$ ARRAY=(abc 'def ghi' jkl)
$ echo ${ARRAY[1]}
def ghi
# Display all values in the array.
$ echo ${ARRAY[*]}
abc def ghi jkl
# Remove the last element of the array.
$ unset ARRAY[-1]
$ echo ${ARRAY[*]}
abc def ghi
# Delete the entire array.
$ unset ARRAY
$ echo ${ARRAY[*]}

# Store the results of command ls in an array.
$ ARRAY=($(ls -1 /usr/include/net/))
$ echo ${ARRAY[*]}
ethernet.h if.h if_arp.h if_packet.h if_ppp.h if_shaper.h if_slip.h ppp-comp.h ppp_defs.h route.h
# Display the number of elements in the array.
$ echo ${#ARRAY[@]}
10
# Display the indexes of the array and their corresponding values with a for-loop.
$ for i in ${!ARRAY[@]}; do echo $i: ${ARRAY[$i]}; done
0: ethernet.h
1: if.h
2: if_arp.h
3: if_packet.h
4: if_ppp.h
5: if_shaper.h
6: if_slip.h
7: ppp-comp.h
8: ppp_defs.h
9: route.h
```
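The difference between the `*` and `@` subscripts only shows up inside double quotes, so here is a minimal sketch of it as a script. The array name `files` and its contents are made up for illustration, and one element deliberately contains a space.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; one element contains a space on purpose.
files=("report one.txt" "notes.txt" "data.csv")

# Number of elements (not the length of the first element).
count=${#files[@]}
echo "count: $count"

# "${files[@]}" expands to one word per element, so the space survives.
for f in "${files[@]}"; do
  echo "element: $f"
done

# "${files[*]}" expands to a single word joined by the first character of IFS.
# IFS is changed only inside the command substitution's subshell.
joined=$(IFS=,; echo "${files[*]}")
echo "joined: $joined"
```

Without the double quotes, both forms are split again on whitespace, which is why the unquoted loops in this column are only safe for values that contain no spaces.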

Associative array

An associative array in Bash has the same function as the hash structure in Perl and the dictionary structure in Python. Unlike an array, where indexes are numeric, the indexes in an associative array are specified by strings.

An associative array is created with `declare -A NAME` or `declare -A NAME[subscript]`. Insert values with `NAME[subscript]=value`, or set several at once, separated by whitespace, with `NAME=([sub1]=value1… [subn]=valuen)`. To include whitespace or special characters in a value, use quotation marks as you do with an array. To look up a value, use ${NAME[subscript]} as with an array. View the list of subscripts of an associative array with `${!NAME[*]}` or `${!NAME[@]}`.

To remove an element in an associative array or delete the associative array itself, use unset as you do with an array.

```
# Create an associative array.
$ declare -A HASH

# Set a value in the associative array and display subscript and value.
$ HASH[aaa]=AAA
$ echo ${!HASH[@]}
aaa
$ echo ${HASH[aaa]}
AAA

# Set multiple values in the associative array and display subscripts and values.
$ HASH=([bbb]=BBB [ccc]=CCC [def]="DDD EEE FFF")
$ echo ${!HASH[@]}
ccc def bbb
$ echo ${HASH[def]}
DDD EEE FFF

# Display all values in the associative array.
$ echo ${HASH[*]}
CCC DDD EEE FFF BBB
$ for h in ${!HASH[*]}; do echo $h: ${HASH[$h]}; done
ccc: CCC
def: DDD EEE FFF
bbb: BBB
```
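Removal with unset is described above but not shown for associative arrays, so the following is a small sketch of it. The array name `color` and its entries are hypothetical, and one key contains a space to show why the key list should be quoted.

```shell
#!/usr/bin/env bash
# Hypothetical data for illustration; associative arrays need Bash 4 or later.
declare -A color=([apple]=red [banana]=yellow ["sour cherry"]="dark red")

# Quote the key list so that a key containing a space stays one word.
for key in "${!color[@]}"; do
  echo "$key -> ${color[$key]}"
done

# Remove a single element. The quotes keep the shell from treating
# the brackets as a filename pattern.
unset 'color[banana]'
remaining=${#color[@]}
echo "remaining: $remaining"

# Remove the whole associative array.
unset color
after=${#color[@]}
echo "after unset: $after"
```

The order in which the keys are listed is not guaranteed, which is why the examples in this column pipe the output through sort when order matters.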

Remove/convert frequently used variables

I’ll now explain shell variable conversions that are frequently used on the command line.

Remove matched pattern

- ${parameter#*word*}
Remove the portion at the beginning of the value that matches *word*. *word* can be a glob pattern, as in pathname expansion. The pattern is matched against the parameter value from the front, and the result is the value with the shortest match removed.

```
$ AAA=/usr/include/linux/socket.h
$ echo $AAA
/usr/include/linux/socket.h
# Remove only the beginning /
$ echo ${AAA#/}
usr/include/linux/socket.h
# Remove from the beginning to the second component of the directory.
$ echo ${AAA#/*/*/}
linux/socket.h
```

- ${parameter##*word*}
Similar to `#` in removing the forward-matched portion, except that the longest-matched pattern is removed.

```
$ echo $AAA
/usr/include/linux/socket.h
# Display what follows */include.
$ echo ${AAA##*/include}
/linux/socket.h
# Display only the filename (same as basename).
$ echo ${AAA##*/}
socket.h
# Display only the filename extension.
$ echo ${AAA##*.}
h
```

- ${parameter%*word*}
In contrast to `#`, which forward-matches, `%` removes the shortest backward-matched pattern.

```
$ echo $AAA
/usr/include/linux/socket.h
# / in the tail end does not exist, so nothing is removed.
$ echo ${AAA%/}
/usr/include/linux/socket.h
# Display the directory name (same as dirname).
$ echo ${AAA%/*}
/usr/include/linux
# Remove /linux and everything after it.
$ echo ${AAA%/linux*}
/usr/include
# Remove only the filename extension.
$ echo ${AAA%.*}
/usr/include/linux/socket
```
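The front and back removals combine naturally when taking a path apart. The following is a minimal sketch using the same sample path as above; the variable names `dir`, `file`, `stem`, and `ext` are my own.

```shell
#!/usr/bin/env bash
path=/usr/include/linux/socket.h   # same sample path as in the examples above

dir=${path%/*}      # remove the shortest "/*" from the end: the directory
file=${path##*/}    # remove the longest "*/" from the front: the file name
stem=${file%.*}     # remove the shortest ".*" from the end: name without extension
ext=${file##*.}     # remove the longest "*." from the front: the extension

echo "dir=$dir file=$file stem=$stem ext=$ext"
# dir=/usr/include/linux file=socket.h stem=socket ext=h
```

This avoids starting the dirname and basename commands, but it does not reproduce their handling of edge cases such as a path with no slash or a trailing slash.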

- ${parameter%%*word*}
Like `##`, which removes the longest forward-matching pattern, `%%` removes the longest backward-matched pattern.

```
$ echo $AAA
/usr/include/linux/socket.h
# Display parameter before shortest-matched 'i' pattern.
$ echo ${AAA%i*}
/usr/include/l
# Display parameter before longest-match 'i' pattern.
$ echo ${AAA%%i*}
/usr/
# When /* is specified, the longest match starts at the initial /, so the entire parameter is removed and nothing is displayed.
$ echo ${AAA%%/*}

```

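A common use of the longest and shortest forms together is splitting a string at its first delimiter. The following is a sketch under my own assumptions: the `entry` string and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical "key=value" string whose value itself contains "=".
entry="PATH=/usr/bin:/bin=extra"

key=${entry%%=*}     # longest "=*" from the end: everything before the first "="
value=${entry#*=}    # shortest "*=" from the front: everything after the first "="
first=${value%%:*}   # first colon-separated field of the value

echo "key=$key"
echo "value=$value"
echo "first=$first"
```

Using `%%` for the key and a single `#` for the value is what keeps any later `=` characters inside the value.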
Convert strings

- ${parameter/*pattern*/*string*}
Replace the first portion of the value of parameter that matches *pattern* (the longest match) with *string*.

- If *pattern* begins with `/`, replace all matches of *pattern* with *string* rather than only the first.
- If *pattern* begins with `#`, replace only when it matches at the beginning of the parameter value.
- If *pattern* begins with `%`, replace only when it matches at the end of the parameter value.
- If *string* is an empty string, remove the portion that matches *pattern*.
- If the specified variable is an array, the conversion is applied to each of the specified array elements.

```
$ BBB=abcDeFgHIjkLmnabcdef
$ echo $BBB
abcDeFgHIjkLmnabcdef
# Convert abc to ABC
$ echo ${BBB/abc/ABC}
ABCDeFgHIjkLmnabcdef
# Convert def to ABC
$ echo ${BBB/def/ABC}
abcDeFgHIjkLmnabcABC
# Convert all abc to ABC
$ echo ${BBB//abc/ABC}
ABCDeFgHIjkLmnABCdef
# Replace only the head abc with ABC
$ echo ${BBB/#abc/ABC}
ABCDeFgHIjkLmnabcdef
# Replace only the tail def with ABC
$ echo ${BBB/%def/ABC}
abcDeFgHIjkLmnabcABC
# Remove def (DeF is not removed)
$ echo ${BBB/def/}
abcDeFgHIjkLmnabc
# Remove all d-fD-F
$ echo ${BBB//[d-fD-F]/}
abcgHIjkLmnabc
$ echo ${ARRAY[@]}
ethernet.h if.h if_arp.h if_packet.h if_ppp.h if_shaper.h if_slip.h ppp-comp.h ppp_defs.h route.h
# Convert array elements
$ echo ${ARRAY[0]/.h/.H}
ethernet.H
$ echo ${ARRAY[@]/if/IF}
ethernet.h IF.h IF_arp.h IF_packet.h IF_ppp.h IF_shaper.h IF_slip.h ppp-comp.h ppp_defs.h route.h
```
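Because the replacement is applied to every element of an array, the result can be stored directly in a new array. Here is a minimal sketch of that, assuming a made-up subset of file names in `headers` and a hypothetical `title` string.

```shell
#!/usr/bin/env bash
# Hypothetical subset of the header file names used above.
headers=(if.h if_arp.h route.h)

# "%" anchors the pattern at the end of each element, so only a trailing ".h" changes.
objects=("${headers[@]/%.h/.o}")
echo "${objects[@]}"

# "//" replaces every match: turn all spaces into underscores.
title="my summer photos"
slug=${title// /_}
echo "$slug"
```

Quoting the expansion inside the parentheses keeps each converted element as a single array element even if it contains whitespace.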

- ${parameter^*pattern*}, ${parameter^^*pattern*}

Convert lowercase characters that match *pattern* to uppercase. *pattern* is tested against one character at a time: `^` converts only the first character of the value if it matches, while `^^` converts every matching character. If *pattern* is omitted, it matches any character.

```
$ echo $BBB
abcDeFgHIjkLmnabcdef
# If nothing is specified in pattern, capitalize only the initial character.
$ echo ${BBB^}
AbcDeFgHIjkLmnabcdef
# If the initial character is not b, do not convert to uppercase.
$ echo ${BBB^b}
abcDeFgHIjkLmnabcdef
# If nothing is specified in pattern, convert all characters to uppercase.
$ echo ${BBB^^}
ABCDEFGHIJKLMNABCDEF
# Convert characters in range of c-f to uppercase.
$ echo ${BBB^^[c-f]}
abCDEFgHIjkLmnabCDEF
# Convert array elements
$ echo ${ARRAY[@]}
ethernet.h if.h if_arp.h if_packet.h if_ppp.h if_shaper.h if_slip.h ppp-comp.h ppp_defs.h route.h
$ echo ${ARRAY[@]^}
Ethernet.h If.h If_arp.h If_packet.h If_ppp.h If_shaper.h If_slip.h Ppp-comp.h Ppp_defs.h Route.h
$ echo ${ARRAY[@]^^}
ETHERNET.H IF.H IF_ARP.H IF_PACKET.H IF_PPP.H IF_SHAPER.H IF_SLIP.H PPP-COMP.H PPP_DEFS.H ROUTE.H
```

- ${parameter,*pattern*}, ${parameter,,*pattern*}
Convert uppercase characters that match *pattern* to lowercase. `,` converts only the first character of the value if it matches, while `,,` converts every matching character.

```
# Continue with the value converted in the previous example.
$ BBB=${BBB^^[c-f]}
$ echo $BBB
abCDEFgHIjkLmnabCDEF
$ CCC=${BBB^}
$ echo $CCC
AbCDEFgHIjkLmnabCDEF
# If nothing is specified in pattern, convert only the initial character to lowercase.
$ echo ${CCC,}
abCDEFgHIjkLmnabCDEF
# Because the initial character of BBB is lowercase, display as-is.
$ echo ${BBB,}
abCDEFgHIjkLmnabCDEF
# If pattern is omitted, convert all characters to lowercase.
$ echo ${BBB,,}
abcdefghijklmnabcdef
# Convert characters in the C-F range to lowercase
$ echo ${BBB,,[C-F]}
abcdefgHIjkLmnabcdef
# Convert elements in associative array
$ echo ${HASH[*]}
CCC DDD EEE FFF BBB
$ echo ${HASH[*],}
cCC dDD EEE FFF bBB
$ echo ${HASH[*],,}
ccc ddd eee fff bbb
```
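Case conversion is handy for comparing input without caring how it was typed. The following is a small sketch along those lines; the `answer` and `name` values and the `result` variable are hypothetical, and the `,,` and `^` expansions need Bash 4 or later.

```shell
#!/usr/bin/env bash
# Hypothetical user input with mixed case.
answer="YeS"

# Lowercase the whole value once, then compare against lowercase patterns only.
case ${answer,,} in
  y|yes) result=accepted ;;
  *)     result=rejected ;;
esac
echo "result: $result"

# Normalize a name: lowercase everything, then capitalize the first character.
name="aLICE"
lower=${name,,}
capitalized=${lower^}
echo "capitalized: $capitalized"
```

The two-step form is needed because an expansion cannot be nested directly inside another one.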

Putting it all together

Putting it all together: array

To apply what we’ve learned about arrays and conversion, let’s write a script that reads a file and converts only the initial character of each line to uppercase and the rest of the characters to lowercase. The key points are setting IFS to an empty value so that read does not strip leading or trailing whitespace, and using read -r so that backslashes are read as-is.

```
# Content of file (tab character follows hij)
$ cat abc.txt
ABC
DEF
hij klMn opq
Rst
uvW
xYz
# Store the content of a file in an array (IFS is set to an empty value while reading)
$ declare -a LINES
$ OLDIFS=$IFS; IFS= ; cnt=0; while read -r line; do LINES[$cnt]=$(echo -e "$line");cnt=$(($cnt+1)); done < abc.txt; IFS=$OLDIFS; unset OLDIFS cnt
# After converting each element to lowercase, capitalize only the initial character
$ for i in ${!LINES[@]}; do A=${LINES[$i],,}; echo "${A^}"; done; unset A
Abc
Def
Hij klmn opq
Rst
Uvw
Xyz
```
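As an alternative to the while-read loop, Bash 4 and later provide the mapfile builtin, which reads a file into an array in one step. The following sketch swaps in that technique; the temporary file, its three sample lines, and the names `lines` and `result` are my own assumptions, and unlike the loop above it does not pass each line through echo -e.

```shell
#!/usr/bin/env bash
# Create a small sample file so the sketch is self-contained (hypothetical content).
tmp=$(mktemp)
printf '%s\n' 'ABC' 'hij klMn' 'xYz' > "$tmp"

# mapfile -t stores one line per element and drops each trailing newline.
mapfile -t lines < "$tmp"

# Lowercase each line, then capitalize its first character.
result=()
for line in "${lines[@]}"; do
  lower=${line,,}
  result+=("${lower^}")
done

printf '%s\n' "${result[@]}"
rm -f "$tmp"
```

Because mapfile does not split on IFS, there is no need to save and restore IFS around it.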

Putting it all together: associative array

This example stores a file of whitespace-delimited key-value pairs in an associative array, converting every subscript to lowercase.

```
# Content of file (tab character is included in the Rst line)
$ cat def.txt
ABC abc
DEF DEF
hij klMn opq
Rst RSt rst
uVW UVW
xYz xyz
$ unset LINES && declare -a LINES

# Store the content of the file in an array (IFS is set to an empty value while reading)
$ OLDIFS=$IFS; IFS= ; cnt=0; while read -r line; do LINES[$cnt]=$(echo -e "$line");cnt=$(($cnt+1)); done < def.txt; IFS=$OLDIFS; unset OLDIFS cnt

# For each line, extract the head portion as the subscript and the following as the value
$ unset HASH && declare -A HASH
$ for i in ${!LINES[@]}; do key=${LINES[$i]%%[ ]*}; value=${LINES[$i]#*[ ]}; HASH[${key,,}]=$value; done; unset key value

# View results
$ for i in ${!HASH[@]}; do echo $i: ${HASH[$i]}; done |sort
abc: abc
def: DEF
hij: klMn opq
rst: RSt rst
uvw: UVW
xyz: xyz
```
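If the intermediate array is not needed, read can split each line into a key and a value by itself. This is a sketch of that variation, not the method used above; the temporary file, its contents, and the array name `table` are hypothetical.

```shell
#!/usr/bin/env bash
# Create a small sample file so the sketch is self-contained (hypothetical content).
tmp=$(mktemp)
printf '%s\n' 'ABC abc' 'hij klMn opq' 'uVW UVW' > "$tmp"

declare -A table
# With the default IFS, read puts the first word in key and the rest of the line in value.
while read -r key value; do
  [[ -n $key ]] || continue      # skip blank lines, which would give an empty subscript
  table[${key,,}]=$value
done < "$tmp"

for k in "${!table[@]}"; do
  echo "$k: ${table[$k]}"
done | sort
rm -f "$tmp"
```

Here the default IFS is kept on purpose, since the splitting on spaces and tabs is exactly what separates the key from the value.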

In Conclusion

As you can see, you can do all this with just Bash’s standard features. When working with arrays, it may be easier to use high-level languages (e.g. Python, Ruby, Golang, Node.js). But if your purpose is to execute commands, Bash is sufficient most of the time. So let’s make active use of it.

Part 1 – History Functions

Part 3 – Executable files and ways of using shell variables

Part 4 – How to Analyze an ELF Executable File

Part 5 – UDP Protocol

Satoru Miyazaki
