Extracting Substrings On Linux | Network World

There are many ways to extract substrings from lines of text using Linux and doing so can be extremely useful when preparing scripts that may be used to process large amounts of data. This post describes ways you can take advantage of the commands that make extracting substrings easy.

Using bash parameter expansion

When using bash parameter expansion, you specify a starting offset and the number of characters that you want to extract, in the form ${variable:offset:length}. For example, you can create a variable by assigning it a value and then use syntax like that shown below to select a portion of it.

$ string="Happy days are here again"
$ echo ${string:1:10}
appy days
$ echo ${string:0:9}
Happy day

Note that the example above makes it clear that this technique starts position numbering at 0. So, in the next example, the 7 represents the eighth character in the string and the -2 means to drop the last 2 characters. As a result, the substring in the first example below has a single character and the second has all but the last two.

$ string="1234567890"
$ echo ${string:7:-2}
8
$ echo ${string:0:-2}
12345678
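Because the expansion takes a zero-based offset and a length rather than a pair of positions, a small wrapper can help if you tend to think in terms of "characters 7 through 10". The following is only a sketch and assumes bash; substr_range is a made-up helper name, not a bash builtin.

```shell
#!/usr/bin/env bash
# substr_range is a hypothetical helper, not part of bash.
# Usage: substr_range STRING START END
# START and END are 1-based, inclusive character positions. They are
# converted to the zero-based offset and length that ${var:offset:length} expects.
substr_range() {
    local text=$1 start=$2 end=$3
    local offset=$(( start - 1 ))
    local length=$(( end - start + 1 ))
    printf '%s\n' "${text:offset:length}"
}

string="Happy days are here again"
substr_range "$string" 7 10      # prints: days
substr_range "1234567890" 8 8    # prints: 8
```

Calling substr_range "$string" 2 11 is equivalent to ${string:1:10} in the example above.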

In this next example, we first set the first positional parameter ($1) using “set --” and then use echo to display its eighth and ninth characters. In other words, the expansion starts with the eighth character (offset 7) and then displays two characters.

$ set -- 01234567890abcdef
$ echo ${1:7:2}
78

NOTE: You could display the string created with the set command by simply using the command “echo $1”. This is what is referenced by the “1” in the example above.

$ set -- 01234567890abcdef
$ echo $1
01234567890abcdef
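The same expansion works on positional parameters inside a function, where $1, $2 and $3 are the function's own arguments. Here is a minimal bash sketch; slice is a hypothetical name chosen for illustration.

```shell
#!/usr/bin/env bash
# slice is a hypothetical helper: slice STRING OFFSET LENGTH
# Inside the function, $1 is the string, $2 the zero-based offset
# and $3 the number of characters to print.
slice() {
    printf '%s\n' "${1:$2:$3}"
}

slice 01234567890abcdef 7 2      # prints: 78

# set -- replaces the script's own positional parameters,
# so "$1" below is the string assigned here.
set -- 01234567890abcdef
slice "$1" 11 3                  # prints: abc
```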

Using cut

The cut command can be used in several ways to yank substrings from text. The -c option allows you to select the character positions to be displayed. For cut, character numbering starts at 1.

$ echo "12345" | cut -c 1-3
123

In this next example, we select the last two words by character position. If you select more characters than are available, it doesn’t affect the output.

$ echo "Have some fun" | cut -c 6-13
some fun
$ cut -c 6-13 <<< "Have some fun"
some fun
$ echo "Have some fun" | cut -c 6-20
some fun

In addition, you can pipe text to the cut command or use the cut command to work with text in a file. Just be sure that the positions work for every line.

$ cat myfile                        $ cut -c 6-15 myfile
Have some fun                       some fun
Grab your lunch                     your lunch
Take nice nap                       nice nap
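When line lengths vary, open-ended ranges avoid having to guess an end position: cut -c N- selects from character N to the end of each line, and cut -c -N selects the first N characters. The sketch below builds a throwaway copy of the sample file with mktemp so that it can run anywhere; the variable names are illustrative.

```shell
# Create a temporary copy of the three-line sample file.
sample=$(mktemp)
printf '%s\n' "Have some fun" "Grab your lunch" "Take nice nap" > "$sample"

# From character 6 to the end of each line.
tails=$(cut -c 6- "$sample")
# The first four characters of each line.
heads=$(cut -c -4 "$sample")

printf '%s\n' "$tails"
printf '%s\n' "$heads"

# Clean up the temporary file.
rm -f "$sample"
```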

The cut command can also work with delimiters and this often makes it a lot easier to use with files in which the words or fields don’t line up precisely. To work with a file of mailing addresses, for example, you could do this to pull out the third field in the comma-separated addresses:

$ cat addresses                     $ cut -d, -f3 addresses
6803 Gravel Road,Hurlock,MD         MD
121 Blueberry Drive,Outback,VA      VA
1427 N 12th Street,Reading,PA       PA
2001 Turtle Road,Baker,WV           WV
264 Dakota Street,Groton,CT         CT
111 Mindless Circle,Celery,TX       TX
1089 Plymouth Drive,Rahway,NJ       NJ
949 Endless Lane,Hoboken,NJ         NJ
2001 Turtle Road,Outback,VA         VA

You can select multiple fields by specifying a range (e.g., “2-3”) or a sequence (e.g., “2,3”) as shown below.

$ cut -d, -f2-3 addresses           $ cut -d, -f2,3 addresses
Hurlock,MD                          Hurlock,MD
Outback,VA                          Outback,VA
Reading,PA                          Reading,PA
Baker,WV                            Baker,WV
Groton,CT                           Groton,CT
Celery,TX                           Celery,TX
Rahway,NJ                           Rahway,NJ
Hoboken,NJ                          Hoboken,NJ
Outback,VA                          Outback,VA
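A range and a sequence give the same result only when the fields are adjacent. As a rough illustration, the sketch below picks non-adjacent fields (1 and 3) and uses an open-ended range (2-) on a two-line excerpt of the addresses file; the temporary file and variable names are assumptions made for the example.

```shell
# Write a two-line excerpt of the addresses file to a temporary file.
addresses=$(mktemp)
cat > "$addresses" <<'EOF'
6803 Gravel Road,Hurlock,MD
121 Blueberry Drive,Outback,VA
EOF

# Fields 1 and 3 only (street and state), skipping the city.
street_and_state=$(cut -d, -f1,3 "$addresses")
# Field 2 through the last field, however many there are.
city_onward=$(cut -d, -f2- "$addresses")

printf '%s\n' "$street_and_state"
printf '%s\n' "$city_onward"

# Clean up the temporary file.
rm -f "$addresses"
```

Note that cut keeps the input delimiter in its output, so the selected fields are still separated by commas.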

Using awk

The awk command can also be used to extract substrings. Here’s an example of pulling text from a supplied phrase:

$ awk '{print substr($0,6,8)}' <<< "Wash your car"
your car

The $0 represents the complete phrase.
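Two variations on substr may be useful here. If you leave out the length, awk returns everything from the start position to the end of the string, and awk's index function can find the start position for you. The helper names below (rest_from and after_first_space) are hypothetical and exist only for this sketch.

```shell
# rest_from is a hypothetical helper: rest_from STRING START
# With no length argument, substr returns everything from START onward.
rest_from() {
    printf '%s\n' "$1" | awk -v start="$2" '{ print substr($0, start) }'
}

# after_first_space is a hypothetical helper.
# index($0, " ") returns the 1-based position of the first space,
# so adding 1 starts the substring at the following character.
after_first_space() {
    printf '%s\n' "$1" | awk '{ print substr($0, index($0, " ") + 1) }'
}

rest_from "Wash your car" 6          # prints: your car
after_first_space "Wash your car"    # prints: your car
```

Like cut and unlike bash parameter expansion, awk counts character positions from 1.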

To work with a file with delimited fields, use the -F (field delimiter) option. In this case, the delimiter is a comma. Use -F':' if the file is colon-delimited.

$ awk -F',' '{print $3}' addresses | sort | uniq
CT
MD
NJ
PA
TX
VA
WV

If your fields are separated with both a comma and a space, that is no problem for awk. Just specify that in the command like this:

$ awk -F', ' '{print $3}' addresses | sort | uniq
CT
MD
NJ
PA
TX
VA
WV

In fact, if you want the awk command to work regardless of whether fields are separated with just commas or both commas and blanks, you can do this:

$ awk -F', ?' '{print $3}' addresses | sort | uniq
CT
MD
NJ
PA
TX
VA
WV
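This works because awk treats a field separator longer than one character as a regular expression, and ", ?" matches a comma followed by an optional space. To see it handle both styles at once, here is a small sketch that feeds awk one line of each kind; third_field is a hypothetical helper name and the second input line is a respaced copy of an address from the file.

```shell
# third_field is a hypothetical helper that reads address lines on stdin
# and prints the third field, whether fields are split by "," or ", ".
third_field() {
    awk -F', ?' '{ print $3 }'
}

printf '%s\n' \
    "6803 Gravel Road,Hurlock,MD" \
    "121 Blueberry Drive, Outback, VA" | third_field
# prints MD, then VA, each on its own line
```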

Using awk, you can also display two fields by using syntax like this:

$ awk -F',' '{print $2,$3}' addresses | sort | uniq
Baker WV
Celery TX
Groton CT
Hoboken NJ
Hurlock MD
Outback VA
Rahway NJ
Reading PA
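The comma between $2 and $3 in the print statement tells awk to join the fields with its output field separator (OFS), which defaults to a single space. If you would rather keep a comma in the output, you can set OFS yourself. The following is a sketch under that assumption; city_state is a hypothetical helper name.

```shell
# city_state is a hypothetical helper. It reads comma-separated address
# lines from the named files (or stdin if none are given) and prints
# unique "City, ST" pairs. sort -u does the work of sort | uniq.
city_state() {
    awk -F',' -v OFS=', ' '{ print $2, $3 }' "$@" | sort -u
}

printf '%s\n' \
    "2001 Turtle Road,Baker,WV" \
    "2001 Turtle Road,Outback,VA" \
    "121 Blueberry Drive,Outback,VA" | city_state
# prints "Baker, WV" and "Outback, VA", each on its own line
```

With the addresses file from earlier, the same helper could be run as city_state addresses.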

Using expr

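To use the expr command, type “expr substr” followed by your string, the start position (counting from 1) and the length of the substring you want. Note that substr is provided by the GNU version of expr found on most Linux systems and may not be available elsewhere.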

$ expr substr "Have some fun" 6 8
some fun
$ str="Have some fun"
$ expr substr "$str" 6 8
some fun
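Since expr counts from 1 and bash parameter expansion counts from 0, it is easy to be off by one when moving between them. This short sketch, which assumes bash and GNU expr, extracts the same substring both ways; the variable names are illustrative.

```shell
#!/usr/bin/env bash
str="Have some fun"

# expr substr counts positions from 1: start at position 6, take 8 characters.
from_expr=$(expr substr "$str" 6 8)

# bash counts offsets from 0, so position 6 becomes offset 5.
from_bash=${str:5:8}

printf '%s\n' "$from_expr"    # prints: some fun
printf '%s\n' "$from_bash"    # prints: some fun
```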

Wrap-Up

There are lots of ways to extract substrings on Linux, but each of the commands you might use has its own quirks and its own advantages.
