---- Perl One-Liners ----

Perl Introduction

  1. Perl basics
    Knowledge of a few basic concepts in Perl will lead to a better comprehension of the instructions contained in the one-liners below. It will also allow the user to modify the code and make adjustments according to individual needs. This appendix explains some basic concepts of Perl that are used in the Methods section.
  2. Perl variables
    A Perl command can contain several elements, such as variables, operators, built-in functions and keywords. Variables are storage containers for data and come in three types: scalars (single values, e.g. numbers, letters or strings of characters), arrays (ordered lists of scalars) and hashes (collections of scalars organised into key-value pairs). Far more complex constructs are possible, but only the most basic aspects are presented here. Each variable name starts with a symbol ($ for scalars, @ for arrays and % for hashes), followed by a letter or underscore and then any number of alphanumerical characters (a-z, A-Z, 0-9) or underscores. Below are some simple examples of assigning and accessing variables:
    $attempt = 1;
    
    $date = '11/12/2013';
    
    print "attempt $attempt on $date\n";
    
    @elements = ('CDS', 'mRNA', 'tRNA');
    
    print "First element: $elements[0]\n";
    
    %roman = (1, 'I', 2, 'II', 3, 'III');
    
    print "Roman for 3: $roman{3}\n";
    
    An easy way to try out Perl code is the debugger. It can be started by typing 'perl -d -e 42' at the command line. This will give a new prompt ('DB<1>') after which Perl statements can be typed for testing. The debugger provides extra functionality, for example examining the content of variables, which can be particularly useful for beginners.
    Figure 1: Perl Debugger
    The Perl debugger was started within a Terminal window on Mac OS X. An array '@h' is created and examined. Then the debugger is quit and the normal command prompt '$' re-appears.

    A couple of rules are worth noting from the lines above:

    1. Perl statements end with a semi-colon.
    2. Value assignments happen from right to left, i.e. the value to be assigned is on the right hand side of the equal sign.
    3. Text needs to be enclosed in double or single quotes.
    4. Variables and special characters (e.g. '\n') are evaluated within double quotes but not within single quotes.
    5. Lists are enclosed in round brackets with elements separated by commas.
    6. List indices start at position zero.
    7. To access a single element of an array, the symbol at the start of the variable changes to '$' and the index is specified in square brackets, '[]'.
    8. To access a specific value in a hash, the symbol at the start of the variable changes to '$' and the lookup key is specified in curly brackets, '{}'.
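
    The rules above can be tried out together in a short script. This is a minimal sketch: the variable names are illustrative, and 'use strict', 'use warnings', 'my', the negative index and the '=>' notation (a comma that makes key-value pairs easier to read) are additions that are not used in the examples above.

```perl
use strict;
use warnings;

# Rule 5: lists are enclosed in round brackets, elements separated by commas
my @elements = ('CDS', 'mRNA', 'tRNA');

# '=>' behaves like a comma but makes the key-value pairs easier to read
my %roman = (1 => 'I', 2 => 'II', 3 => 'III');

# Rules 6 and 7: indices start at zero; a single element uses '$' and '[]'
my $first = $elements[0];
my $last  = $elements[-1];    # negative indices count from the end

# Rule 8: a single hash value uses '$' and '{}'
my $three = $roman{3};

# Rule 4: double quotes interpolate variables, single quotes do not
my $interpolated = "First: $first, last: $last, three: $three";
my $literal      = 'First: $first';

print "$interpolated\n";    # prints "First: CDS, last: tRNA, three: III"
print "$literal\n";         # prints "First: $first"
```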
  3. Perl operators
    The next lines of code demonstrate some example uses of operators in Perl (comments start with '#'):
    # some standard mathematical operations
    print 3 * (5 + 10) - 2**4;     
    
    # processing the content of variables
    $total_error = $false_positive + $false_negative;    
    
    # increase value in variable $minutes by 30
    $minutes += 30;       
    
    # increase value in variable $hour by one
    $hour++;              
    
    # decrease by one
    $remaining--;         
    
    # repeat 'CG' 12 times
    $motif = 'CG' x 12;   
    
    # the dot concatenates strings and content of variables
    $chr = 'chr' . $roman{$chr_number};   
    
    # the range operator (two dots) creates a list running from the lower to the upper bound
    @hex = (0..9, 'a'..'f');
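
    As a quick check of the operators above, the following sketch evaluates a few of them with concrete values. The variable values are made up for illustration, and 'scalar' (which returns the number of elements in an array) is an addition not covered in the text.

```perl
use strict;
use warnings;

# arithmetic: '**' binds tighter than '*', which binds tighter than '-'
my $result = 3 * (5 + 10) - 2**4;    # 45 - 16

# compound assignment and increment
my $minutes = 15;
$minutes += 30;
my $hour = 9;
$hour++;

# string repetition and concatenation
my $motif = 'CG' x 3;
my $chr   = 'chr' . 7;

# the range operator works on numbers and on letters
my @hex = (0 .. 9, 'a' .. 'f');

print "$result $minutes $hour $motif $chr\n";    # prints "29 45 10 CGCGCG chr7"
print scalar(@hex), " hex digits: @hex\n";
```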
    
  4. Perl functions
    Perl provides many functions that can be applied to the different variable types. A few are listed below and shown with examples:
    # functions for scalars
    $seq_len = length($seq);
    $rev_seq = reverse($seq);
    $upper_case = uc($seq);
    $lower_case = lc($seq);
    $codon = substr $seq, 0, 3;
    
    # remove the trailing newline from the end of the line
    chomp $input_line;
    
    # functions for arrays
    @array = split //, $string;
    $first_element = shift @array;
    $last_element = pop @array;
    unshift @array, $first_element;
    push @array, $last_element;
    @alphabetically_sorted = sort @names;
    @numerically_sorted = sort { $a <=> $b } @values;
    
    # functions for hashes
    if (defined $description{$gene}) { print $description{$gene} } else { print 'not available'; }
    foreach (keys %headers) { print ">$_\n$headers{$_}\n"; }
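
    To see what the array functions above actually do, the following sketch applies them to concrete values and prints the intermediate results. The example string and numbers are made up for illustration.

```perl
use strict;
use warnings;

my $string = 'GATTACA';    # made-up example string

# an empty pattern splits the string into single characters
my @array = split //, $string;          # G A T T A C A

my $first_element = shift @array;       # removes 'G' from the front
my $last_element  = pop @array;         # removes 'A' from the end
print "Middle: @array\n";               # prints "Middle: A T T A C"

# put the two elements back, swapped
unshift @array, $last_element;
push    @array, $first_element;
print "Swapped: @array\n";              # prints "Swapped: A A T T A C G"

# sort compares as strings by default; numbers need '<=>'
my @values  = (10, 9, 100, 1);
my @as_text = sort @values;                  # 1 10 100 9
my @as_nums = sort { $a <=> $b } @values;    # 1 9 10 100
print "As text: @as_text\n";
print "As numbers: @as_nums\n";
```

    The difference between the two sorted lists shows why the numerical comparison block '{ $a <=> $b }' is needed whenever the values are numbers.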
    
  5. Loops and branches
    The last two examples introduced loops and branches, which operate on lists and Boolean expressions, respectively. A loop is carried out once for each element in a list, and the body of an if-statement is executed only if its test condition is true. Any expression that evaluates to something other than 0, the string '0', an empty string or an undefined value is considered true. Comparison operators are available for tests, such as '>', '<', '==', '>=', '<=' for numbers and 'gt', 'lt', 'eq' for strings. A common mistake is to use a single equal sign to check whether two variables are equal. A single equal sign is an assignment; a comparison requires a double equal sign for numbers or 'eq' for strings. See below for examples.
    # a progress meter for reading in long files:
    if ($line % 1000 == 0) { print STDERR " $line "; } 
    
    # collect lines of sequence into one long lower-case string:
    while (<>) { chomp; $seq .= lc $_; }
    
    # exact motif search
    if (substr($seq, $pos, 10) eq $motif) { print "Motif found at position $pos!\n"; }
    
    # pad number with zeros at the front
    $num = '0'.$num until (length($num) >= $max_len);
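
    The rules for true and false values, and the difference between assignment and comparison, can be checked with a short sketch. All variable names and values here are made up for illustration.

```perl
use strict;
use warnings;

# 0, '0', the empty string and undef are false; other values are true
my @truth;
foreach my $value (0, '0', '', 'ACGT', 42) {
    if ($value) { push @truth, "'$value' is true"; }
    else        { push @truth, "'$value' is false"; }
}
print "$_\n" foreach @truth;

my $copies   = 2;
my $expected = 5;
my $strand   = 'plus';

# '==' compares numbers, 'eq' compares strings
if ($copies == $expected) { print "same number\n"; }
else                      { print "different numbers\n"; }
if ($strand eq 'plus')    { print "plus strand\n"; }

# a single '=' assigns: $copies is overwritten with 5, and the test
# succeeds because the assigned value is true
if ($copies = $expected) { print "copies is now $copies\n"; }
```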
    
    The line 'while (<>) {}' is a special Perl construct that reads input line by line and stores each line in the special variable '$_'. If file names are specified on the command line, Perl opens them one after another and reads their content; otherwise the lines are read from standard input.
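
    The behaviour of 'while (<>)' can be tried without preparing an input file first. The following sketch writes a small temporary file and places its name in '@ARGV', the array that holds the command-line arguments, to simulate a file name given on the command line. The file content is made up, and the use of the File::Temp module is a choice made for this illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# write a small made-up sequence file so that the sketch is self-contained
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "ACGT\nTTGA\nCC\n";
close $fh;

# simulate a file name on the command line: '<>' reads the files in @ARGV
@ARGV = ($filename);

my $seq = '';
while (<>) {
    chomp;             # remove the trailing newline from $_
    $seq .= lc $_;     # append the lower-case line
}

print "$seq\n";        # prints "acgtttgacc"
```

    On the command line, the '-n' switch wraps the given code in the same 'while (<>) {}' loop, so a comparable one-liner might look like this (the file name 'sequences.txt' is hypothetical):

    perl -ne 'chomp; $seq .= lc $_; END { print "$seq\n" }' sequences.txt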
  6. Regular expressions
    One of the most powerful features of Perl is its implementation of regular expressions, which allow matching not only exact text strings but also variable classes of text. Whole books have been written about this topic and a full explanation is beyond the scope of this article. Therefore, only a few basic concepts are explained and demonstrated in the form of examples.

    Regular expressions are specified within delimiters ('/' by default) and applied to the content of a variable with the '=~' operator. In a substitution, written as 's/pattern/replacement/', the text matched by the pattern is replaced with the replacement. In addition, modifiers can be appended after the last delimiter, such as 'i' for case-insensitive matches and 'g' for global matches, i.e. all matches instead of just the first one.

    Special characters are available to match classes of characters, such as '\w' for any alphanumerical character or underscore, '\d' for digits, and '\s' for white space. The negated classes, e.g. anything that is not a digit, are written with capital letters: '\W', '\D', and '\S'.

    The number of occurrences can be specified in curly brackets, e.g. '{3}' for exactly three, '{4,10}' for four to ten, or '{2,}' for two or more occurrences of a pattern. Shortcuts are '+' for one or more matches, '*' for zero or more matches, and '?' for zero or one match.

    To refer to parts of the matched text afterwards, the corresponding sub-patterns are enclosed in round brackets. The text matched by the first pair of brackets is then available in the special variable $1, the second in $2, and so on.

    The following examples illustrate their usage:

    # search $_ for the word regulator (ignoring case) and print if found
    if (/regulator/i) { print;}
    
    # check for non-numerical input
    if ($input =~ /\D/) { warn "Non-numerical input in '$input'\n"; }
    
    # remove all white space
    $input =~ s/\s//g;
    
    # find the pattern 'CG' repeated at least 3 times in a row and print it
    # ('(?:CG)' groups the two letters so that '{3,}' applies to both)
    if ($input =~ /((?:CG){3,})/) { print "Found pattern $1!\n"; }
    
    # split a string at tabulators and collect the elements in an array
    @list = split /\t/, $input;
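
    Capturing with round brackets, substitution with and without the 'g' modifier, and a quantifier applied to a group can be seen side by side in the following sketch. The input strings are made up, and the non-capturing group '(?:...)' is an addition that is not covered in the text above.

```perl
use strict;
use warnings;

my $line = "gene42\tchrIII\t1500";    # made-up tab-separated record

# round brackets capture sub-patterns into $1, $2, ...
my ($name, $number) = ('', '');
if ($line =~ /(\D+)(\d+)\t/) {
    ($name, $number) = ($1, $2);
    print "Name '$name', number $number\n";    # prints "Name 'gene', number 42"
}

# substitution replaces the first match only ...
my $first_only = $line;
$first_only =~ s/\t/ /;

# ... or, with the 'g' modifier, every match
my $all = $line;
$all =~ s/\t/ /g;
print "$all\n";                                # prints "gene42 chrIII 1500"

# a quantifier after a group applies to the whole group:
# '(?:CG)' groups without capturing, the outer brackets fill $1
my $dna    = 'ATCGCGCGCGTT';
my $repeat = '';
if ($dna =~ /((?:CG){3,})/) {
    $repeat = $1;
    print "Found repeat $repeat\n";            # prints "Found repeat CGCGCGCG"
}
```

    Without the grouping, the pattern '/CG{3,}/' would apply the quantifier to the 'G' alone and match 'CGGG' instead of repeated 'CG' units.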
    
There is plenty of literature available for more information on learning Perl. A good starting point is the online library at perl.org: http://www.perl.org/books/library.html.