Monday, July 25, 2011

Encoding URL Strings with Escaped Codes with Perl

You've probably seen URLs, after running a search or submitting a form, in which some of the characters have been replaced with %NN, where NN is a two-digit hexadecimal number. This happens because certain characters either have a special meaning in a URL or are not allowed in one at all, so they have to be replaced with their hex equivalents. The routine in this post handles 22 of them.
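
To make the %NN form concrete, here is a minimal sketch (not part of the routine in this post) that prints the escape code for a few characters. The helper name hex_code is hypothetical. It uses only Perl's built-in ord and sprintf, and it assumes single-byte (ASCII) characters.

```perl
use strict;
use warnings;

# Hypothetical helper: return the %NN escape for a single ASCII character,
# where NN is the character's code as two uppercase hex digits.
sub hex_code
{
  my ($char) = @_;
  return sprintf("%%%02X", ord($char));
}

# Print the escape code for a few characters that need escaping.
foreach my $char (' ', '@', '=', '%')
{
  print "'$char' => ", hex_code($char), "\n";
}
```

Running this should show that a space becomes %20, '@' becomes %40, '=' becomes %3D, and '%' itself becomes %25.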

While writing an API test harness, I had to send a username and password to a site for testing purposes, and I started running across passwords that contained some of these "illegal" characters. To allow my script to log in correctly, I had to write a routine that escapes those characters so that the website still receives the original values.

Here's the Perl subroutine I wrote to deal with this problem:

############################## url_escape #############################
#                                                                     #
# Send in a string; have the illegal URL characters replaced.         #
#                                                                     #
# Author:        Greg Meece                                           #
# Creation Date: 07/25/2011                                           #
# Last Mod Date: 07/25/2011                                           #
# ------------------------------------------------------------------- #
#                                                                     #
# input: a URL component (e.g. a password), not a whole URL           #
#                                                                     #
# Example usage:                                                      #
# my $urlStr = 'H%t^@=';                                              #
# my $escaped = url_escape($urlStr);                                  #
# returns: URL-escaped string - 'H%25t%5E%40%3D'                      #
#                                                                     #
#######################################################################
sub url_escape
{
  my ($inString) = @_;
  my $htmlChar;
 
  # These are all the illegal characters for URL strings:
  my %urlEscapers =
  (
   # "%" => "%25",
   " " => "%20",
   "<" => "%3C",
   ">" => "%3E",
   "#" => "%23",
   "{" => "%7B",
   "}" => "%7D",
   "|" => "%7C",
   "\\" => "%5C",
   "^" => "%5E",
   "~" => "%7E",
   "[" => "%5B",
   "]" => "%5D",
   "`" => "%60",
   ";" => "%3B",
   "/" => "%2F",
   "?" => "%3F",
   ":" => "%3A",
   "@" => "%40",
   "=" => "%3D",
   "&" => "%26",
   "\$" => "%24"
  );
  # Replace every '%' first, so that the '%' signs inserted by the loop
  # below are not escaped a second time
  $inString =~ s/%/%25/g;
 
  # Iterate through the hash; \Q...\E quotes any RegEx metacharacters in the key
  foreach $htmlChar (keys %urlEscapers)
  {
    $inString =~ s/\Q$htmlChar\E/$urlEscapers{$htmlChar}/g;
  }
 
  return($inString);
}
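
For comparison, a common alternative is to escape everything outside a small safe set in a single pass, instead of listing the characters to replace. The sketch below is not the routine above, and the name url_escape_generic, the sample credentials, and the example.com address are all hypothetical. It assumes the input is a byte string (ASCII or already-encoded UTF-8) and treats only letters, digits, '-', '.', '_' and '~' as safe, so unlike the routine above it leaves '~' alone. Because each character is visited exactly once, '%' needs no special ordering. The CPAN module URI::Escape offers a uri_escape function built on the same general idea.

```perl
use strict;
use warnings;

# Hypothetical single-pass variant: escape every character that is not
# a letter, a digit, or one of - . _ ~
sub url_escape_generic
{
  my ($inString) = @_;

  # /e evaluates the replacement as code; /g applies it to every match
  $inString =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;

  return $inString;
}

# Assumed sample values, for illustration only
my $user     = 'greg';
my $password = 'H%t^@=';

# Escape each value separately, then join them into the query string
my $url = 'https://example.com/login?user=' . url_escape_generic($user)
        . '&pass=' . url_escape_generic($password);

print "$url\n";
```

Escaping each value on its own, rather than the assembled URL, keeps the '?', '&' and '=' that separate the parameters intact.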
