📖 PHP Advanced URL Decoding

Reading and decoding a request URL is a very useful way to make our app more robust. If we can use the information contained in the URL beyond a simple controller/action pair, we can begin to add things like IDs and slugs. Processing this information programmatically makes it easier for our app to add new destination URLs.

mysite.com/controller/action // URL with controller and action segments

Currently, our app takes in a URL with controller and action segments and parses the segments to determine which controller/action to execute using a router table. This works well with simple URLs, but we want our app to be able to pass additional information about page content without going back to using query strings. For example, if our site were a blog hosting many articles, we would want a way to identify which article to load for our visitors. We could determine this from information in the URL that matches either the ID of the article in the database or a slug of the article title.

mysite.com/articles?articleID=123 // using query strings

mysite.com/articles/show/123 // using IDs in URL

mysite.com/articles/show/all-about-php // using slugs in URL
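
As a rough illustration of the difference between these URL styles, the sketch below uses PHP's built-in parse_url() and explode() functions to show where the article identifier ends up in each case. The URLs are the examples above with a scheme added; the variable names are only for illustration and are not part of the app.

```php
<?php
// Example URLs from the text (scheme added so parse_url() sees a full URL).
$urls = [
    "https://mysite.com/articles?articleID=123",
    "https://mysite.com/articles/show/123",
    "https://mysite.com/articles/show/all-about-php",
];

foreach ($urls as $url) {
    // PHP_URL_PATH returns only the path part, without the query string.
    $path = parse_url($url, PHP_URL_PATH);

    // Split the path into segments after dropping the leading slash.
    $segments = explode("/", trim($path, "/"));

    // PHP_URL_QUERY returns the query string, or null when there is none.
    $query = parse_url($url, PHP_URL_QUERY);

    echo $path, " => ", implode(", ", $segments), " | query: ", $query ?? "(none)", "\n";
}
```

With the query string style the identifier lives outside the path, while in the other two styles it is simply the last path segment.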

Regular Expressions

To extract data from the URL, we need to parse it using regular expressions. A regular expression is a pattern-matching syntax that lets us match patterns in the URLs. So, let's add some code to the Router.php file that includes a regular expression.

public function matchRoute(string $path): array|bool
{
    // regular expression pattern using # delimiters
    $pattern = "#/products/show#";

    // preg_match function to compare the URL path to the regular expression pattern
    if (preg_match($pattern, $path)) {

        exit("Match");

    }

    foreach ($this->routes as $route) {

        if ($route["path"] === $path) {

            return $route["params"];

        }
    }

    return false;
}

Notice the code above only adds this functionality to the existing matchRoute() method. It will remain in this location in our code, but will be enhanced with more useful patterns. Right now, the pattern only matches the literal /products/show string. We will need it to match a variety of additional patterns and return data from those patterns. With this code in place, you can test it by navigating to the /products/show page in your app and checking that it returns "Match".

This returns a match for our path, but it also returns a match for any path that contains our pattern. Note that we are using the hash character (#) to delimit the regular expression pattern.

Table of Regular Expression Patterns to URL Matches
Pattern             URL                     Match
#/products/show#    /products/show          Yes
#/products/show#    /admin/products/show    Yes
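
The behavior in this table can be checked with a short standalone script. This is a minimal sketch; the $paths list is made up for the test and is not part of the app.

```php
<?php
// Unanchored pattern: matches wherever "/products/show" appears in the string.
$pattern = "#/products/show#";

// Sample paths chosen for illustration only.
$paths = ["/products/show", "/admin/products/show", "/products"];

$results = [];
foreach ($paths as $path) {
    // preg_match() returns 1 on a match and 0 when there is no match.
    $results[$path] = preg_match($pattern, $path) === 1;
    echo $path, ": ", $results[$path] ? "Yes" : "No", "\n";
}
```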

Because the pattern is not anchored, it matches the literal string /products/show and also any longer string that contains it. Those longer matches are undesirable. To fix this, we can add special symbols to the pattern to limit the match. In this case, we only want to match exactly /products/show and nothing else.

We can tell the preg_match() function where the pattern must begin and end using the ^ (caret) and $ (dollar) symbols. The ^ symbol marks the beginning of the string being matched and the $ symbol marks the end. These symbols are called anchors.

Table of Regular Expression Patterns to URL Matches
Pattern               URL                     Match
#^/products/show$#    /products/show          Yes
#^/products/show$#    /admin/products/show    No

public function matchRoute(string $path): array|bool
{
    // regular expression pattern
    $pattern = "#^/products/show$#";

    // preg_match function to compare the URL path to the regular expression pattern
    if (preg_match($pattern, $path)) {

        exit("Match");

    }

    foreach ($this->routes as $route) {

        if ($route["path"] === $path) {

            return $route["params"];

        }
    }

    return false;
}

Now our pattern only matches the /products/show route.
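
One detail worth knowing: in PCRE, the engine behind preg_match(), the $ anchor also matches just before a single trailing newline. The sketch below shows this and one possible way to tighten it with the D modifier. The sample strings are invented for the test, and whether the stricter form is needed depends on how the path reaches the router.

```php
<?php
$anchored = "#^/products/show$#";

// Same pattern with the D modifier, so $ matches only at the very end.
$strict = "#^/products/show$#D";

$exact       = preg_match($anchored, "/products/show");
$longer      = preg_match($anchored, "/admin/products/show");
$withNewline = preg_match($anchored, "/products/show\n");
$strictCheck = preg_match($strict, "/products/show\n");

var_dump($exact, $longer, $withNewline, $strictCheck);
```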

Additional Useful Regular Expression Patterns

There are many additional methods of capturing URL information using regular expressions.

Table of Regular Expression Patterns and Uses
Pattern    Purpose                                              Matches
abc        Match literal strings                                abc OR any-abc-any
^abc$      Set beginning and end of pattern                     abc ONLY
[abc]      Character set matches a single included character    Single a OR b OR c character
[a-z]      Hyphen sets a range of characters                    Any single character in the range from a to z
[abc]*     Repetition symbol                                    Matches preceding character set 0 or more times
[abc]+     Repetition symbol                                    Matches preceding character set 1 or more times
Now we can update our method to use a pattern of characters instead of a literal string. Here we are telling preg_match() to look first for a forward slash /, then one or more lowercase letters from a to z [a-z]+, then another forward slash /, and finally one or more lowercase letters from a to z [a-z]+ at the end.

// regular expression pattern
$pattern = "#^/[a-z]+/[a-z]+$#";

Table of Regular Expression Patterns to URL Matches
Pattern               URL                     Match
#^/[a-z]+/[a-z]+$#    /products/show          Yes
#^/[a-z]+/[a-z]+$#    /admin/products/show    No
#^/[a-z]+/[a-z]+$#    /products/show1         No
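
The sketch below checks this pattern against a few paths and also shows what [a-z] leaves out: digits, uppercase letters, and hyphens. The extra paths and the wider $slugPattern are assumptions added for illustration, as one possible way to accept slug-style segments later.

```php
<?php
$pattern = "#^/[a-z]+/[a-z]+$#";

// Hypothetical wider pattern that also allows digits and hyphens in the second segment.
$slugPattern = "#^/[a-z]+/[a-z0-9-]+$#";

$paths = [
    "/products/show",
    "/admin/products/show",
    "/products/show1",
    "/Products/show",
    "/articles/all-about-php",
];

$results = [];
foreach ($paths as $path) {
    $results[$path] = preg_match($pattern, $path) === 1;
    echo $path, ": ", $results[$path] ? "Yes" : "No", "\n";
}

$slugMatches = preg_match($slugPattern, "/articles/all-about-php") === 1;
```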

Retrieve URL Segments

Now that we have a pattern for matching our URLs, we want to be able to access each of the segments. We can do this within the regular expression using segment capture.

The preg_match() function accepts an optional third argument called matches. If supplied, it is filled with an array containing the matched values, beginning with the text that matched the entire pattern. To test this new use of preg_match(), add the following code to the matchRoute() method and refresh your browser.

$pattern = "#^/[a-z]+/[a-z]+$#";

if (preg_match($pattern, $path, $matches)) {

    // check the output in the browser
    print_r($matches);

    exit("Match");
}

This prints Array ( [0] => /products/show ) Match in the browser. We could then explode that array value to find the path segments, as we are currently doing, but there's a better way.
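
For comparison, the explode() approach might look something like the sketch below. The app's existing segment-splitting code is not shown in this section, so this is only an assumed, minimal version of it.

```php
<?php
$path = "/products/show";
$pattern = "#^/[a-z]+/[a-z]+$#";

$segments = [];
if (preg_match($pattern, $path, $matches)) {
    // $matches[0] holds the text matched by the whole pattern.
    // Drop the leading slash, then split on the remaining slashes.
    $segments = explode("/", trim($matches[0], "/"));
}

print_r($segments);
```

This works, but it means the path is processed twice: once to validate it and once to split it.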

Using Capture Groups

Capture groups in regular expressions split out the different parts of the matching pattern and add each part to the matches array. We can add them to the regular expression pattern to tell preg_match() to extract these parts from the matched string. It's as simple as adding parentheses around the pattern segments we want to access.

// regular expression pattern with capture groups
$pattern = "#^/([a-z]+)/([a-z]+)$#";

With this added to the code, our browser now outputs Array ( [0] => /products/show [1] => products [2] => show ) Match for the URL /products/show.
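
Once the groups are captured, they can be read by position. The following sketch uses array destructuring to skip index 0 (the full match) and assign the two groups to variables; the variable names are chosen for illustration.

```php
<?php
$pattern = "#^/([a-z]+)/([a-z]+)$#";
$path = "/products/show";

if (preg_match($pattern, $path, $matches)) {
    // Skip index 0 and assign the two capture groups by position.
    [, $controller, $action] = $matches;

    echo "controller: $controller, action: $action\n";
}
```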

Naming Capture Groups

You can also name each capture group within the regular expression so that those segments are added to the matches array as associative name => value pairs. To do this, add a question mark immediately after the opening parenthesis, followed by the name of the capture group inside angle brackets <>.

// regular expression pattern with named capture groups
$pattern = "#^/(?<controller>[a-z]+)/(?<action>[a-z]+)$#";

The browser output for /products/show is now:

Array
(
    [0] => /products/show
    [controller] => products
    [1] => products
    [action] => show
    [2] => show
)
Match

Notice that our array now includes the [controller] => products and [action] => show elements. This is very close to what we need to match the route. We really only need the associative elements. To remove the integer keys, we can use the PHP array_filter() function.

// regular expression pattern with named capture groups
$pattern = "#^/(?<controller>[a-z]+)/(?<action>[a-z]+)$#";


if (preg_match($pattern, $path, $matches)) {

    // filter the $matches array for only keys that are strings
    $matches = array_filter($matches, "is_string", ARRAY_FILTER_USE_KEY);

    // check the output in the browser
    print_r($matches);

    exit("Match");
}

Now we only see the associative array with the URL named segments.

Array 
( 
    [controller] => products 
    [action] => show 
) 
Match
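
Putting the pieces together, the matching step could return the named segments instead of exiting. The following is a minimal standalone sketch, not the app's actual Router class: the function name matchPath() is hypothetical, and the optional numeric id group is an assumption added to show how the /articles/show/123 style from the introduction might be handled.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the named URL segments, or false when
 * the path does not match. Not part of the original Router class.
 */
function matchPath(string $path): array|false
{
    // controller and action are required; a numeric id segment is optional (assumption).
    $pattern = "#^/(?<controller>[a-z]+)/(?<action>[a-z]+)(?:/(?<id>\d+))?$#";

    if (preg_match($pattern, $path, $matches) !== 1) {
        return false;
    }

    // Keep only the string keys, which are the named groups.
    return array_filter($matches, "is_string", ARRAY_FILTER_USE_KEY);
}

print_r(matchPath("/products/show"));
print_r(matchPath("/articles/show/123"));
var_dump(matchPath("/admin/products/show"));
```

Because the id segment is optional, calling code should check for it with isset() or the ?? operator rather than assume it is present.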