Teachings of a Samurai Engineer 4: Email Authentication Using Tokens

In the last column, we used the combination of the functions random_bytes() and [bin2hex|base64_encode]() to create tokens. What are tokens used for?

Tokens are often used either as persistent secret strings or as secret strings that are used only once (one-time tokens). In short, a token is used when you want a string that is hard to guess.

“Hard-to-guess strings” often come up when talking about preventing cross-site request forgeries (CSRF). There are also a variety of scenarios in which (one-time) tokens are used.

In this column, let’s take a look at an example of implementing email activation (authentication) with tokens.

Suppose that the user clicks “Subscribe to Newsletter” on a website.

But what the subscription service provider really wants to know is not “Is the email address format correct?” but rather, “Can this email address receive email?” and “Does this email address belong to the subscriber who entered the address?”

First, simply trusting the entered email address as-is does not meet the requirements. The user may have intentionally entered a false address, or the address may contain a typo.

In the early days, Japanese mobile phone services used to do the following:

1. Ask the user to send an “empty email” (email without subject or body) to a particular address.
2. The empty email is received by the service’s server, which sends an email containing a URL to the user (the user’s email address is obtained from the From line).
3. The user clicks the sent URL to continue member registration.

These days, the following registration flow is often used to create subscriber accounts:

1. The user completes member registration using a web form. (status: email address unconfirmed.)
2. Email containing a URL (or token) for account activation is sent to the entered email address. (status: token and email address/user ID linked.)
3. The user activates his or her account by using the sent URL or token. (status: email address confirmed.)
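
The second step of the flow above can be sketched roughly as follows. This is only a minimal sketch: the function names, the base URL https://example.com/activate and the "token" query parameter are assumptions made for illustration, and storing the token and sending the email are left out.

```php
<?php
// Sketch only: function names, base URL and parameter name are hypothetical.

/**
 * Create a hex token from 32 random bytes (64 hex characters).
 */
function createActivationToken(): string
{
    return bin2hex(random_bytes(32));
}

/**
 * Build the activation URL that would be written into the email body.
 */
function buildActivationUrl(string $baseUrl, string $token): string
{
    return $baseUrl . '?' . http_build_query(['token' => $token]);
}

$token = createActivationToken();
$url = buildActivationUrl('https://example.com/activate', $token);

// At this point the application would store $token together with the
// user ID (status: email address unconfirmed) and send $url by email.
echo $url, "\n";
```

If you prefer the "just write the token" approach, the same $token would go into the email body as-is, and the user would type or paste it into a form.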

Allow me to digress for a moment.

There is a tradeoff between security and convenience with a URL that includes a token and with just writing a token.

The URL is easier for the user to understand and operate on. But having the user click a URL in an email runs counter to best practices for users to protect themselves against phishing attacks.

Writing a token, on the other hand, is compatible with counter-phishing practices. But users (especially smartphone users) may find it inconvenient to use.

Because of this tradeoff, you should regularly review the method you choose and change it as needed.

Returning to our topic, in any case, a token is needed for both approaches above. I wrote about methods to create tokens in the last column.

There are various views on the number of bytes that should be used for a token. Personally, I often see suggestions of 24 – 32 bytes.

A token can be created like this.
// Create a 24-byte token with either
$token = bin2hex(random_bytes(24));
$token = base64_encode(random_bytes(24));
// Create a 32-byte token with either
$token = bin2hex(random_bytes(32));
$token = base64_encode(random_bytes(32));

Now, PHP has a function called uniqid().

As you can see in the official documentation, there are points to be cautious about.

Caution
This function does not generate cryptographically secure values, and should not be used for cryptographic purposes.

So uniqid() is completely unsuitable for cryptography.

You might come up with several schemes to apply to uniqid().
var_dump( uniqid() );
echo "\n";
var_dump( uniqid('', true) );
echo "\n";
var_dump( uniqid(mt_rand(), true) );
echo "\n";
var_dump( sha1(uniqid(mt_rand(), true)) );

All of these methods are combinations that are cryptographically insecure. So at the end of the day, you still can’t get a cryptographically secure value with uniqid().

So no matter how you try to manipulate uniqid(), it’s not suitable at all for a purpose like tokens. Don’t use uniqid() to create tokens.
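
To see why, it may help to look at what a plain uniqid() value contains. The sketch below relies on the format uniqid() produces when called without arguments (13 hex characters built from the current time); treat that layout as an implementation detail I am assuming here, not as something the documentation promises.

```php
<?php
// uniqid() without arguments returns 13 hex characters derived from the
// current time: 8 for the seconds and 5 for the microseconds.
$id = uniqid();

// Decode the two parts back into numbers.
$seconds = (int) hexdec(substr($id, 0, 8));
$microseconds = (int) hexdec(substr($id, 8, 5));

echo $id, "\n";
echo date('Y-m-d H:i:s', $seconds), ' +', $microseconds, " microseconds\n";
```

In other words, someone who knows roughly when the value was generated already knows most of it, which is the opposite of "hard to guess."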

Of course, there is a great machine cost to creating a cryptographically secure token. There is also a heavy program cost due to the complex logic involved. So you have to carefully consider the balance between security and convenience when creating a token.
For example:
$token = bin2hex(random_bytes(32));
If the token you get with this one line of code is cryptographically secure enough for you, there’s no special reason to reject this code.

As we’ve seen, tokens are used for several functions by Web services. And, as described above, they can be created by combining random_bytes() and [bin2hex|base64_encode](). I think this method is suitable for our purposes at this time.

random_bytes() is a function available in PHP 7.0 and later versions. If you’re using PHP 5.2 or a later version before 7.0, there are userland implementations. So I don’t think the hurdle for using a randomization function in PHP is that high.

Now, whether to use bin2hex() or base64_encode() may be a difficult question for some programmers, but it is largely a matter of taste. So it’s fine to decide on one or the other based on your personal preference.

Compared with bin2hex(), base64_encode() produces a slightly shorter string. On the other hand, the string produced by bin2hex() is composed of just 16 characters, [0-9a-f], making it easier to handle. So it is up to you which trait is more important.
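
As a rough illustration of the difference, here are both encodings applied to the same 32 random bytes. The URL-safe variant at the end is my own addition for this sketch, not something the steps above require; it is a common way to deal with the "+", "/" and "=" characters that base64_encode() can produce.

```php
<?php
$bytes = random_bytes(32);

$hex = bin2hex($bytes);          // 64 characters, only [0-9a-f]
$base64 = base64_encode($bytes); // 44 characters, may contain "+", "/" and "="

// URL-safe variant (an addition in this sketch): swap "+" and "/" for
// "-" and "_" and drop the "=" padding.
$base64Url = rtrim(strtr($base64, '+/', '-_'), '='); // 43 characters

printf("hex:        %d chars\n", strlen($hex));
printf("base64:     %d chars\n", strlen($base64));
printf("base64 url: %d chars\n", strlen($base64Url));
```

The hex string can be dropped into a URL or a form field without any escaping, which is what "easier to handle" amounts to in practice.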

As I wrote above, the registration flow is as follows:

1. The user completes member registration using a web browser. (At that time, the email address is unconfirmed.)
2. Email containing a URL (or token) for account activation is sent to the entered email address. (At that time, the token and email address/user ID are linked.)
3. The user activates his or her account by using the sent URL or token. (At that time, the user’s email address is confirmed.)

When the token is created with the flow above, the activation takes place by comparing the token that comes from the UI form or the URL with the token in the DB. The program asks, “Does this token exist in the DB?” In short, you can confirm with the following query:

SELECT * FROM activation_table WHERE token='obtained_token';
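
In real code the obtained token comes from the user, so it should be bound as a parameter rather than written into the SQL string. Here is a minimal sketch using PDO. The in-memory SQLite database (which needs the pdo_sqlite extension), the column names user_id and token, and the function name findUserIdByToken() are all assumptions made for illustration; only the table name comes from the query above.

```php
<?php
// In-memory SQLite stands in for the real database (assumption).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE activation_table (user_id INTEGER, token TEXT UNIQUE)');

/**
 * Return the user ID linked to the token, or null if the token is unknown.
 */
function findUserIdByToken(PDO $pdo, string $token): ?int
{
    // The token is bound as a parameter, never concatenated into the SQL.
    $stmt = $pdo->prepare('SELECT user_id FROM activation_table WHERE token = :token');
    $stmt->execute([':token' => $token]);
    $userId = $stmt->fetchColumn();

    return $userId === false ? null : (int) $userId;
}

// Registration: link a fresh token to a user ID.
$token = bin2hex(random_bytes(32));
$insert = $pdo->prepare('INSERT INTO activation_table (user_id, token) VALUES (:user_id, :token)');
$insert->execute([':user_id' => 1, ':token' => $token]);

var_dump(findUserIdByToken($pdo, $token));          // the linked user ID
var_dump(findUserIdByToken($pdo, 'unknown-token')); // null
```

When the lookup succeeds, the application would mark the email address as confirmed and, for a one-time token, delete or invalidate the row.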

Suppose a token like the one described above is used to prevent CSRF. In that case:

1. The token stored in $_SESSION and the token sent with the POST request are both obtained.
2. The two strings are compared.

For example, assume the token is obtained from the URL and your code is written like this:
$token = $_GET['token'];
if ($_SESSION['token'] === $token) {
    // tokens match
} else {
    // tokens do not match
}

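Unfortunately, this code is vulnerable to what is called a “timing attack.” A comparison with === can stop as soon as it finds a byte that differs, so the time it takes can leak how much of a guessed token is correct.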

In PHP 5.6 and later, you can use a function called hash_equals() to write code like this:
$token = $_GET['token'];
if (hash_equals($_SESSION['token'], $token)) {
    // tokens match
} else {
    // tokens do not match
}

Note that for hash_equals(), the official documentation states the following: “It is important to provide the user-supplied string as the second parameter, rather than the first.” So keep that in mind.
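
One more practical point: hash_equals() expects strings, while request parameters can be missing or arrive as arrays. A small wrapper like the one below is one way to handle that. The function name isValidToken() is hypothetical, and the local variable stands in for $_SESSION['token'] so that the sketch runs on its own.

```php
<?php
/**
 * Compare the token stored on the server with the one sent by the client.
 * Non-string input (for example token[]=x, which PHP parses as an array)
 * is rejected before hash_equals() is called.
 */
function isValidToken($knownToken, $userToken): bool
{
    if (!is_string($knownToken) || !is_string($userToken) || $knownToken === '') {
        return false;
    }

    // Known (server-side) string first, user-supplied string second.
    return hash_equals($knownToken, $userToken);
}

$sessionToken = bin2hex(random_bytes(32)); // would live in $_SESSION['token']

var_dump(isValidToken($sessionToken, $sessionToken)); // bool(true)
var_dump(isValidToken($sessionToken, 'forged'));      // bool(false)
var_dump(isValidToken($sessionToken, ['x']));         // bool(false)
var_dump(isValidToken(null, ''));                     // bool(false)
```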

In the next column, we’ll learn about hash functions and ways to save and compare passwords.
