Hash and password preservation and comparison

Teachings of a Samurai Engineer 6: Hash and password preservation/comparison 2/2

Previously, we learned about the hash itself, the first step in preserving and comparing passwords. This time, we use that knowledge to learn about how passwords should be saved.

First, something we need to settle before going forward: what is a password? Let’s go over the basics. Using a password is one form of authentication.

Authentication confirms that you really are who you say you are. It breaks down broadly into three types:

  • Authentication by knowledge (memory): I know / What you know / Something you know
  • Authentication by possessions: I have / What you have / Something you have
  • Biological authentication: I am / What you are / Something you are

As the author distantly remembers it, the mnemonic was the ‘I-’ form, but searching Google today, ‘what you-’ or ‘something you-’ seems to be the more common way of remembering it.
Either way, these are the three major factors. Each has its own pros and cons, which are omitted here.
Because each one has weaknesses, if you want strong security, the common method is:

— Double authentication

This is often chosen for exactly that reason. If you simply verify twice, it is called two-step verification; if each step uses a different one of the factors above, it is called two-factor authentication.

For example, Google offers two-step verification, while GitHub offers two-factor authentication. There are many forms of authentication available. A small site might, considering development costs, use single-step authentication, most often a knowledge-based one.
Additionally, whether authentication is multi-step or multi-factor, there is usually a knowledge-based part, and knowledge-based authentication usually means a password. So it’s natural that on many sites we’ve come to think of authentication as meaning passwords first and foremost.
Recently, the term ‘passphrase’ has come into use, where ‘phrase’ implies more than one word. That said, in this text, passphrases, anything with two or more words, long character strings and so on will all be called ‘passwords’.
Now then. Since all we’re doing is comparing whether the saved and entered passwords match…

The easiest method, keeping only that objective in mind, is to save the password string directly to the DB.

This is commonly called ‘plaintext storage’, and it’s common wisdom that it should never be done in practice. For one, if SQL injection or some other attack causes an information leak, plaintext passwords can be abused immediately.
While it isn’t said as much recently, I remember there used to be talk of plaintext storage making insider crime easier as well: if passwords are saved as plaintext, someone on the inside can easily view and abuse them. Either way, for these reasons plaintext storage is out of the question, so you need to transform the password in some way before storing it.

In practice, the point of transforming the password like this is to buy time after a leak. It doesn’t mean transformed passwords are safe once leaked; rather, you may be able to buy some time in which to take countermeasures. That is an important difference.

One way is to use a cipher, but then the question becomes how to safely store the cipher key. This is a bit of a sticking point. Because of this, in practice, the main method right now is hashing.

To start with the answer: current PHP uses the password_hash() function.
password_hash() is available in ‘PHP 5 >= 5.5.0, PHP 7’, that is, 5.5 and later. If you’re on PHP 5.3.7 or later, there’s a userland implementation available on GitHub, so you shouldn’t have too much trouble.
Some might still be using 5.3.3. In that situation, I think it’s time to seriously consider updating.
Now, I would like to go back a little and talk about the days before password_hash(), to note some cautionary points before we arrive back at it. As mentioned, plaintext storage has long been known to be a bad idea. So the solution at the time was to save hash values using MD5.

The trouble is…

  • It became clear a while ago (2007 if not 2004) that MD5 can easily be attacked (its collision resistance has been broken).
  • Since it’s a fast algorithm, it’s weak to brute force, rainbow table creation, etc. (more on that later)
  • Since it has no salt (more on that later), the rainbow tables explained previously can be built against it.

These three points showed relatively quickly that it wasn’t really going to work. Now let’s talk about fast algorithms a bit.
Besides saving passwords, as I wrote before, hashes are used for hash table lookups, tamper detection, etc. For these uses, calculating a hash value quickly is generally an advantage, so many hash algorithms were designed with speed requirements. The problem is that for password storage, a fast algorithm just means ‘handy for crackers’, so you need to do something about that.

Generally speaking…

  • Use a method called ‘stretching’, which repeats the hash many times
  • Use a specialised hash algorithm that needs many calculations (i.e. is slower) and is suited to password storage

These are the main methods. MD5, as I wrote above, was found to be full of problems a long time ago. Among people who knew about this, SHA-1 was in use until recently… but attacks on it were reported in 2012, 2015 and later, so nowadays it is no longer favoured.
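
To make the ‘stretching’ idea above a little more concrete, here is a minimal sketch. The function name stretchHash(), the fixed salt and the iteration count are all assumptions made up for this illustration, and the loop is a naive version of the technique, not something to use in production. PHP’s built-in hash_pbkdf2() is shown alongside it as a standardised form of the same idea.

```php
<?php
// Illustration only: naive stretching by feeding the digest back into the hash.
// stretchHash() is a hypothetical helper, not part of PHP.
function stretchHash(string $password, string $salt, int $iterations): string
{
    $digest = $salt . $password;
    for ($i = 0; $i < $iterations; $i++) {
        // Each round hashes the previous result again, multiplying the work per guess
        $digest = hash('sha256', $salt . $digest);
    }
    return $digest;
}

$salt = 'example-salt'; // fixed here only so the result is reproducible

$fast = stretchHash('password', $salt, 1);      // one round: cheap for an attacker too
$slow = stretchHash('password', $salt, 100000); // 100000 rounds: every guess costs more

// PBKDF2 is a standardised stretching scheme that ships with PHP.
$pbkdf2 = hash_pbkdf2('sha256', 'password', $salt, 100000);

echo $fast, "\n", $slow, "\n", $pbkdf2, "\n";
```

The more rounds you ask for, the longer each guess takes, which is exactly the property a fast hash lacks on its own.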

Now I will talk a little about salt. In a hash, the same input obviously gets the same output.

$p = 'password';
// Hash the same input with three different algorithms
echo md5($p), "\n";
echo sha1($p), "\n";
echo hash('sha256', $p), "\n";

Will always get…

<Result>
5f4dcc3b5aa765d61d8327deb882cf99
5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
</Result>
back.

Thus once someone has built the environment, database, site etc. for a rainbow table, searching it is easy, creating an undesirable situation where you can’t really buy any time at all. That’s why you salt your hash: you mix an extra value, the salt, into the password before hashing it.
This salt becomes far less effective if all users on a site share the same one, so it’s become standard to use a different salt for each user.

However, in this case, you need to save the salt value alongside the password hash.
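
As a rough illustration of per-user salts, the sketch below gives two users the same password and shows that the stored digests still differ. makeSaltedHash() and verifySaltedHash() are hypothetical helpers written for this example, and plain SHA-256 is used only to keep it short; it is not a recommendation for real password storage.

```php
<?php
// Illustration only: per-user salt stored next to the digest.
// makeSaltedHash() and verifySaltedHash() are hypothetical helpers.
function makeSaltedHash(string $password): array
{
    $salt = bin2hex(random_bytes(16)); // 16 random bytes, hex-encoded for storage
    return ['salt' => $salt, 'hash' => hash('sha256', $salt . $password)];
}

function verifySaltedHash(string $password, array $record): bool
{
    // Recompute with the stored salt; hash_equals() compares in constant time
    return hash_equals($record['hash'], hash('sha256', $record['salt'] . $password));
}

$alice = makeSaltedHash('password');
$bob   = makeSaltedHash('password');

var_dump($alice['hash'] === $bob['hash']);       // same password, but the digests differ
var_dump(verifySaltedHash('password', $alice));  // correct password
var_dump(verifySaltedHash('password2', $alice)); // wrong password
```

Because each record carries its own salt, a table precomputed for one user is of no use against another.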

Thus…

  • Choosing a suitable hash algorithm
    • Picking a slow algorithm if possible
    • Implementing stretching if it’s fast
  • Creating a different salt and saving that value for each user

…All become necessary. Additionally, no algorithm is safe forever. Which means…
— A system to adapt when the hash algorithm changes
becomes necessary.

Thus, password_hash().

Using this…

  • The default becomes a slow algorithm meant for password hashing
  • Since you can change the cost/calculation amount, you can adapt it to increases in machine power
  • The string returned by password_hash() includes the salt, algorithm type and cost, so just by saving it…
    • The algorithm is recorded, so changing algorithm midway is no problem
    • The cost is recorded, so changing cost midway is no problem
    • The salt is included, so you don’t need to save it separately

It’s all very convenient.
As for the algorithm, it’s currently bcrypt. However, while it’s not on the Japanese site, the English site also lists ‘Argon2i’ and ‘Argon2id’, which suggests that password hashing algorithms continue to be tested, compared and debated.
From all this, you can see that if you’re making a PHP site now, using password_hash() is the easiest and safest method.
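
Here is one way the ‘adapt when the algorithm or cost changes’ part might look at login time. The login() helper and its return shape are assumptions for this sketch; password_verify(), password_needs_rehash() and password_get_info() are the standard PHP functions. The low cost of 4 is used only to simulate an old, weak hash.

```php
<?php
// login() is a hypothetical helper; $storedHash stands in for the value read from the DB.
// It returns [bool $ok, ?string $newHash]; $newHash is set when the stored hash should be replaced.
function login(string $password, string $storedHash, array $options = []): array
{
    if (!password_verify($password, $storedHash)) {
        return [false, null];
    }
    // The stored string records its own algorithm and cost, so PHP can tell when it is outdated
    if (password_needs_rehash($storedHash, PASSWORD_DEFAULT, $options)) {
        return [true, password_hash($password, PASSWORD_DEFAULT, $options)];
    }
    return [true, null];
}

// Pretend this was saved long ago with a much lower cost
$old = password_hash('password', PASSWORD_BCRYPT, ['cost' => 4]);
print_r(password_get_info($old)); // the algorithm and options embedded in the string

[$ok, $newHash] = login('password', $old, ['cost' => 11]);
var_dump($ok, $newHash !== null);
```

If $newHash comes back non-null, the caller would save it over the old value, so users are upgraded gradually as they log in.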

Now, a few warnings on the convenient password_hash().
First of all, about saving the strings returned by password_hash().
https://www.php.net/manual/ja/function.password-hash.php
Therefore, it is recommended to store the result in a database column that can expand beyond 60 characters (255 characters would be a good choice).
https://www.php.net/manual/en/function.password-hash.php

…As it says on the English site. Current bcrypt strings are 60 bytes, but setting aside about 255 bytes of storage should help with future-proofing.

As for the cost, the default is 10, and you can increase it a bit. Increasing it all at once to something like 20, on the other hand, will slow it down far too much. It depends on your machine, but on the author’s it looks like this.

$p = 'password';
// Time a single password_hash() call at the given cost
$ts = microtime(true);
password_hash($p, PASSWORD_DEFAULT, ['cost' => 16]);
$te = microtime(true);
var_dump($te - $ts);

<Result>
cost 14: float(1.5029780864716)
cost 15: float(2.987370967865)
cost 16: float(6.1744811534882)
cost 17: float(11.909821033478)
cost 18: float(23.895478010178)
</Result>

…So you might want to be careful with cost number adjustments.
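
Rather than guessing, you could measure on the machine that will actually run the site. The sketch below raises the cost until one hash takes at least a target time. findCost() is a hypothetical helper, and the 0.1 second target is just an assumed figure for the example; pick whatever delay your login page can tolerate.

```php
<?php
// findCost() is a hypothetical helper: it raises the bcrypt cost until a single
// hash takes at least $targetSeconds on this machine, or $maxCost is reached.
function findCost(float $targetSeconds, int $startCost = 10, int $maxCost = 20): int
{
    for ($cost = $startCost; $cost < $maxCost; $cost++) {
        $start = microtime(true);
        password_hash('benchmark', PASSWORD_BCRYPT, ['cost' => $cost]);
        if (microtime(true) - $start >= $targetSeconds) {
            break; // this cost is slow enough
        }
    }
    return $cost;
}

echo 'Suggested cost: ', findCost(0.1), "\n"; // 0.1 s is an assumed target
```

The result differs from machine to machine, which is the point: the same cost that is comfortable on one server may be far too slow on another.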

Lastly, and this might be the most important part: bcrypt, the default for current password_hash(), has a 72-character limit (strictly speaking, 72 bytes). How excess characters are dealt with depends on the implementation; in PHP, they are simply cut off. Because of this, a string that matches for the first 72 characters and differs from the 73rd on will be seen as a match.
Let’s give it a try.

First, let’s confirm the typical hash string comparison code.

$p = 'password';
$p2 = 'password2';
// Hash the first password once
$hash_string = password_hash($p, PASSWORD_DEFAULT);
// Verify the matching password, then the mismatched one
var_dump("{$p} / {$p}", password_verify($p, $hash_string));
echo "\n";
var_dump("{$p} / {$p2}", password_verify($p2, $hash_string));

<Result>
string(19) "password / password"
bool(true)
string(20) "password / password2"
bool(false)
</Result>

By checking the string from password_hash() with password_verify(), we confirmed that a matching password returns true and a mismatched one returns false.
Next, we check by giving it passwords that are the same for the first 72 characters and diverge after that.


$p = 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstaaa';
$p2 = 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstbbbbb';
// Hash the first password (75 characters)
$hash_string = password_hash($p, PASSWORD_DEFAULT);
// Verify with the second password, which differs only after the 72nd character
var_dump("{$p} / {$p2}", password_verify($p2, $hash_string));

<Result>
string(155) "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstaaa / abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstbbbbb"
bool(true)
</Result>

In addition to changing the characters from the 73rd on, we also changed the overall length, but as long as the first 72 match, it returns true. This limitation is part of bcrypt, the current default. The best countermeasure for now is probably just to cap password (or passphrase) length at 72 characters.
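
A length cap like that might be sketched as follows. isAcceptablePasswordLength() and the BCRYPT_MAX_BYTES constant are names invented for this example. Note that bcrypt counts bytes, so the check uses strlen() rather than a character count; multibyte passwords hit the limit sooner.

```php
<?php
// isAcceptablePasswordLength() and BCRYPT_MAX_BYTES are hypothetical names for this sketch.
// bcrypt looks at bytes, not characters, so strlen() (a byte count) is the right measure.
const BCRYPT_MAX_BYTES = 72;

function isAcceptablePasswordLength(string $password): bool
{
    return strlen($password) <= BCRYPT_MAX_BYTES;
}

var_dump(isAcceptablePasswordLength(str_repeat('a', 72)));        // bool(true)
var_dump(isAcceptablePasswordLength(str_repeat('a', 73)));        // bool(false)
// U+3042 is 3 bytes in UTF-8, so 25 characters are already 75 bytes
var_dump(isAcceptablePasswordLength(str_repeat("\u{3042}", 25))); // bool(false)
```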

If you want alternatives…

  • Hash any password of 73 or more characters first and pass the result on: SHA-256 narrows the space a little. If you use the raw binary output of SHA-512, bcrypt is not binary-safe, so a \0 (NUL character) in the output can allow illegitimate logins; be careful.
  • Use the Argon2 algorithm: watch your PHP version (Argon2i requires PHP 7.2 or later, Argon2id PHP 7.3 or later), and in the manual’s own words ‘this algorithm is only available if PHP has been compiled with Argon2 support’, so be careful with how you install it.
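
Both alternatives above can be sketched briefly. The helpers preHash(), hashLongPassword() and verifyLongPassword() are hypothetical names for this illustration. Base64-encoding the raw SHA-256 output is one assumed way to sidestep the NUL problem: the result is 44 printable characters, well under the 72-byte limit. The last lines only check whether this PHP build exposes Argon2id before choosing it.

```php
<?php
// preHash(), hashLongPassword() and verifyLongPassword() are hypothetical helpers.
// Raw SHA-256 output is 32 bytes; base64 turns it into 44 printable characters,
// which fits within bcrypt's limit and can never contain a NUL byte.
function preHash(string $password): string
{
    return base64_encode(hash('sha256', $password, true));
}

function hashLongPassword(string $password): string
{
    return password_hash(preHash($password), PASSWORD_BCRYPT);
}

function verifyLongPassword(string $password, string $storedHash): bool
{
    return password_verify(preHash($password), $storedHash);
}

// The same 72 leading characters as in the test above
$base = str_repeat('abcdefghijklmnopqrstuvwxyz', 2) . 'abcdefghijklmnopqrst';
$hash = hashLongPassword($base . 'aaa');
var_dump(verifyLongPassword($base . 'aaa', $hash));   // bool(true)
var_dump(verifyLongPassword($base . 'bbbbb', $hash)); // bool(false): the tail now matters

// Argon2id exists only if PHP was built with it, so fall back to bcrypt otherwise.
$algo = defined('PASSWORD_ARGON2ID') ? PASSWORD_ARGON2ID : PASSWORD_BCRYPT;
$hash2 = password_hash('password', $algo);
var_dump(password_verify('password', $hash2));        // bool(true)
```

One thing to keep in mind with pre-hashing: every place that verifies passwords has to apply the same preHash() step, or existing hashes will stop matching.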

These are some of your options, but… personally I think ‘surely just asking people to keep passwords to 72 characters will do, right?’ Even with this limitation, it’s both more secure and easier than other methods, so I suggest using password_hash() wherever possible.

That went on a bit longer than I meant it to, so let’s leave it at that for now.
Next time, we’ll go a bit deeper into cookies and sessions.

Michiaki Furusho
