Basics: Secure password hashing with salts

Passwörter in Plaintext speichernAnyone who develops software and especially if he/she does so in the web environment, has certainly already written one or the other login system or at least had points of contact in this area. Besides the logic of a secure login or user management system, the secure storage of passwords is one of the most important points during implementation.

Even if the actual login code is 100 percent error-free and secure (which should never be assumed in practice), security vulnerabilities in the server software can still lead to intrusions or hacks. There is always a variable that is out of one’s control and thus websites are hacked, compromised and complete databases with usernames and passwords are read every day.

In order to protect users in the best possible way in the event of such a hack, passwords should be protected by salts (=salt; a methodology from cryptology).

In practice, however, it unfortunately looks like many developers are aware that they should salt passwords, but the opinions and approaches about the “how”, i.e. the implementation, differ greatly and are sometimes so wrong that the well-intentioned “protection” tends to have a negative effect.

Therefore, I would like to clear up the topic of password hashing and salting and cover the whole range of topics from “What is password hashing” to “Best practices for password hashing”.

What is password hashing?

Password hashing means applying a hash function to a password. A hash function is a so-called scatter function, which makes it possible to project large input sets onto (usually smaller) target sets. Often the target sets also have a fixed length.

For the Secure Hash Algorithm (SHA for short), for example, the length of the target set is always 160 bits, which in turn is often written as a 40-digit hexadecimal number.

So if you have the password “Secret” and apply SHA1 to it, the output is as follows:

sha1('Secret') == f4e7a8740db0b7a0bfd8e63077261475f61fc2a6

If, on the other hand, the password is “SuperLongAndVerySecretPasswordWithManyNumbers6489451456498489494894894”, this results in the following hash (using SHA1):

sha1('SuperLongAndVerySecretPasswordWithManyNumbers6489451456498489494894894') == 08e114b6377129fa37fed8f86af2e06bf62fad71

MD5 ist unsicherDespite the much longer input, the target set still has the same length of 40 characters. This results in a problem of hash functions. Since the input set can be larger than the target set, it can happen that two different inputs produce the same hash value. This is called a collision.

If the hash is to be stored in the database instead of the password, collisions are an undesirable side effect. In addition, not every hash function is suitable for application to passwords. For this reason, there is a subtype of hash functions called cryptological hash functions.

This type of hash function is characterized by a number of features. For example, a cryptological hash function is by definition collision-resistant. Collisions are still theoretically possible, but they involve an insurmountable amount of work, so that practically no collision can be found. Further, even the smallest changes to the input set should cause a seemingly arbitrary change to the target set. Furthermore, it shall not be possible to reconstruct the input from a hash of a cryptographic hash function.

If collisions are nevertheless found for cryptological hash functions, then either a weakness has been discovered in the algorithm or the effort required to find a collision is no longer great enough in relation to the technical progress to ensure collision resistance. In this case, the respective algorithm is said to be “broken”. The use of such algorithms should be avoided. (This also applies to the MD5 function, for example.) Thus, only cryptological hash functions should be used for password hashing.

The standard procedure for managing user data and passwords using hash functions is as follows:

  1. The user registers with username and password.
  2. The password is hashed before it is saved in the database and only the hash is stored in the database.
  3. If the user now wants to log in, he enters his username and password. His password is hashed again and compared with the hash stored in the database for his username.
  4. If the hashes match, the user is logged in, otherwise he receives an error message.

The error message should only ever say that the credential data is incorrect. If it is stated whether the password or user name is incorrect, the attacker can already conclude which one of the two is correct in any case. Therefore, only generic error messages should be issued at this point.

The above four steps, i.e. hashing the passwords, ensure that only the hashes are made public if the database is lost or broken into. However, the attacker cannot log in using the hashes, since they would be hashed again in step 3 and would therefore no longer match the hash of the actual password.

Why are hashes not secure per se?

After the last paragraph, one might think that (cryptographic) hashes are secure per se. One might mistakenly assume that, using the four steps from the previous paragraph, the passwords are sufficiently secure and that one has done everything necessary to protect them. Unfortunately, this is not the case!

Was ist Brute Force?While we have correctly stated that a hacker cannot (easily) log in using the hash, there are still ways to infer the password from the hash. And since users (unfortunately) usually use the same credentials for multiple portals/websites, recovering the password is in the hacker’s interest. Therefore, we will now look at how hashes can be “decrypted” (or cracked), and then in a later paragraph we will look at how we can further mitigate this risk.

In principle, we can distinguish between two methods of cracking hashes:

  1. Brute Forcing; Brute forcing is an attempt to obtain the desired result by trying out all possible combinations. If you have the hash “e22a63fb76874c99488435f26b117e37”, for example, you systematically form the corresponding hash value for all combinations of an alphabet (e.g. A-Za-z0-9) and compare this with the existing hash. If the hashes are equal, one looks which was the last input combination and thus knows the password. Since trying all possibilities means a high computation and time expenditure, one can try to work first by means of so-called “Dictionaries”. A dictionary is a list of common passwords. The hashes are then generated for this list and compared with the searched hash. The larger the hacked database, the more likely it is that a user has used one of the common passwords from the dictionary.
  2. Lookup Tables; Lookup tables are used to circumvent the performance problem of brute force attacks. A lookup table is a special data structure that contains precalculated password hash pairs. The special feature of lookup tables is that they can still process several hundred queries per second, even with data sets of several million pairs. Lookup tables are therefore faster than pure brute force, but they are limited by the precalculated values to a predetermined input set (of passwords), just like dictionary attacks. A special form are the so-called “Rainbow Tables”. They are a hybrid of lookup table and brute force attacks. Thus, a part of the hashes is precalculated to determine whether the hash of a password is in a subset. If this is the case, the subset is hashed by brute force to find the matching password. In this way, more hashes can be stored in less space, thus increasing the hit rate.

Now let’s get to why we should salt passwords before hashing them.

Use of salts in password hashing

Starke Passwort SaltsA salt can be used to override the lookup table procedure described in the previous section. This works because lookup tables assume that the same password always results in the same hash. So if two users use the same password, they usually have the same hash (if only hashed).

However, if the passwords of the individual users are given a random salt, the two passwords result in different hashes, so that a lookup table cannot work, because the salts are not known before the hack and therefore no lookup table can be precalculated.

Brute force attacks are immune to this. However, for sufficiently long passwords, the effort of brute forcing is so high that decrypting all captured passwords cannot be done in an acceptable time. In addition, key stretching can make brute forcing even more difficult. (More about key stretching in the next section).

Best practice: password hashing with salts

When using salts, the following things should be considered for maximum effectiveness.

  • The salt used should be unique per user and password. This means that a new salt should be generated not only when a user registers, but also when he changes his password.
  • The salt should consist of a random string of characters. A cryptographically secure pseudo-random number generator (CSPRNG) should be used to generate the salt. This is important because not every random number generator is truly random. For example, many standard generators show patterns that can be used to infer other numbers (salts) generated by the generator when large numbers of random numbers (salts in this case) are involved. In the following list (click the title to expand), you will find common CSPRNG implementations for different programming languages.
  • List of CSPRNG algorithms & libraries
    .NET (C#, VB, F#): System.Security.Cryptography.RNGCryptoServiceProvider
    C/C++ (Windows API): CryptGenRandom, ISAAC
    GNU/Linux/Unix-based languages: Take /dev/random or /dev/urandom as Stream-Source
    Java: java.security.SecureRandom, Java TRNG Client
    JavaScript: RandomSource.getRandomValues(), CryptoJS
    Lua: LuaCrypto, ISAAC
    NodeJS: crypto.randomBytes()
    Perl: Math::Random::Secure
    PHP: mcryptcreateiv, opensslrandompseudo_bytes, php-csprng
    Python: os.urandom, pyaes / Advanced Encryption Suite
    Ruby: SecureRandom, drbg-rb
    Visual Basic: ISAAC
  • Furthermore, the salt should be at least as long as the result of the hash function used. If the salt is too short, there are too few variations, so that the attacker could precompute his lookup tables for all salt variations in a reasonable amount of time despite using a salt.
  • The hashing itself should be artificially slowed down by increasing the computational effort. Salts can be used to ensure that lookup tables become useless. However, salts do not help against pure brute force attacks. To make brute forcing as ineffective as possible, a hashing algorithm that supports key stretching should be used. Key stretching algorithms are particularly computationally intensive, so the hash rate can be slowed down by a freely defined factor. (Usually this is done by specifying the iterations when calling the hashing function). If a cracking software creates e.g. 300,000 hashes/second with a normal procedure, the rate can be reduced by appropriate key stretching e.g. to 5 hashes/second. In this way the brute forcing becomes ineffective. Common key stretching libraries are crypt(5), PBKDF2 or bcrypt. The key stretching parameter should be chosen in such a way that the hashing is as slow as possible, but weak devices or web servers do not suffer. If too many iterations are used, the performance of the web server on which the hashing operations are performed may suffer if the number of users is high, for example.

So let’s summarize again in a nutshell:

  1. The salt should be unique per user and password
  2. The salt should originate from a cryptographically secure random number generator
  3. The salt should be at least as long as the hash
  4. A key stretching procedure should be used

If you follow the rules above, not much can go wrong. Nevertheless, in the next section we’ll take another look at what you definitely shouldn’t do when hashing and salting.

How not to hash and salt passwords

Attention – the following things should explicitly not be done when hashing and salting passwords!

  • Use salts multiple times; If the salt is generated only once and/or is hard-coded in the source code, then it effectively loses its usefulness. With a hard-coded salt, the same password of two users would also result in the same hash. This means that the lookup table would only have to be recalculated once for the defined salt and the protection provided by the salt would be nullified. Therefore, a salt should always be generated once per user and password.
  • Take usernames as salts; Using the username as Salt is an equally bad idea. Often, users use the same username for different services. If all services were to use the username as a salt, a cracked password could be used for all affected services. Therefore, a cryptographically secure, random salt should be generated.
  • Use a salt that is too short; If the salt is too short, attackers can precompute lookup tables for all possible salt variations. For example, if the salt consists of only 2 ASCII characters, there are only 95*95=9025 possible salts. Even if one assumes a large password list of 20MB for a dictionary attack, this results (when calculating the list with all salts) in just 176GB of data, the processing and storage of which is no longer a problem by today’s standards.
  • Exclusively client-side hashing; Although it seems to make sense at first glance to hash the password on the client side, for example in the browser using JavaScript, so that the password does not have to be transmitted to the server in “plain text”, a closer look reveals several risks. On the one hand, it cannot always be guaranteed that the user’s browser supports JavaScript, and on the other hand, this would mean that only the calculated hash would be sent to the server and compared there with the hash in the database. This would also mean that an attacker would no longer have to decrypt the hash when capturing it, but could use it directly to log in, which is a major vulnerability. Client-side hashing and salting can therefore at most be an additional service to server-side hashing. It should also be remembered that client-side hashing is no substitute for a secure connection (e.g. HTTPS TLS/SSL) between client and server. Anyone who only hashes on the client side but does not use a secure transport route runs the risk of the hash being tapped by means of a man-in-the-middle attack and then misused.
  • Use insecure or self-written hashing algorithms; On some websites/forums it is suggested to combine several hash functions or to call one and the same function in a nested way to “improve security”. Such suggestions can for example look like: sha1(sha1(password)) or md5(sha1(password)). What is often forgotten here is the estimation of benefits and costs. On the benefit side, there is ideally the minimally higher computational effort. However, this approach can be implemented much more effectively with an iterative algorithm (keyword: key stretching). On the cost side, there is the risk of weakening security or causing incompatibility problems, so such methods should be avoided. It is also not a good idea to try to design your own hash algorithm. Errors often occur here, which in turn call complete security into question. Established algorithms, on the other hand, have been tested umpteen times and are much more stable, so you should fall back on an already established algorithm.

Further security measures

Hacker sind überallUsing the methods learned so far in this article – hashing, salting, and key stretching – a well-secured system can already be set up. However, there are other ways to increase the protection of passwords. One of these possibilities is to encrypt the hashes.

Hash encryption

Encrypting hashes can ensure that brute force attacks become useless. During a hack, the hacker receives the hashes in encrypted form. If he now generates hashes in order to compare them with the captured (supposed) hashes, he will not find a match because he would still have to encrypt the hashes he generated using the same key as the application.

Solutions such as AES or HMAC are suitable for encrypting hashes. For this system to work, however, the key (secret) should be stored on a dedicated system or the entire encryption process should take place on this system. If the key/encryption logic is on the same server as the database, it is likely that the attacker will have access to it as well, so that the encryption can be broken. However, if there is no other way or no suitable hardware available, it is still better to implement the encryption logic on the same web server as the database than not to encrypt at all. Nevertheless, it should be considered whether the security requirements do not dictate that additional, dedicated hardware be purchased in such a case.

Furthermore, the key for the hash encryption should be generated dynamically during the installation of an instance. A key that is permanently stored in the software increases the risk of a successful attack. Once the attacker has access to the code, he could attack all instances of this software if the key is fixed. With a dynamically generated key, on the other hand, the attacker would have to have access to every system that he wants to attack.

Time-constant hash comparison

When implementing the hash comparison, i.e. when validating the password, care should be taken to ensure that the comparison always takes the same amount of time. This means that both the validation of a valid password hash and the validation of an invalid hash should take the same amount of time.

The most commonly used method for comparing two hashes is probably as follows (or similar):

string hash1 = "e22a63fb76874c99488435f26b117e37";
string hash2 = "0a4ff18a7d23f8b3ded5eaf93104ac88";
if (hash1 == hash2) {
  return true;
}

However, the “==” comparison operator (in most languages) causes the comparison to stop at the first different digit and to be acknowledged with a “false”. In the above example, this would be directly after the first digit. For identical hashes, however, the “==” operator would go through all digits of the hashes. This results in a time difference that attackers can exploit.

If the attacker attacks a web service and tries all hashes for all 256 byte values at the first position of the hash, he can use the time difference to determine which is the correct byte at the first position. After that, he takes the second position and can thus gradually approach the correct hash or the correct input.

This scenario may sound improbable at first glance, but it is technically and practically feasible, as a paper from Stanford University shows.

To eliminate this attack vector, it is necessary to use a time-constant comparison routine. This could in turn look like the following:

string hash1 = "e22a63fb76874c99488435f26b117e37";
string hash2 = "0a4ff18a7d23f8b3ded5eaf93104ac88";
bool valid = true;
if (hash1.Length == hash2.Length) {
  for (int i = 0; i < hash1.Length; i++) {
    if (hash1[i] != hash2[i]) valid = false;
  }
} else {
  valid = false;
}
return valid;

In the above code, each digit is passed through in any case, regardless of whether the two hashes are equal or not. Thus, the attacker cannot draw any conclusions about the correctness of his hash from the time required.

Conclusion

Even though the article has now become somewhat more detailed, it can be stated that password hashing and salting is not “witchcraft”. If you limit yourself to the essential points, it is possible to implement a secure system without much effort. And just because it is not an unmanageable effort, one should always use the above mentioned techniques like hashing, salting and key-stretching when implementing a login/password based system.

I would appreciate feedback and experiences (as always). How do you implement your hashing system? Do you have experience with key-stretching, salting and the above mentioned approaches? Has your system or one of the systems you maintain ever been affected by a hack?

0 Comments

Leave a comment

Please be polite. We appreciate that. Your email address will not be published and required fields are marked