SHAbr - CPU SHA1 Hash Cracker

"A plaintext first consists of an array of characters (unsigned char[16]), then I pass it as an __m128i type to the SHA-1 function. With SSE2 I do 4 plaintexts at the same time, so actually 4 of these variables are passed to this function.

The variables get split into 4 32-bit integers (I called this type UINT4). The SHA-1 function then puts these variables in an array of 80 UINT4s called W. This array supplies the value that is added in each of the 80 steps of the actual hashing part. The first 16 UINT4s in W are just the parts of the plaintext, and as I only support plaintexts with length < 16, only the first 4 UINT4s are filled. W[4]...W[14] then contain zeros in my case, and W[15] contains the length of the plaintext in bits.

So what happens with W[16]...W[79]? The previous values (the actual plaintext) are 'expanded' into these values, so every step has additional input and depends on the plaintext. This expansion isn't that special:

    W[t] = W[t-3] XOR W[t-8] XOR W[t-14] XOR W[t-16]

And then ROTATE this result left by one bit.

As W[4]...W[14] contain nothing but zeros, I didn't actually want to set them. But because W[18]...W[30] depend on these zeros, they have to be set, unless we change the EXPAND function for W[18]...W[30]. But as that needs more unrolling of the loops, my code gets bigger and maybe slower (had that before). So now I unrolled certain parts in a strange way, but somehow it works :) I now have some strange code like:

    for(t = 20; t < 21; t++){
        SSE_EXPAND_3(t);     ROTATE(t)
        SSE_EXPAND_3(t+1);   ROTATE(t+1)
        SSE_EXPAND_3(t+2);   ROTATE(t+2)
        SSE_EXPAND_3_8(t+3); ROTATE(t+3)
        SSE_EXPAND_3_8(t+4); ROTATE(t+4)
    }

The code block within the for loop gets executed only once. So you'd say that I could just write out SSE_EXPAND_3(20) through SSE_EXPAND_3_8(24) and such, but somehow that makes things slower... Anyway, instead of 58.5 Mhashes/s I now get:

Length 5 - 55% in 8.39 s (60.29 Mhashes/s)" (Win32) (Source)
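
The expansion trick described in the quote can be sketched in plain scalar C. This is only an illustration, not SHAbr's actual code: it handles one plaintext with uint32_t words instead of four plaintexts in __m128i lanes, and the names rotl32, load_block, expand_full, expand_reduced, EXPAND_3 and EXPAND_3_8 are hypothetical stand-ins (the macros fold the quote's separate ROTATE step into the expansion). It assumes the standard SHA-1 block layout: big-endian words, a 0x80 padding byte after the message, and the bit length in W[15]. The reduced version never reads W[4]...W[14], so those words would not need to be set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32 - n));
}

/* Fill W[0..15] for a message shorter than 16 bytes (assumed layout:
   big-endian words, 0x80 padding byte, zeros, bit length in W[15]). */
static void load_block(const unsigned char *msg, size_t len, uint32_t W[80]) {
    unsigned char block[64] = {0};
    assert(len < 16);
    memcpy(block, msg, len);
    block[len] = 0x80;
    for (int t = 0; t < 16; t++) {
        W[t] = ((uint32_t)block[4 * t] << 24) | ((uint32_t)block[4 * t + 1] << 16) |
               ((uint32_t)block[4 * t + 2] << 8) | (uint32_t)block[4 * t + 3];
    }
    W[15] = (uint32_t)(len * 8);
}

/* Reference expansion: every step XORs all four terms. */
static void expand_full(uint32_t W[80]) {
    for (int t = 16; t < 80; t++)
        W[t] = rotl32(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1);
}

/* Hypothetical scalar stand-ins for the quote's macros: keep only the
   terms that are not known to be zero, then rotate left by one. */
#define EXPAND_3(W, t)   ((W)[(t)] = rotl32((W)[(t) - 3], 1))
#define EXPAND_3_8(W, t) ((W)[(t)] = rotl32((W)[(t) - 3] ^ (W)[(t) - 8], 1))

/* Reduced expansion: drops every term that would read W[4]..W[14]. */
static void expand_reduced(uint32_t W[80]) {
    int t;
    W[16] = rotl32(W[2] ^ W[0], 1);
    W[17] = rotl32(W[3] ^ W[1], 1);
    W[18] = rotl32(W[15] ^ W[2], 1);
    W[19] = rotl32(W[16] ^ W[3], 1);
    EXPAND_3(W, 20);
    EXPAND_3(W, 21);
    EXPAND_3(W, 22);
    for (t = 23; t < 29; t++)
        EXPAND_3_8(W, t);
    W[29] = rotl32(W[26] ^ W[21] ^ W[15], 1);
    W[30] = rotl32(W[27] ^ W[22] ^ W[16], 1);
    /* From t = 31 on, t-16 >= 15, so the full formula no longer
       touches the zero words. */
    for (t = 31; t < 80; t++)
        W[t] = rotl32(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1);
}
```

One way to check the sketch is to load the same plaintext into two arrays, overwrite W[4]...W[14] with garbage in one of them, run expand_reduced on that one and expand_full on the other, and compare W[16]...W[79]. Whether the reduced form is actually faster depends on the compiler and CPU, as the quote itself suggests.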

