Towards a Rigorous Statistical Analysis of Empirical Password Datasets

05/29/2021
by Jeremiah Blocki, et al.

In this paper we consider the following problem: given N independent samples from an unknown distribution 𝒫 over passwords pwd_1, pwd_2, …, can we generate high-confidence upper/lower bounds on the guessing curve λ_G ≐ ∑_{i=1}^{G} p_i, where p_i = Pr[pwd_i] and the passwords are ordered such that p_i ≥ p_{i+1}? Intuitively, λ_G represents the probability that an attacker who knows the distribution 𝒫 can guess a random password pwd ← 𝒫 within G guesses. Understanding how λ_G increases with the number of guesses G can help quantify the damage of a password cracking attack and inform password policies. Despite an abundance of large (breached) password datasets, upper/lower bounding λ_G remains a challenging problem. We introduce several statistical techniques to derive tighter upper/lower bounds on the guessing curve λ_G which hold with high confidence. We apply our techniques to analyze 9 large password datasets, finding that our new lower bounds dramatically improve upon prior work. Our empirical analysis shows that even state-of-the-art password cracking models are significantly less guess-efficient than an attacker who knows the distribution. When G is not too large, i.e., G ≪ N, we find that our upper/lower bounds on λ_G are both very close to the empirical distribution, which justifies using the empirical distribution as a close approximation of λ_G in that regime. The analysis also highlights regions of the curve where we can, with high confidence, conclude that the empirical distribution significantly overestimates λ_G. Our new statistical techniques yield substantially tighter upper/lower bounds on λ_G, though there are still regions of the curve where the best upper/lower bounds diverge significantly.
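To make the definition of the guessing curve concrete, here is a minimal Python sketch of the empirical estimate the abstract compares against: it replaces each unknown p_i with the observed frequency of the i-th most common password in the sample. The function name `empirical_guessing_curve` and the toy sample are illustrative assumptions, not taken from the paper, and this is only the naive plug-in estimate, not the paper's confidence bounds.

```python
from collections import Counter


def empirical_guessing_curve(samples):
    """Return the empirical guessing curve as a list.

    Entry G-1 is the fraction of samples covered by the G most
    frequent distinct passwords, i.e. the plug-in estimate of
    lambda_G obtained by using sample frequencies in place of p_i.
    """
    n = len(samples)
    if n == 0:
        return []
    # Sort observed counts from most to least frequent (p_i >= p_{i+1}).
    counts = sorted(Counter(samples).values(), reverse=True)
    curve = []
    covered = 0
    for c in counts:
        covered += c
        curve.append(covered / n)
    return curve


# Hypothetical toy sample, for illustration only.
samples = ["123456", "password", "123456", "qwerty", "123456", "password"]
curve = empirical_guessing_curve(samples)
```

Note that this estimate always reaches 1 once G equals the number of distinct passwords in the sample, even though the true distribution may place mass on passwords that were never observed. That is consistent with the abstract's observation that the empirical distribution can overestimate λ_G when G is not small relative to N.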
