Heise security just covered an interesting paper. The authors claim that recovering overwritten data from a hard disc is not possible, and so overwriting it several times, as many people do today, is not needed. Well, I read the paper today (short version) and agree with the first claim, but concluding that overwriting data several times is not needed might be a bit dangerous. At least in some cases.
Recovering (complete) data is one aspect. Let's assume some kind of authority wants to prove that we owned a certain file. Maybe the file contains illegal stuff or secret documents, or it is one we claim to have "never received"... We can think of many cases where the document is known, but the interesting question is: did we ever save it to our disc? If yes, we might be guilty of whatever the accusations are.
Now an interesting detail in the paper is that, in a realistic case, we have a chance of 0.9% to recover a byte (8 bits) of data correctly after one additional write over the old data (looking at single bits the situation looks even "better" for us: 87% or more). This, of course, is way too low to recover a document, but might it be used to prove we owned a known one?
I say: it is possible!
Let's assume the data were not recoverable at all after overwriting. We would then have to guess each byte, leaving a chance of 1/2^8 = 1/256 = 0.39% of guessing the correct content. But, as we see: 0.39% < 0.9%! Recovered bytes match the original more than twice as often as pure guesses would. That's where we attack.
In short: if we compare the recovered data with the known document position by position (single characters or even bits!), we will be able to identify matching parts, with a lot of garbage in between. To prove the document existed on this disc we have to show that we have significantly more matches than expected if the recovered data were just "random", i.e. "not matching", noise.
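One way to make "significantly more matches than expected" concrete is a one-sided binomial test. The following is only a minimal sketch, assuming that each byte matches independently and that pure noise hits the right byte with probability 1/256; the helper names and the example numbers (100,000 bytes, 900 matches) are made up for illustration and are not taken from the paper.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k matches from noise alone."""
    if k <= 0:
        return 1.0
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)

# Hypothetical numbers: a 100,000-byte document with 900 byte matches (0.9%).
n = 100_000
p_guess = 1 / 256   # assumption: unrelated data matches a byte by pure chance
observed = 900
p_value = binom_tail(observed, n, p_guess)
print(p_value)      # a tiny value means noise alone is a poor explanation
```

The smaller the resulting probability, the harder it is to argue that the matches are a coincidence. Real disc contents are not uniformly random, so the 1/256 baseline is an idealisation.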
An example makes this clearer: let's say the text we are searching for was
"abc1234567890abc" and we recover
"leon29lkdowl2nbd".
Note the matches of "2" and "b" at exactly the expected positions!
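The comparison in this toy example can be sketched in a few lines. The function name `matching_positions` is a hypothetical helper, and positions are counted from zero.

```python
def matching_positions(known, recovered):
    """Return the indices where the recovered bytes equal the known document's bytes."""
    return [i for i, (a, b) in enumerate(zip(known, recovered)) if a == b]

known = b"abc1234567890abc"
recovered = b"leon29lkdowl2nbd"

hits = matching_positions(known, recovered)
print(hits)                # [4, 14]: the "2" and the "b"
print(len(known) / 256)    # 0.0625 matches expected from pure guessing
```

Two matches in 16 bytes prove nothing by themselves, of course; the example is far too short for the statistics to mean anything.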
I'm sure this mechanism works if the documents we are searching for are large enough. I have not yet calculated how large they have to be to reach a given level of certainty with a given recovery probability per byte (>0.39%), but I'm sure it will work. Especially as documents tend to be quite large these days and contain not only text, and as those 0.9% are a relatively bad case; there are other (but more unrealistic) ones...
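Until the proper math is done, a rough normal-approximation estimate can at least hint at the order of magnitude. This is only a back-of-the-envelope sketch: it assumes independent bytes, a 1/256 chance baseline, and arbitrarily chosen error levels (`alpha` for false accusations, `power` for the chance of detecting a document that really was there). The function name `bytes_needed` is made up for illustration.

```python
import math
from statistics import NormalDist

def bytes_needed(p_hit, p_guess=1 / 256, alpha=0.001, power=0.99):
    """Approximate document size (in bytes) needed to tell p_hit apart from p_guess.

    Uses the standard normal-approximation sample-size formula for a
    one-sided test of a single proportion; all defaults are assumptions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # threshold against false positives
    z_power = NormalDist().inv_cdf(power)       # margin for the desired detection rate
    sd_guess = math.sqrt(p_guess * (1 - p_guess))
    sd_hit = math.sqrt(p_hit * (1 - p_hit))
    n = ((z_alpha * sd_guess + z_power * sd_hit) / (p_hit - p_guess)) ** 2
    return math.ceil(n)

# 0.9% per-byte recovery chance, the realistic case discussed above
print(bytes_needed(0.009))
```

The closer the recovery probability gets to 0.39%, the larger the required document becomes; at exactly 1/256 no amount of data helps.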
Gimme some days to find the time to do the correct math.