[OT] Hardlinks and directories
Marko Vojinovic
vvmarko at gmail.com
Sat Feb 13 04:12:03 UTC 2010
On Saturday 13 February 2010 00:09:38 Suvayu Ali wrote:
> On Friday 12 February 2010 06:06 AM, Marko Vojinovic wrote:
> > Now, hard links are not allowed for directories since they would allow
> > for creation of loops (a directory containing itself), which is a Bad
> > Idea, since it breaks recursion. The filesystem needs to be a *tree* if
> > recursion is to function, so loops are forbidden (how would you delete a
> > directory which contains itself?).
>
> I don't quite follow you here, don't you mean hardlinking directories
> have the risk of introducing recursion?
No, I am saying that hard linking directories allows the creation of loop
structures. These structures break any recursive algorithm that tries to go
"downwards" through a directory tree.
Deleting directories is a textbook example. In order to delete a directory,
you first have to delete all files and subdirectories it contains, and once it
is empty, delete the directory itself. So deletion proceeds via a recursive
algorithm (a toy Python sketch follows the list):
1) check whether there are files or dirs in target dir
2) delete any files present
3) execute yourself (from step 1) for every subdir as target dir
4) target dir is now empty, unlink it
5) done
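To make this concrete, here is a minimal sketch of those steps in Python (a
toy illustration only, not a replacement for rm -r; the name delete_tree is
made up):

    import os

    def delete_tree(target):
        # step 1: look at everything inside the target dir
        for entry in os.listdir(target):
            path = os.path.join(target, entry)
            if os.path.isdir(path) and not os.path.islink(path):
                delete_tree(path)    # step 3: recurse, subdir is new target
            else:
                os.unlink(path)      # step 2: delete files (and soft links)
        os.rmdir(target)             # step 4: target is now empty, unlink it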
Now consider a directory structure where inside dir a/ you have dir b/, and
inside that a dir c/ which is a hard link back to a/ (this is the simplest
loop situation; much more complicated ones are possible). IOW, the directory
a/ contains itself as a subdirectory two levels down. Now execute the above
algorithm to delete a/ --- the algorithm will try to delete all
subdirectories, namely b/, and execute itself over b/ in step 3. But to
delete b/, it needs to execute itself over c/, which is actually a/. But to
delete a/ it needs to execute itself over b/...
As you can see, the algorithm starts to traverse a loop, and goes into an
infinite recursion --- executes more and more instances of itself without
control, while never reaching step 4 in any of the instances. This is a Very
Bad Thing, as you can imagine.
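You can watch this happen without any real hard links, by modelling the
directories as a plain Python dict (a toy model, names made up):

    # Each name maps to the subdirectories it "contains". b contains c,
    # but c is a hard link to a, so in effect b contains a again.
    fs = {"a": ["b"], "b": ["a"]}

    def naive_delete(d):
        for sub in fs[d]:        # step 3: recurse into every subdir
            naive_delete(sub)
        print("unlinking", d)    # step 4: never reached for a or b

    naive_delete("a")            # blows the stack with RecursionError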
Now, you might think that it is possible to make a more clever deletion
algorithm which would remember where it has already been and deal with this
situation. Such an algorithm can in fact be written, and it copes even with an
arbitrarily complicated graph where two or three loops are interconnected
(think lots of chain-links randomly connected to each other in the shape of
your favorite Moore sculpture :-)... ) --- it just has to keep a record of
every directory it has already visited. But that is not free: the walker needs
memory proportional to the number of directories it has seen, and every single
program that recurses through the tree (rm, find, tar, fsck, your backup
script...) would have to carry the same machinery. Worse, a loop that gets
disconnected from the rest of the tree keeps every directory in it at a
nonzero link count, so its disk space could never be reclaimed.
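For the record, here is what such a loop-aware walker could look like on the
same toy model (a real one would track (st_dev, st_ino) pairs from os.stat
instead of names):

    fs = {"a": ["b"], "b": ["a"]}

    def careful_delete(d, seen=None):
        if seen is None:
            seen = set()
        if d in seen:            # been here before: a loop, stop following
            return
        seen.add(d)
        for sub in fs[d]:
            careful_delete(sub, seen)
        print("unlinking", d)

    careful_delete("a")          # prints "unlinking b", then "unlinking a"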
So the only way out is to forbid hard links to directories in general (with
the tightly controlled exception of . and ..).
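And the kernel does enforce the ban: on Linux, link(2) on a directory fails
with EPERM no matter who you are, which is easy to check from Python
(directory names here are made up):

    import os

    os.mkdir("somedir")
    try:
        os.link("somedir", "somedir-link")   # hard link to a directory
    except OSError as e:
        print(e)    # on Linux: [Errno 1] Operation not permitted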
Finally, note that soft links do not suffer from this problem. A soft link to
a directory is not regarded as a subdirectory, but rather as a file, and gets
deleted in step 2 without being followed (no going "inside"). A soft link is
not a directory itself, so it has no contents that would require recursive
deletion. Creating loops with soft links is therefore completely safe for any
algorithm that can choose not to follow a soft link. :-)
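You can try this out, too: the loop below is the a/b/c situation from before,
but built with a soft link, and Python's shutil.rmtree (which, like rm -r,
unlinks soft links instead of following them) removes it without any drama
(paths made up):

    import os, shutil

    os.makedirs("a/b")
    os.symlink("..", "a/b/c")   # c -> a: a loop, but only a soft link

    shutil.rmtree("a")          # fine: c is unlinked as a file in step 2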
Best, :-)
Marko