quasi-[OT] Adobe Flash

Sat Oct 23 17:10:23 UTC 2010

On Saturday, October 23, 2010 14:44:25 Patrick O'Callaghan wrote:
> On Sat, 2010-10-23 at 12:27 +0100, Marko Vojinovic wrote:
> > On Saturday, October 23, 2010 04:27:45 Patrick O'Callaghan wrote:
> > > On Thu, 2010-10-21 at 21:33 -0600, Petrus de Calguarium wrote:
> > > > Fortunately, Suvayu's brilliant script gets around that and manages
> > > > to access the file, even though it is already deleted, while
> > > > Patrick's suggestion of hard linking to it does not work, because it
> > > > is already deleted, unless he also has some ingenious trick up his
> > > > sleeve to "get a handle on" the deleted file.
> > > 
> > > Yes, it's a neat trick. However the 'cp' will terminate when it reaches
> > > the end of the input file, even if it's still being written to by the
> > > flash process. That would explain why the output is sometimes
> > > truncated. Getting round that would need a copy process that waits to
> > > see if there's more output, either by polling or by using inotify. IOW
> > > something conceptually similar to "tail -f".
> > 
> > Just to follow that idea, would something like
> > 
> > tail -f /proc/<pid>/fd/<file_id>   > /tmp/flashfile.flv
> > 
> > work? (Maybe with a couple more switches to tail, to start from the
> > beginning of the file, etc...)
> 
> I suspect that 'tail' is designed for text files (it has options for how
> many lines to output etc.) 'man tail' is not very clear on whether it
> can work for binary files, e.g. what happens when it gets a null byte in
> the stream? Some experimentation is in order.
> 
> Also, 'tail -f' will sit forever waiting for input, even if nothing is
> writing to the file. The present case is slightly different in that we
> can assume (until Adobe changes it again ...) that a single process is
> writing to the Flash buffer file, hence the idea of using inotify to
> notice when the writer has gone. However on second thoughts that may not
> be necessary. Given that the /proc file will disappear when the writer
> dies, it would be enough to loop until getting an error (EIO?, not
> sure).

Well, without wasting much time on this, I tried to copy a random .jpg file I 
had lying around, using the following:

tail -q --bytes=1G file1.jpg  >  file2.jpg

That produced file2.jpg which was exactly the same as file1.jpg. This suggests 
that tail would work correctly for binary files (or at least this one that I 
tried :-) ).
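
A slightly more systematic version of that experiment could look like the 
sketch below. It builds a small file that contains NUL and 0xFF bytes, copies 
it with tail and compares the two files with cmp. The file names are whatever 
mktemp hands out, and I use -c +1 (start output at byte 1) here instead of a 
large --bytes value. It assumes a tail that understands -q, such as GNU tail.

```shell
# Build a small test file containing NUL bytes and other non-text data.
src=$(mktemp)
dst=$(mktemp)
printf 'abc\000def\000\377\n\000xyz' > "$src"

# Copy it with tail, starting output at the first byte of the file.
tail -q -c +1 "$src" > "$dst"

# cmp -s exits with status 0 only if both files are byte-for-byte identical.
if cmp -s "$src" "$dst"; then
    result=identical
else
    result=different
fi
echo "$result"

rm -f "$src" "$dst"
```

If this prints "different" for some input, tail is not safe for that kind of 
binary data and the rest of the idea falls apart.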

Looking at man tail, I found the following to be useful for this particular 
purpose:

 --bytes  specifies how many bytes tail should output, counted from the end 
of the file. As in my example above, a value larger than the file itself (1G) 
makes tail output the whole file, from the very beginning. (GNU tail also 
accepts --bytes=+1, which starts at the first byte regardless of file size.)

 -q suppresses the file-name headers, which tail prints by default only when 
given several files, and which would corrupt a binary file.

 -f keeps tail running after it reaches the end of the file, so that it 
outputs any data appended later, until tail itself is terminated.

 --pid (used together with -f) makes tail terminate by itself once the process 
with the given PID has died, i.e. once the writer of the file is gone.
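
To see how -f and --pid play together, here is a small sketch with a made-up 
writer: a background subshell that appends a few chunks to a file and then 
exits. The writer, the chunk contents and the temporary files are all 
assumptions for the sake of the demonstration, and --pid is specific to GNU 
tail. The -c +1 switch asks tail to start output at the first byte.

```shell
src=$(mktemp)
dst=$(mktemp)

# Hypothetical writer: appends three chunks, one per second, then exits.
( for i in 1 2 3; do printf 'chunk %s\n' "$i" >> "$src"; sleep 1; done ) &
writer=$!

# Follow the file from its first byte; tail exits after $writer has died.
tail -f -q -c +1 --pid="$writer" "$src" > "$dst"

# Compare the followed copy with the file as the writer left it.
if cmp -s "$src" "$dst"; then
    copy=complete
else
    copy=incomplete
fi
echo "$copy"

rm -f "$src" "$dst"
```

The tail command returns on its own a moment after the writer finishes, which 
is the behavior we want for the Flash buffer file.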

Granted, I have not checked what happens if tail receives a null byte, but I 
guess it should just pass it through like any other byte, since (with -f) it 
keeps watching for data appended to the file. So it should not terminate after 
a null, by design. One should examine the source code of tail or experiment 
with various binary inputs to determine the exact behavior, but I have a 
feeling it should work.

So, given the <pid> and <file_id> information, hopefully this should Just Work:

tail -f -q --bytes=1G --pid=<pid>  /proc/<pid>/fd/<file_id>   > /tmp/flashfile.flv

Haven't tried it, though :-) . I guess the one-gigabyte size value is bigger 
than any flash file one can find to download from the Internet, so it should 
be ok.

As for the <pid>, my guess is that the process that opens the file for writing 
is the only one allowed to actually write to it, since otherwise one can get a 
race condition and data might get corrupted. Once that process dies, tail will 
die along with it, leaving a clean /tmp/flashfile.flv as a result. At least that 
is my theory. ;-)

Now, all that is needed is that someone write a script and try it out. I am 
not very well versed in extracting <pid> and <file_id> and the like, but 
otherwise the script should be trivial. :-)
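
For whoever wants to try, here is a rough and untested-against-Flash sketch of 
the extraction part. It walks /proc/*/fd and looks for descriptors whose link 
target is a deleted file with a given path prefix. The function names 
find_deleted_fds and copy_flash_buffers are made up, and so are the /tmp/Flash 
prefix and the output file names; check "ls -l /proc/<pid>/fd" on your own 
system for the real name of the buffer file. It also assumes GNU tail (for 
--pid) and uses -c +1 to start output at the first byte.

```shell
# Hypothetical helper: print "<pid> <fd>" for every open file descriptor
# whose target is a deleted file starting with the given path prefix.
find_deleted_fds() {
    prefix=$1
    for link in /proc/[0-9]*/fd/*; do
        # Descriptors we are not allowed to inspect make readlink fail; skip.
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            "$prefix"*" (deleted)")
                rest=${link#/proc/}        # leaves <pid>/fd/<fd>
                printf '%s %s\n' "${rest%%/*}" "${link##*/}"
                ;;
        esac
    done
}

# Hypothetical wrapper: copy each match until the process holding it exits.
# The /tmp/Flash prefix and the output file names are assumptions.
copy_flash_buffers() {
    find_deleted_fds /tmp/Flash | while read -r pid fd; do
        tail -f -q -c +1 --pid="$pid" "/proc/$pid/fd/$fd" \
            > "/tmp/flashfile-$pid-$fd.flv"
    done
}
```

Running copy_flash_buffers while a video is loading would then copy each 
matching file in turn, waiting for its writer to exit before moving on to the 
next one.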

Best, :-)
Marko