dd question

Wed Dec 15 16:12:07 UTC 2010

Thanks for taking the time to critique and modify.

On Wed, 15 Dec 2010 12:06:48 +1100
Cameron Simpson <cs at zip.com.au> wrote:

> On 10Dec2010 14:28, stan <gryt2 at q.com> wrote:
> | 
> | def extract (filename1 = None, filename2 = None):
> |   if filename1 != None and filename2 != None:
> 
> I'd not bother with this check - it is a special purpose function that
> will not be misused, and if it _is_ misused it will fail silently,
> which is not good.

If it were just for me, neither would I.  But the OP is obviously
unfamiliar with Python (or Perl or Ruby) or he would be doing the job
using one of them without asking.  That said, I didn't put in the
matching else branch, which is why it fails silently:
else:
  print ("You forgot to enter the filenames")

> 
> |     infile = open (filename1, "rb")
> |     slurp = infile.read ()  # at least as much memory as the file size
> |     infile.close ()
> |     outfile = open (filename2, "wb")
> |     while len (slurp) > 0:
> |       record = slurp [:48]  # extract a record
> |       first8 = record [:8]  # slice off first 8 positions
> |       outfile.write (first8)  # write them out, no separator
> |       slurp = slurp [48:]  # chop them off the file
> 
> This step is Very Expensive. Don't reallocate a 1GB string every 48
> bytes, just pull out the pieces you need.

I agree.  I would have noticed the slowdown and cancelled and done it
right.  Unfortunately he wouldn't have. :-(
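
To put a rough number on that cost, here is a minimal sketch, assuming
Python 3.  The helper name bytes_copied_by_chopping is made up for
illustration.  It only models the tail copy made by each
"slurp = slurp [48:]" (slicing a bytes object in CPython copies the
sliced bytes) and ignores the small 48-byte and 8-byte slices.

```python
def bytes_copied_by_chopping(file_size, record_size=48):
    """Count the bytes copied by repeated `slurp = slurp[record_size:]`.

    Hypothetical helper: it models only the tail copy made on each
    pass and ignores the small record and first8 slices.
    """
    copied = 0
    remaining = file_size
    while remaining > 0:
        remaining = max(remaining - record_size, 0)
        copied += remaining  # the new, shorter buffer is a fresh copy
    return copied


# A 48 MB file (one million records) already costs roughly 24 TB of copying.
print(bytes_copied_by_chopping(48 * 1000 * 1000))
```

The total grows with the square of the file size, so a file ten times
larger costs about a hundred times as much copying.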

> 
> |     outfile.close ()
> |     
> | extract (filename1 = "your input filename with path", 
> |          filename2 = "your output filename with path")
> 
> Untested example:
> 
>   import sys
> 
>   def get8of48(fp):
>     while True:
>       chunk = fp.read(48)
>       if len(chunk) == 0:
>         break
>       yield chunk[:8]
>       if len(chunk) != 48:
>         print >>sys.stderr, "warning: short read from %s (%d bytes)" % (fp, len(chunk))
> 
>   for chunk8 in get8of48(open("your filename here", "rb")):
>     ... do something with chunk8, the 8-byte chunk ...
> 
> Shorter and faster and using less memory.
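
For anyone who wants to run the streaming approach end to end, here is
a minimal sketch, assuming Python 3 (so print is a function) and a
regular file opened in binary mode.  The record_size and keep
parameters and the two-argument extract wrapper are my own additions
for illustration, not part of the quoted example.

```python
import sys


def get8of48(fp, record_size=48, keep=8):
    """Yield the first `keep` bytes of each `record_size`-byte record."""
    while True:
        chunk = fp.read(record_size)
        if not chunk:
            break  # end of file
        if len(chunk) != record_size:
            # A trailing partial record: warn, but still pass it on.
            print("warning: short read (%d bytes)" % len(chunk),
                  file=sys.stderr)
        yield chunk[:keep]


def extract(in_path, out_path):
    """Copy the first 8 bytes of every 48-byte record to out_path."""
    with open(in_path, "rb") as infile, open(out_path, "wb") as outfile:
        for first8 in get8of48(infile):
            outfile.write(first8)
```

Only one 48-byte record is held in memory at a time, so the size of
the input file does not matter.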

I would have changed my version to just step through the slurp in
memory.  

low = 0
high = 8

first8 = slurp [low:high]
low += 48
high += 48

etc.
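
Spelled out as a loop, that idea might look like the sketch below,
assuming Python 3 and that slurp is a bytes object already read from
the file.  The function name first8_of_each and its parameters are
hypothetical, chosen only for this illustration.

```python
def first8_of_each(slurp, record_size=48, keep=8):
    """Join the first `keep` bytes of every record in `slurp`."""
    pieces = []
    # Walk the offsets 0, 48, 96, ... instead of chopping the buffer.
    for low in range(0, len(slurp), record_size):
        pieces.append(slurp[low:low + keep])
    return b"".join(pieces)
```

This still needs as much memory as the whole file, but each pass
copies only 8 bytes rather than everything that is left.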

You are obviously a higher class python coder than I am.  I treat
python like an elegant interpreted C.  You treat it like the evolving
functional ??? it is becoming.