dd question
Cameron Simpson
cs at zip.com.au
Wed Dec 15 01:06:48 UTC 2010
On 10Dec2010 14:28, stan <gryt2 at q.com> wrote:
| On Fri, 10 Dec 2010 03:11:25 +0000 (UTC)
| "Amadeus W.M." <amadeus84 at verizon.net> wrote:
| > I have a binary file with data. Each block of 48 bytes is a record. I
| > want to extract the first 8 bytes within each record. I'm thinking
| > this should be possible with dd, but gawk, perl - anything goes. It
| > just has to be fast, because the data files are ~ 1Gb.
| >
| > I can do this in C++ but I was just wondering if it can be done with
| > existing well tested tools.
|
| The binary aspect makes it tricky. If they were EOL delimited records,
| lots of tools could do this.
|
| Here's a python function, not checked though. It does require that you
| have enough memory to slurp the file into memory. Put it in a file,
| edit for the filenames, and run it as python <filename>. I guess it
| should take less than a minute, but not sure, should be fine for one
| off.
|
| def extract (filename1 = None, filename2 = None):
|     if filename1 != None and filename2 != None:
I'd not bother with this check - it is a special purpose function that
will not be misused, and if it _is_ misused it will fail silently, which
is not good.
|         infile = open (filename1, "rb")
|         slurp = infile.read () # at least as much memory as the file size
|         infile.close ()
|         outfile = open (filename2, "wb")
|         while len (slurp) > 0:
|             record = slurp [:48] # extract a record
|             first8 = record [:8] # slice off first 8 positions
|             outfile.write (first8) # write them out, no separator
|             slurp = slurp [48:] # chop them off the file
This step is Very Expensive: every slurp[48:] copies the whole remaining
string. Don't reallocate a 1GB string every 48 bytes; just pull out the
pieces you need.
|         outfile.close ()
|
| extract (filename1 = "your input filename with path",
|          filename2 = "your output filename with path")
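If the file does fit in memory and you want to keep the slurp approach,
a minimal sketch of "pull out the pieces you need" is to step an offset
through the data instead of re-slicing it. The function name and the
record_size/keep parameters are hypothetical, chosen for illustration:

```python
def first8_of_each_record(data, record_size=48, keep=8):
    """Return the first `keep` bytes of each `record_size`-byte record.

    `data` is the whole file as a byte string. Hypothetical helper,
    not part of the original post.
    """
    pieces = []
    for offset in range(0, len(data), record_size):
        # Copy only the small slice we want; `data` is never rebuilt.
        pieces.append(data[offset:offset + keep])
    return b"".join(pieces)
```

This still needs the whole file in memory, but each step copies 8 bytes
rather than everything that remains.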
Untested example:
import sys

def get8of48(fp):
    while True:
        chunk = fp.read(48)
        if len(chunk) == 0:
            break
        yield chunk[:8]
        if len(chunk) != 48:
            # report a trailing partial record
            sys.stderr.write("warning: short read from %s (%d bytes)\n" % (fp, len(chunk)))

for chunk8 in get8of48(open("your filename here", "rb")):
    ... do something with chunk8, the 8-byte chunk ...
Shorter and faster and using less memory.
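For a file-to-file run, here is a sketch along the same lines that
writes the 8-byte pieces straight to an output file. It reads many
records per call to cut down on read() overhead; that batching is my
assumption about what helps on a ~1GB file, not something measured. The
names extract_first8 and records_per_read are hypothetical:

```python
import sys

RECORD_SIZE = 48   # bytes per record, from the original question
KEEP = 8           # leading bytes wanted from each record

def extract_first8(in_path, out_path, records_per_read=4096):
    """Copy the first KEEP bytes of every record from in_path to out_path."""
    block_size = RECORD_SIZE * records_per_read
    with open(in_path, "rb") as infile, open(out_path, "wb") as outfile:
        while True:
            block = infile.read(block_size)
            if not block:
                break
            if len(block) % RECORD_SIZE != 0:
                # Only the final block of a regular file can be short.
                sys.stderr.write(
                    "warning: trailing partial record in %s\n" % in_path)
            for offset in range(0, len(block), RECORD_SIZE):
                outfile.write(block[offset:offset + KEEP])
```

Call it as extract_first8("input path", "output path"). Memory use is
bounded by the block size (about 192KB with the default), whatever the
file size.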
Cheers,
--
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/
The general consensus on covered [litter] boxes is that they are a Good
Thing, and having bought one myself now and tried it out for a couple of
weeks, I agree. No more litter sprayed halfway across the city.
- krw at prg.ox.ac.uk (Kenneth Wood)