Awk and sort (of text files)
jd1008
jd1008 at gmail.com
Mon Jun 29 18:57:12 UTC 2015
On 06/29/2015 11:48 AM, Bill Oliver wrote:
> On Mon, 29 Jun 2015, jd1008 wrote:
>
>>
>>
>> On 06/29/2015 03:39 AM, Dario Lesca wrote:
>>> On Sun, 28/06/2015 at 18.38 -0600, jd1008 wrote:
>>> > Hi,
>>> > I have text files made of paragraphs of text, separated by
>>> > blank lines.
>>> >
>>> > Each "paragraph" is information about a different item.
>>> >
>>> > I need to sort these paragraphs based on the first line
>>> > of each paragraph.
>>> >
>>> > Need some hints how to accomplish this.
>>> >
>>> > Thanx.
>>> An example of your text file can help us to help you.
>>>
>> I described them perfectly.
>> text paragraphs made of a few or several lines.
>>
>> The paragraphs are separated by an empty line.
>>
>>
>
> Try something like this. It's buggy, but what can you expect for 5
> minutes of work.
>
> This takes a text with lines separated by hard breaks, and an empty line
> between paragraphs, and sorts it.
>
> Here are the obvious problems I haven't bothered to debug:
>
> 1) It counts the empty lines as paragraphs, so you get blank space at the
> top.
>
> 2) I'm doing something wrong with asort (see comment).
>
> 3) It looks like I'm sorting twice -- once with asort, and then to
> reindex. There should be a smart way to do this.
>
> Here's the awk code:
>
>
> BEGIN{newparagraph=0; numlines=0; paranum=0;}
>
> {
>     #if the line is blank, it's time to start a new paragraph
>     if ($0==""){
>         paranum++;
>         numlines=0;
>     }
>     #if it's not blank, buffer it
>     else {
>         numl[paranum]=numlines;
>         paragraph[paranum][numlines++] = $0;
>     }
> }
>
> END{
>     for (i=0;i<=paranum;i++){
>         firstline[i] = paragraph[i][0]
>     }
>
>     #for a reason I don't understand, "sorted" has one index more than firstline!?
>     #I'm probably making some mistake with starting with 0 vs 1, but I'm not going to fix it.
>     # so, I'll just increment paranum, because I'm lazy
>     asort(firstline,sorted);
>     paranum++
>
>     #Renumber the indices
>     for (i=0;i<=paranum;i++){
>         found=0;
>         newindex[i] = 999;
>         for(j=0;((j<=paranum) && (found==0));j++){
>             if(sorted[i] == firstline[j]){
>                 newindex[i]=j;
>                 found=1;
>             }
>         }
>     }
>
>     #print it out
>     for(i=0;i<=paranum;i++){
>         current_paragraph = newindex[i];
>         new_numlines = numl[current_paragraph];
>
>         for (j=0;j<=new_numlines;j++){
>             print (paragraph[newindex[i]][j]);
>         }
>         print("");
>     }
> }
>
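On the asort() puzzle in the quoted comments: gawk's asort() always
numbers its result from 1, so copying firstline[0..paranum] gives
sorted[1..paranum+1] and sorted[0] never exists. That is where the
"one index more" comes from.

If awk is wanted at all, a shorter route is probably its paragraph mode
(RS=""), which reads each blank-line-separated block as one record and
skips runs of blank lines. What follows is only a sketch, not tested
against the real lists. The function name sort_paragraphs and the \001
separator are made up for illustration, and the separator is assumed
never to occur in the data.

```shell
# sort_paragraphs FILE: print FILE's paragraphs sorted by their text,
# first line first. Hypothetical helper, not from the thread.
#
# Stage 1: RS="" is awk paragraph mode and FS="\n" makes each line a field;
#          assigning $1 = $1 rebuilds the record with OFS (\001) between lines,
#          so every paragraph becomes a single line.
# Stage 2: sort those one-line records (C locale for plain byte order).
# Stage 3: split each record back into lines and print a blank line after it.
sort_paragraphs() {
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\001" } { $1 = $1; print }' "$1" |
        LC_ALL=C sort |
        awk 'BEGIN { FS = "\001"; OFS = "\n"; ORS = "\n\n" } { $1 = $1; print }'
}
```

Called as "sort_paragraphs lists1 > lists1.sorted.txt". Because sort does
the ordering, no index bookkeeping is needed and gawk-only features such
as asort() or arrays of arrays are avoided.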
Here is the simplest solution and it does what I want without resorting
to awk:
for i in lists*; do
    sed '/./{H;d;};x;s/\n/={NL}=/g' "$i" | sort |
        sed '1s/={NL}=//;s/={NL}=/\n/g' > "$i.sorted.txt"
done
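
For anyone reading this later, here is roughly how that pipeline works.
The first sed collects each paragraph in the hold space and, on a blank
line, emits it as one line with every newline replaced by the marker
={NL}=. sort then orders those one-line records, and the second sed
turns the markers back into newlines.

Two assumptions: GNU sed (for \n in the replacement text), and the marker
={NL}= never appearing in the data. One caveat: as posted, a last
paragraph that is not followed by a blank line is never flushed and gets
lost. The sketch below adds $!d, a common idiom for that case. The
function name sort_paras is made up for the example.

```shell
# sort_paras FILE: hypothetical wrapper around the sed | sort | sed pipeline.
#
# /./{H;$!d;}    non-blank line: append it to the hold space, then delete it
#                unless it is the last line of the file, so that a final
#                paragraph without a trailing blank line is still flushed.
# x              blank line (or last line): swap in the collected paragraph.
# s/\n/={NL}=/g  join the paragraph into a single line for sort.
# 1s/={NL}=//    drop the leading marker on the first sorted record only;
#                on later records it becomes the blank separator line.
sort_paras() {
    sed '/./{H;$!d;};x;s/\n/={NL}=/g' "$1" |
        sort |
        sed '1s/={NL}=//;s/={NL}=/\n/g'
}
```

Usage would be along the lines of: sort_paras lists1 > lists1.sorted.txt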