perl or bash question ["convert strings in a txt to html links"]

Dave Cross davorg at gmail.com
Mon Mar 1 15:13:52 UTC 2010


On 27 February 2010 17:12, Vadkan Jozsef <jozsi.avadkan at gmail.com> wrote:
> How can I do that in bash or perl, that I have a txt file, e.g.:
>
> $cat file.txt
> Hi, this is the content of the txt file, that contains links like this:
> http://www.somewhere.it/, and it could contain: http://somewhere.com,
> etc..
> This is the second line, that doesn't contain links..
> ..
> This is the XYZ line, that contains a link: http://www.somewhere.net
> $
>
>
> ...ok.. so how could I make a regexp for this?
>
> Turning:
>
> http://website.org
> http://www.website.org
>
> to this:
>
> <a href=http://website.org>http://website.org</a>
> <a href=http://www.website.org>http://www.website.org</a>
>
> The solution would be:
>
> sed 'SOMEMAGIC' file.txt > file.html
> or
> perl 'SOMEBIGMAGIC' file.txt > file.html

Parsing URIs using regular expressions (as others have suggested) is
harder than it looks. I recommend using a Perl module like URI::Find
(which is available as an RPM for Fedora - yum install perl-URI-Find).
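To see why it's harder than it looks, consider the sample text above: a URL immediately followed by a comma trips up the obvious substitution. A minimal sketch of the naive approach (the regex here is deliberately simplistic, and the input line is just the example from the question):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Naive attempt: grab everything non-space starting at "http://".
my $line = 'links like this: http://www.somewhere.it/, and http://somewhere.com,';
(my $html = $line) =~ s{(http://\S+)}{<a href="$1">$1</a>}g;

print $html, "\n";
# The trailing comma ends up inside each href, producing broken
# links like <a href="http://www.somewhere.it/,">...</a>
```

URI::Find handles this kind of trailing punctuation (and several other edge cases) for you, which is why it beats hand-rolled SOMEMAGIC.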

The code looks like this (lightly adapted from the module's documentation):

#!/usr/bin/perl

use strict;
use warnings;

use URI::Find;

sub replace {
  # URI::Find calls this for each URI it finds: $uri is a URI object,
  # $orig_uri is the text exactly as it appeared in the input.
  my ($uri, $orig_uri) = @_;

  return qq(<a href="$uri">$orig_uri</a>);
}

my $finder = URI::Find->new(\&replace);

# Read each input line, replace any URIs in place, and print the result.
while (<>) {
  $finder->find(\$_);
  print $_;
}

Put that in a file (called, perhaps, urifind) and make that file
executable. You can then run it like this:

./urifind file.txt > file.html

Hope that helps.

Dave...


More information about the users mailing list