On 27 February 2010 17:12, Vadkan Jozsef <jozsi.avadkan(a)gmail.com> wrote:
How can I do this in bash or perl? I have a txt file, e.g.:
$cat file.txt
Hi, this is the content of the txt file, that contains links like this:
http://www.somewhere.it/, and it could contain:
http://somewhere.com,
etc..
This is the second line, that doesn't contains links..
..
This is the XYZ line, that contains a link:
http://www.somewhere.net
$
...ok.. so how could I make a regexp for this?
Turning:
http://website.org
http://www.website.org
to this:
<a href=http://website.org>http://website.org</a>
<a href=http://www.website.org>http://www.website.org</a>
The solution would be:
sed 'SOMEMAGIC' file.txt > file.html
or
perl 'SOMEBIGMAGIC' file.txt > file.html
Parsing URIs using regular expressions (as others have suggested) is
harder than it looks: in your sample, for instance, the comma after
http://somewhere.com must not end up inside the link. I recommend using
a Perl module like URI::Find (which is available as an RPM for Fedora:
yum install perl-URI-Find).
The code looks like this (lightly adapted from the module's documentation):
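#!/usr/bin/perl
use strict;
use warnings;
use URI::Find;

# Called once per URI found: $uri is a URI object for the match,
# $orig_uri is the text exactly as it appeared in the input.
sub replace {
    my ($uri, $orig_uri) = @_;
    return qq(<a href="$uri">$orig_uri</a>);
}

my $finder = URI::Find->new(\&replace);

# Read the files named on the command line (or STDIN), line by line.
while (<>) {
    $finder->find(\$_);    # rewrites $_ in place
    print $_;
}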
Put that in a file (called, perhaps, urifind) and make that file
executable (chmod +x urifind). You can then run it like this:
./urifind file.txt > file.html
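For comparison, here is a rough sketch of what the sed route tends to
look like, and why I would not rely on it. The function names
linkify_naive and linkify_trimmed are made up for this example, and the
character classes are a heuristic I picked, not a real URI grammar.
Neither version escapes HTML special characters such as & inside a URL.

```shell
#!/bin/sh

# Naive version (hypothetical helper): everything from http:// up to the
# next whitespace, <, > or " is treated as the URL.
linkify_naive() {
    sed -E 's#https?://[^[:space:]<>"]+#<a href="&">&</a>#g'
}

# Slightly better (still a heuristic): the last character of the match
# must not be common sentence punctuation, so a trailing comma or full
# stop stays outside the link.
linkify_trimmed() {
    sed -E 's#https?://[^[:space:]<>"]*[^[:space:]<>".,;:!?)]#<a href="&">&</a>#g'
}

# The naive version swallows the comma from the sample text:
printf '%s\n' 'see http://somewhere.com, etc..' | linkify_naive
# prints: see <a href="http://somewhere.com,">http://somewhere.com,</a> etc..

# The trimmed version leaves it outside the link:
printf '%s\n' 'see http://somewhere.com, etc..' | linkify_trimmed
# prints: see <a href="http://somewhere.com">http://somewhere.com</a>, etc..
```

With those functions defined you could run linkify_trimmed < file.txt >
file.html, but it will still get some inputs wrong (a URL that really
does end in a closing parenthesis, for example), which is why I would
reach for the module instead.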
Hope that helps.
Dave...