From jozsi.avadkan at gmail.com Fri Jun 5 20:19:16 2015
From: Arthur Bela
To: users at lists.fedoraproject.org
Subject: perl or bash question ["convert strings in a txt to html links"]
Date: Sat, 27 Feb 2010 18:12:06 +0100
Message-ID: <1267290726.2365.14.camel@ubuntu>

How can I do this in bash or perl? I have a txt file, e.g.:

$ cat file.txt
Hi, this is the content of the txt file, that contains links like this:
http://www.somewhere.it/, and it could contain: http://somewhere.com,
etc..
This is the second line, that doesn't contain links..
..
This is the XYZ line, that contains a link: http://www.somewhere.net
$

...ok.. so how could I make a regexp for this?

Turning:

http://website.org
http://www.website.org

to this:

<a href="http://website.org">http://website.org</a>
<a href="http://www.website.org">http://www.website.org</a>

The solution would be:

sed 'SOMEMAGIC' file.txt > file.html
or
perl 'SOMEBIGMAGIC' file.txt > file.html

:D

From duskglow at gmail.com Fri Jun 5 20:19:16 2015
From: Russell Miller
To: users at lists.fedoraproject.org
Subject: Re: perl or bash question ["convert strings in a txt to html links"]
Date: Sat, 27 Feb 2010 09:31:02 -0800
Message-ID: <201002270931.02806.duskglow@gmail.com>
In-Reply-To: 1267290726.2365.14.camel@ubuntu

On Saturday 27 February 2010 09:12:06 Vadkan Jozsef wrote:
> How can I do this in bash or perl? I have a txt file, e.g.:
>
something like

sed -ie s/(.*)/\1/ filename

Not sure of the EXACT syntax but that should be close.
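For reference, a runnable variant of this sed idea, as a sketch that assumes a sed with -E (extended regex) support, such as the GNU sed shipped with Fedora or BSD sed. In basic regex syntax an unescaped ( is a literal character, so (.*) is not a capture group; and -ie is parsed as -i with backup suffix "e", so the command as written would also leave a "filenamee" backup. The sketch below instead writes to stdout, captures only the http:// or https:// run rather than the whole line, and keeps common trailing punctuation (like the comma after http://somewhere.com) outside the link. The function name linkify is hypothetical.

```shell
# Hypothetical helper: wrap each http:// or https:// run in an HTML anchor.
# Assumes sed -E is available (GNU sed or BSD sed).
linkify() {
    # https?://             the scheme
    # [^[:space:]<>"]*      further URL-ish characters, stopping at whitespace
    #                       or characters that would break the href attribute
    # [^[:space:]<>".,;:!?)] the last character may not be trailing punctuation
    # In the replacement, & stands for the whole matched URL.
    sed -E 's#https?://[^[:space:]<>"]*[^[:space:]<>".,;:!?)]#<a href="&">&</a>#g' "$@"
}

# Usage: linkify file.txt > file.html
```

This remains a regex approximation: a URL ending in ")" (as some Wikipedia links do) loses its final character, and nothing is urlencoded.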
(apologies if this gets sent twice, I sent the first from the wrong address)

--Russell

From mail at robertoragusa.it Fri Jun 5 20:19:17 2015
From: Roberto Ragusa
To: users at lists.fedoraproject.org
Subject: Re: perl or bash question ["convert strings in a txt to html links"]
Date: Sat, 27 Feb 2010 21:36:46 +0100
Message-ID: <4B89825E.60705@robertoragusa.it>
In-Reply-To: 1267290726.2365.14.camel@ubuntu

Vadkan Jozsef wrote:
> Turning:
>
> http://website.org
> http://www.website.org
>
> to this:
>
> <a href="http://website.org">http://website.org</a>
> <a href="http://www.website.org">http://www.website.org</a>
>

sed -e 's_\(\bhttp://\S*\)_<a href="\1">\1</a>_g'

Not perfect, as some characters should be urlencoded inside the href.

-- 
Roberto Ragusa    mail at robertoragusa.it

From davorg at gmail.com Fri Jun 5 20:19:27 2015
From: Dave Cross
To: users at lists.fedoraproject.org
Subject: Re: perl or bash question ["convert strings in a txt to html links"]
Date: Mon, 01 Mar 2010 15:13:52 +0000
Message-ID:
In-Reply-To: 1267290726.2365.14.camel@ubuntu

On 27 February 2010 17:12, Vadkan Jozsef wrote:
> How can I do this in bash or perl? I have a txt file, e.g.:
>
> $ cat file.txt
> Hi, this is the content of the txt file, that contains links like this:
> http://www.somewhere.it/, and it could contain: http://somewhere.com,
> etc..
> This is the second line, that doesn't contain links..
> ..
> This is the XYZ line, that contains a link: http://www.somewhere.net
> $
>
>
> ...ok..
> so how could I make a regexp for this?
>
> Turning:
>
> http://website.org
> http://www.website.org
>
> to this:
>
> <a href="http://website.org">http://website.org</a>
> <a href="http://www.website.org">http://www.website.org</a>
>
> The solution would be:
>
> sed 'SOMEMAGIC' file.txt > file.html
> or
> perl 'SOMEBIGMAGIC' file.txt > file.html

Parsing URIs using regular expressions (as others have suggested) is harder than it looks. I recommend using a Perl module like URI::Find (which is available as an RPM for Fedora: yum install perl-URI-Find).

The code looks like this (lightly adapted from the module's documentation):

#!/usr/bin/perl

use strict;
use warnings;

use URI::Find;

# Callback: gets the normalized URI and the text as it appeared in the
# input; whatever it returns replaces that text.
sub replace {
    my ($uri, $orig_uri) = @_;
    return qq(<a href="$uri">$orig_uri</a>);
}

my $finder = URI::Find->new(\&replace);

# Read the files named on the command line (or STDIN) line by line,
# rewrite every URI found in the line, and print the result.
while (<>) {
    $finder->find(\$_);
    print $_;
}

Put that in a file (called, perhaps, urifind) and make that file executable (chmod +x urifind). You can then run it like this:

./urifind file.txt > file.html

Hope that helps.

Dave...
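None of the commands in this thread HTML-escape the rest of the text or keep its line breaks, so a "<" or "&" in file.txt is read by the browser as markup and the lines run together. Below is a sketch of a complete text-to-page conversion in sh, assuming a sed with -E support; the function name txt2html is hypothetical, and the URL pattern is a simple approximation (http:// and https:// only, no urlencoding), not a URI parser like URI::Find.

```shell
# Hypothetical helper: convert a plain-text file (or stdin) into a minimal
# HTML page with http(s) URLs turned into links. Assumes sed -E.
txt2html() {
    printf '<html><body><pre>\n'
    # 1-3: escape HTML metacharacters; & goes first so the entities
    #      produced by the next two commands are not escaped again.
    # 4:   link http(s) runs. Inside a URL, & is accepted only as the
    #      &amp; entity from step 1, so an escaped &lt; or &gt; ends the
    #      URL; the last character may not be trailing punctuation.
    #      In the replacement, & stands for the whole matched URL.
    sed -E \
        -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's#https?://([^[:space:]&"]|&amp;)*[^[:space:]&".,;:!?)]#<a href="&">&</a>#g' \
        "$@"
    printf '</pre></body></html>\n'
}

# Usage: txt2html file.txt > file.html
```

Escaping before linking means a query string such as ?a=1&b=2 ends up as &amp; in both the href and the visible text, which is what HTML expects; the <pre> wrapper is what preserves the original line breaks.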