I have been using a shell script called save-page-as.sh to download complete web pages, and it has been working as expected. The relevant line in the script is: "${browser}" "-new-window" "${url}" &>/dev/null
I now need the ability to trigger this program, or another program, by sending email to my computer from other locations; I do not have the option to log in remotely.
The save-page-as.sh program runs Firefox. I have not been able to get this to work via email. env shows DISPLAY=:0.0. I have tried each of the DISPLAY settings below:

export DISPLAY=:0
export DISPLAY=:0.0
export DISPLAY=:0.1

None of those have worked.
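[Editor's note] A minimal sketch, assuming Firefox and the X session are on the same machine, of the environment a mail-triggered script typically needs before launching a GUI program. It is shown as a dry run that only prints the command, so it is safe to execute; the paths and display number are assumptions, and the invocation mirrors the script line quoted above:

```shell
# Environment for reaching the desktop's X display from a non-login context.
# Adjust DISPLAY to match what `env` reports in the desktop session.
export DISPLAY=:0
export XAUTHORITY="$HOME/.Xauthority"   # X authentication cookie file

url="https://my.acbl.org/club-results/details/338288"
cmd="firefox -new-window $url"

# Dry run: print the command instead of launching Firefox.
echo "would run: $cmd"
```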
The URL I am trying to download does not have an extension (i.e., no '.htm'), such as: https://my.acbl.org/club-results/details/338288
wget does not download the correct web page.
I would appreciate any pointers to get save-page-as.sh working, whether with a browser or a different command-line program.
David
On 7/3/21 8:02 PM, dwoody5654@gmail.com wrote:
Hi David,
Try this
$ curl https://my.acbl.org/club-results/details/338288 --output eraseme.html
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  463k    0  463k    0     0   193k      0 --:--:--  0:00:02 --:--:--  193k
I opened eraseme.html and the 338288 web page right next to each other in Firefox and they look exactly the same to me.
I use curl almost exclusively for downloading web sites. wget has its issues.
HTH, -T
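[Editor's note] The same curl pattern can be exercised offline against a local file:// URL (a sketch; the sample page and file names are made up, and the eraseme.html name follows the example above — swap in the real https URL for actual use):

```shell
# Create a tiny sample page, then save it with curl exactly as in the
# example above, but from a file:// URL so the sketch needs no network.
printf '<html><body>hello</body></html>\n' > /tmp/sample.html
curl -sS -o /tmp/eraseme.html "file:///tmp/sample.html"
grep -q hello /tmp/eraseme.html && echo "saved ok"
```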
On Sat, 3 Jul 2021 20:25:04 -0700 users@lists.fedoraproject.org wrote:
> Hi David,
> Try this
> $ curl https://my.acbl.org/club-results/details/338288 --output eraseme.html
> I opened eraseme.html and the 338288 web page right next to each other in Firefox and they look exactly the same to me.
There are spacing and alignment differences, and apparently other differences as well. Also, if you then run:
html2txt eraseme.html
or
html2text eraseme.html
it does not display any of the text (content).
HTH, -T
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
On 7/3/21 9:02 PM, dwoody5654@gmail.com wrote:
> wget does not download the correct web page.
When you try wget, what happens?
> Appreciate any pointers to get the save-page-as.sh working using a browser or a different command line program.
If wget isn't working, try curl.
On Sat, 3 Jul 2021 21:25:37 -0600 joe@zeff.us wrote:
It did not do any better than wget. I did not find any options in curl to convert links to local.
David
On Sun, Jul 4, 2021 at 9:02 AM D&R dwoody5654@gmail.com wrote:
> It did not do any better than wget. I did not find any options in curl to convert links to local.
The wget man page suggests: wget -E -H -k -K -p https://my.acbl.org/club-results/details/338288
https://www.gnu.org/software/wget/manual/wget.html#Recursive-Retrieval-Optio...
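[Editor's note] For readers skimming the man page, here is what each of the suggested flags does (descriptions taken from the wget documentation; the command itself needs network access to run):

```shell
# -E (--adjust-extension)  save HTML/CSS files with a proper .html/.css suffix
# -H (--span-hosts)        allow page requisites to come from other hosts
# -k (--convert-links)     rewrite links in the saved files to local copies
# -K (--backup-converted)  keep an unmodified .orig copy before converting
# -p (--page-requisites)   also fetch images, stylesheets, and scripts
wget -E -H -k -K -p https://my.acbl.org/club-results/details/338288
```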
On 2021-07-04 9:01 a.m., D&R wrote:
> It did not do any better than wget. I did not find any options in curl to convert links to local.
Maybe you need to explain better what you're trying to do. Are you trying to get the html file or are you trying to download everything so that you can view it offline?
On Sun, 4 Jul 2021 10:37:33 -0700 samuel@sieb.net wrote:
> Maybe you need to explain better what you're trying to do. Are you trying to get the html file or are you trying to download everything so that you can view it offline?
Download everything so that what I see offline is exactly the same as online.
On 7/3/21 8:02 PM, dwoody5654@gmail.com wrote:
> the url I am trying to download does not have an extension ie. no '.htm' such as: https://my.acbl.org/club-results/details/338288
> wget does not download the correct web page.
What happens for you? I tried it with wget and I get the exact same page as Firefox does.
On Sun, 4 Jul 2021 01:11:28 -0700 samuel@sieb.net wrote:
> What happens for you? I tried it with wget and I get the exact same page as Firefox does.
I get an index page. What options did you use for wget?
David
On 2021-07-04 8:58 a.m., D&R wrote:
> I get an index page. What options did you use for wget?
I didn't use any options:

wget https://my.acbl.org/club-results/details/338288

That gave me a file called "338288", and when I opened it with Firefox, it looked exactly the same as going to the URL.
From my website: http://www.fournotrump.com (note it is http, not https, so you may need to tell your browser to trust it; it is not asking for credit cards).

fournotrump.com is broken: the new ACBL Live display of club results broke it, and there is no plan to fix it.
fournotrump.com is still working for the Hoppe club because David Blohm is not using ACBL Live.
Contact me directly if you want to discuss ACBL web site results.
You can also google: ACBL API
https://bridgewinners.com/forums/read/webmasters-forum/acbl-releases-api-for...
IMHO, your question is related to ACBL website, not to Fedora. I don't want to post things that are unrelated to Fedora here.
Jim Oser oserj@oserconsulting.com
On Sat, Jul 3, 2021, at 8:02 PM, dwoody5654@gmail.com wrote:
On 2021-07-03 8:02 p.m., dwoody5654@gmail.com wrote:
> the url I am trying to download does not have an extension ie. no '.htm' such as: https://my.acbl.org/club-results/details/338288
> wget does not download the correct web page.
I tried it and it worked, sort of. The problem is that you want to download everything to view it offline, but the site my.acbl.org has a robots.txt that says "no robots allowed". So wget respects that and will not download any required files from that site other than the initial page. curl probably has the same issue.
On Mon, Jul 5, 2021 at 12:26 PM Samuel Sieb samuel@sieb.net wrote:
On 2021-07-05 10:30 p.m., Thomas Stephen Lee wrote:
> I tried it and it worked, sort of. The problem is that you want to download everything to view it offline, but the site my.acbl.org has a robots.txt that says "no robots allowed". So wget respects that and will not download any required files from that site other than the initial page. curl probably has the same issue.
Ok, that solves it. I was able to download everything, and opening the resulting file in Firefox didn't involve any network access. I was able to see the entire page and even interact with it somewhat.

wget -e robots=off -EHkp https://my.acbl.org/club-results/details/338288
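[Editor's note] A small wrapper around that command could drop each run into its own directory (a sketch; the directory-naming scheme is an assumption, and the network step is commented out so the sketch runs offline):

```shell
#!/bin/bash
# Hypothetical wrapper: save one page per timestamped directory.
url="${1:-https://my.acbl.org/club-results/details/338288}"
outdir="saved-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$outdir"
echo "saving $url into $outdir"
# Network step from the thread (-P puts the files under $outdir);
# uncomment to actually download:
# wget -e robots=off -E -H -k -p -P "$outdir" "$url"
```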
Samuel Sieb writes:
> On 2021-07-03 8:02 p.m., dwoody5654@gmail.com wrote:
>> the url I am trying to download does not have an extension ie. no '.htm' such as: https://my.acbl.org/club-results/details/338288
The extension doesn't matter to any of the utilities mentioned as far as I know. I'm pretty sure they get the MIME type from the HTTP Content-Type header.
>> wget does not download the correct web page.
> I tried it and it worked, sort of. The problem is that you want to download everything to view it offline, but the site my.acbl.org has a robots.txt that says "no robots allowed". So wget respects that and will not download any required files from that site other than the initial page. curl probably has the same issue.
1. The page does not have content represented in HTML AFAICT: it's a blob which is parsed and formatted by a battery of (java)scripts, some of which are resources on the Internet, and some are inline. In other words, the HTML in that file is used as a container format to transport the scripts to the browser. Neither wget nor curl support Javascript at all as far as I know.
2. 96% of the page is in two blobs; AFAICT there were no IMG or other elements that specify requirements by URL. If so, that would explain why only the top page was downloaded.
3. curl does not document how it handles robots.txt. Since, as far as I can tell, curl has no recursive or page-requisites option, it probably doesn't handle it at all. wget documents that wget -r (recursive downloads) respects robots.txt. It does not document that wget -p (get page requisites, too) respects robots.txt, but a quick test suggests that it does. I think this is a bug: any interactive program that supports non-text media will download required resources along with the HTML file. (If someone agrees and wants to do something about it, this is a wget bug, not a Fedora bug.)
I don't have an alternative fetch tool to suggest, unfortunately. I think that you need to use a graphical browser somehow, or write a script in your favorite P-language.
Steve
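[Editor's note] One middle ground between a fetch tool and a full desktop browser, assuming the installed Firefox is new enough (roughly version 57 onward), is headless mode. Note this produces a rendered screenshot, not a "Web Page, complete" save, so it is only a partial substitute (network command, shown as a sketch):

```shell
# Render the JavaScript-built page without any X display and capture it
# as an image. The output path is arbitrary.
firefox --headless --screenshot /tmp/acbl.png "https://my.acbl.org/club-results/details/338288"
```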
On Tue, 6 Jul 2021 14:31:29 +0900 stephen@xemacs.org wrote:
Thanks for the info.
I have been using a script called save-page-as.sh that runs Firefox. I have changed the save-page-as setting in Firefox to use 'Web Page, complete'. The save-page-as.sh script sends a ctrl-s to Firefox and saves the page. It works perfectly when run from the command line.

I have tried to use the save-page-as.sh script by sending an email to my computer; it does not run Firefox for some reason. From searching, it appears Firefox can be run from a cron script by exporting DISPLAY. Running from a cron script is, I would think, similar to running a script from an email (using procmailrc). No luck, however.
env shows DISPLAY=:0.0. I have tried several variations:

export DISPLAY=:0
export DISPLAY=:0.0
export DISPLAY=:0.1
with no luck.
Perhaps there is another setting that needs to be included as well.
Any thoughts?
David
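[Editor's note] For the email trigger itself, a .procmailrc recipe along these lines is the usual shape (a sketch; the subject pattern and script path are hypothetical, and since procmail starts the pipeline with a minimal environment, the script itself must export DISPLAY and XAUTHORITY):

```procmail
:0
* ^Subject:.*save-page-as
| $HOME/bin/save-page-as.sh
```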
D&R writes:
> env shows :0.0. I have tried several variations: export DISPLAY=:0 export DISPLAY=:0.0 export DISPLAY=:0.1
The first two should work; not sure about the third. I would check the following:
1. Is there actually a firefox executable where the script expects it? (Note that depending on how you run the script, it may not actually search PATH for firefox.) If you're sure Firefox is starting, then this doesn't apply.

2. If you're running over ssh, you may have a different DISPLAY (typically 10.0) or no DISPLAY set at all. This is very unlikely; I suppose you're running the script the same way you ran env. Oh, make sure you're running env on the host where the script is!

3. Nowadays most X servers require authentication from clients, usually a simple MIT-MAGIC-COOKIE by default. This requires credentials be stashed somewhere, typically $HOME/.Xauthority on the host where the client is started.
All of this seems kinda unlikely, but it's what I can come up with offhand. There is a kind of X server that you may be able to install (Xnull or Xtest or something like that), that doesn't actually do anything except speak the X protocol back to the client. I don't know whether that would help, but it could avoid a flash of a window (and perhaps save a few milliseconds in execution).
Steve
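[Editor's note] The three checks above can be scripted roughly like this (a sketch; it only reports, so it is safe to run anywhere):

```shell
# 1. Is firefox actually findable on PATH?
if command -v firefox >/dev/null; then
    echo "firefox found: $(command -v firefox)"
else
    echo "firefox not on PATH"
fi

# 2. Is DISPLAY set (falling back to :0 as suggested in the thread)?
export DISPLAY="${DISPLAY:-:0}"
echo "DISPLAY=$DISPLAY"

# 3. Is there an X authority cookie file for the server to accept us?
export XAUTHORITY="${XAUTHORITY:-$HOME/.Xauthority}"
if [ -f "$XAUTHORITY" ]; then
    echo "found $XAUTHORITY"
else
    echo "no $XAUTHORITY"
fi
```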