Hi All,
This is a puzzle I have been working on for about two years.
On this web page:
https://www.eset.com/us/business/endpoint-security/windows-security/download...
From the command line, I am trying to read the revision shown on that page, which is 7.3.2039.0.
Problem is that the revision is generated by a java script. And curl and wget have no way of running java scripts. And I have not figured out how to get it with headless Firefox.
I asked ESET if they'd give me a direct download link, but they ignored me.
Any of you guys have a clever way to extract the revision from the CLI?
Many thanks, -T
On Sat, 22 Aug 2020 18:42:57 -0700 ToddAndMargo via users wrote:
Any of you guys have a clever way to extract the revision from the CLI?
I'd say lynx, but apparently lynx doesn't support JavaScript. However, a Google search did turn up lots of articles pointing to other text-mode browsers that do support JavaScript.
There is also this kludgery: run a normal browser inside a VNC instance and use the XTEST extension to feed it commands to save the HTML :-).
On Sat, 22 Aug 2020 21:54:27 -0400 Tom Horsley wrote:
However, a Google search did turn up lots of articles pointing to other text-mode browsers that do support JavaScript.
elinks is in the Fedora repos and can apparently be badgered into supporting JavaScript (never tried it, just reporting what I read).
On Sat, Aug 22, 2020 at 9:43 PM ToddAndMargo via users <users@lists.fedoraproject.org> wrote: ... snip ...
Any of you guys have a clever way to extract the revision from the CLI?
I'm not sure how you want to retrieve it, but using Firefox->Tools->Web Developer I looked at all of the transactions in the page, and the one that retrieves the version info was:
https://www.eset.com/ca/business/endpoint-security/windows-security/download...
Now I don't know how important/relevant/specific those parameters are, but fetching that returns some JSON, and part of it was:
...
"files": { "installer": { "23358": { "url": "https://download.eset.com/com/eset/apps/business/ees/windows/latest/ees_nt64...", "full_version": "7.3.2039.0",
...
Hope that helps a little.
On 8/22/20 6:42 PM, ToddAndMargo via users wrote:
https://www.eset.com/us/business/endpoint-security/windows-security/download...
... Problem is that the revision is generated by a java script.
The version isn't *generated* by JavaScript, it's requested from an API by JavaScript. The difference being that you don't need to run JavaScript to get the answer, you just need to query the API.
So, the first step is to turn on Firefox's web developer tools and look at the Network pane. In that window, load the page above. Each query detailed in the network pane should indicate the type of response. Look for the "json" responses. There are only three, so it's easy to find.
(You can do this in Chrome as well, but the "type" is xhr there, and there are dozens of such responses, so it's much less obvious.)
Now, right-click that entry and select Copy -> Copy as cURL. Paste that into a terminal and hit enter to see the response. You should get a JSON document. Now you just need to extract the version from that. There are a variety of tools you can use. My "jq" isn't very good, but this is pretty close:
curl ... | jq '.files.installer|..|.full_version?'
I'll also mention that in this case specifically, you don't need to "Copy as cURL". You can copy just the URL and pass that to curl on the command line. Some of the time, a site or API may check the referrer or other headers, and in those cases it's good to know about "Copy as cURL", so try to remember that option.
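For illustration, here's a minimal sketch of that pipeline, assuming the API URL has been copied out of the Network pane (its full query string is elided in this thread, so the URL below is a placeholder):

    # API_URL is whatever "Copy URL" gave you from the Network pane
    API_URL='https://www.eset.com/us/business/endpoint-security/windows-security/download/?type=13554&...'
    curl -s "$API_URL" | jq '.files.installer | .. | .full_version? // empty'

The "// empty" just suppresses the nulls that the recursive descent (..) would otherwise print for every node lacking a full_version key.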
On 2020-08-22 19:30, Fulko Hew wrote:
... snip ...
Awesome!
That is one YUGE one-line return from curl, but I can whack it with a regex. Thank you!
How did you figure that out?
On 2020-08-22 19:35, Gordon Messmer wrote:
... snip ...
Thank you!
On Sat, Aug 22, 2020 at 11:11 PM ToddAndMargo via users <users@lists.fedoraproject.org> wrote: ... snip ...
I'm not sure how you want to retrieve it, but using Firefox->Tools->Web Developer
How did you figure that out?
I used that Firefox->Tools->Web Developer tool and looked at everything that was fetched. Besides the main HTML, a bunch of JS files and image files, the only other file it fetched was the one I mentioned, so I looked at its output to find it was a JSON file, and one of the first fields was your version number.
Problem solved (60 seconds).
On 2020-08-22 20:30, Fulko Hew wrote:
... snip ...
Dang! Which web development tool did you use?
On Sat, Aug 22, 2020 at 11:33 PM ToddAndMargo via users <users@lists.fedoraproject.org> wrote:
... snip ...
Dang! Which web development tool did you use?
I mentioned it twice above... Firefox. In the Tools menu, the Web Developer option, and selecting the Network portion. It shows all the fetches, the responses and timing info. (Once you start the tool, you do have to re-fetch the page for it to be captured.)
On Sat, Aug 22, 2020 at 11:42 PM Fulko Hew <fulko.hew@gmail.com> wrote:
Dang! Which web development tool did you use?
I mentioned it twice above... Firefox. In the Tools menu, the Web Developer option, and selecting the Network portion. It shows all the fetches, the responses and timing info. (Once you start the tool, you do have to re-fetch the page for it to be captured.)
Chrome too has a similar tool, under a similar menu path.
On 2020-08-22 20:42, Fulko Hew wrote:
... snip ...
Dang! Which web development tool did you use?
I mentioned it twice above... Firefox. In the Tools menu, the Web Developer option, and selecting the Network portion. It shows all the fetches, the responses and timing info. (Once you start the tool, you do have to re-fetch the page for it to be captured.)
I am on Network. For the life of me, I do not see what you are seeing. Were you on "All" or did you have a filter checked?
On 2020-08-22 21:16, ToddAndMargo via users wrote:
... snip ...
I found this:
{"GET":{"scheme":"https","host":"www.eset.com","filename":"/us/business/endpoint-security/windows-security/download/","query":{"type":"13554","tx_esetdownloads_ajax[product]":["82","82"],"tx_esetdownloads_ajax[beta]":["0","0"],"tx_esetdownloads_ajax[page_id]":["931","931"],"tx_esetdownloads_ajax[plugin_id]":["1456376","1456376"]},"remote":{"Address":"13.226.14.122:443"}}}
But how did you turn that into the link you gave me?
On 2020-08-22 21:27, ToddAndMargo via users wrote:
... snip ...
But how did you turn that into the link you gave me?
Got it.
I was opening the specs on the GET on the right side. The URL I needed was on the left side.
Right Click -> Copy -> Copy URL
You are right, it was easy, once I got my head wrapped around it!
Thank you!
With everyone's help:
my Str $WebSite = "https://www.eset.com/us/business/endpoint-security/windows-security/download...";  # need that trailing slash
my Str $WebSite2 = "";
my Str $RevSite  = "";
my Str $WebPage  = "";
my $PageStatus;
my Str $ReturnStr;
my Str $CurlStatus;

if $Debug { $WebSite2 = "-v $WebSite"; } else { $WebSite2 = $WebSite; }
( $WebPage, $PageStatus ) = CurlGetWebSite( $WebSite2 );
# PrintGreenErr( "webpage status = $PageStatus\nWebpage =<$WebPage>\n" );

if $PageStatus ne 0 || $PageStatus.contains( "301 Moved Permanently" ) {
   if $Debug { PrintBlueErr( "$SubName: unable to download New Rev page:\n$WebPage\n" ); }
   PrintRedErr( "$SubName: error: $NewRev Revision download page failed. Bummer ...\n" );
   if $PageStatus.contains( "301 Moved Permanently" ) { PrintRedErr( "   301 Moved Permanently\n" ); }
   $RevSite = "";
   $NewRev  = "0";
   $Status +|= %StatusHash<DOWNLOAD_FAIL>;

} else {
   # if the web page downloaded okay, extract the latest new revision from it
   # <div data-value="https://www.eset.com/us/business/endpoint-security/windows-security/download/?type=13554&tx_esetdownloads_ajax..." id="apiUrl"></div>
   $RevSite = $WebPage;
   $RevSite ~~ s| .*? 'https://www.eset.com/us/business/endpoint-security/windows-security/download...' |https://www.eset.com/us/business/endpoint-security/windows-security/download...;
   $RevSite ~~ s| '"' .* ||;
   # PrintBlue( "$RevSite = <$RevSite>\n" );

   ( $WebPage, $PageStatus ) = CurlGetWebSite( $RevSite );
   if ( $PageStatus ne 0 ) {
      if $Debug { PrintBlueErr( "$SubName: unable to download New Rev page:\n$RevSite\n" ); }
      PrintRedErr( "$SubName: error: $RevSite Revision download page failed. Bummer ...\n" );
      if $PageStatus.contains( "301 Moved Permanently" ) { PrintRedErr( "   301 Moved Permanently\n" ); }
      $NewRev = "0";
      $Status +|= %StatusHash<DOWNLOAD_FAIL>;

   } else {
      # {"family_name":"ESET Endpoint Security","version":7,"changelogs":{"38":"<h3>Version 7.3.2039.0</h3>
      $NewRev = $WebPage;
      $NewRev ~~ s| .*? "Version " ||;
      $NewRev ~~ s| '<' .* ||;
      # PrintBlue( "NewRev = <$NewRev>\n" );
   }
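For comparison, the same two-step flow as a plain bash sketch. It assumes, as the comment above shows, that the download page embeds the API URL in a div with id="apiUrl" (URLs elided here as elsewhere in the thread; the exact div layout may differ, so treat this as a sketch rather than a drop-in):

    # step 1: fetch the download page and dig the API URL out of the apiUrl div
    page=$(curl -s 'https://www.eset.com/us/business/endpoint-security/windows-security/download/')
    api_url=$(printf '%s\n' "$page" | grep -o 'data-value="[^"]*" id="apiUrl"' | sed -e 's/.*data-value="//' -e 's/".*//')
    # the URL is HTML-escaped in the page source, so decode &amp;
    api_url=$(printf '%s\n' "$api_url" | sed 's/&amp;/\&/g')

    # step 2: fetch the JSON and pull the revision out of the changelog heading
    curl -s "$api_url" | grep -o 'Version [0-9][0-9.]*' | head -n 1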
On 2020-08-23 02:42, ToddAndMargo via users wrote:
Hi All,
This is a puzzle I have been working on for about two years.
Yes, and you've asked lots of questions on the curl mailing list and, it seems, not read (or not understood) what you've been told.
Problem is that the revision is generated by a java script.
No it isn't (as someone else here has explained).
Also there is no Java involved. The Java programming language has nothing at all to do with the different programming language named Javascript.
And curl and wget have no way of running java scripts.
No, but as people on the curl list have explained before, you can fetch pages and parse out the relevant details and work out what to fetch next.
On 2020-08-23 06:33, Jeremy Nicoll - ml fedora wrote:
On 2020-08-23 02:42, ToddAndMargo via users wrote:
Hi All,
This is a puzzle I have been working on for about two years.
Yes, and you've asked lots of questions on the curl mailing list and, it seems, not read (or not understood) what you've been told.
There are a lot of great guys on that list. They never were able to figure that one out for me. I posted back to that list yesterday with what Fulko and Gordon taught me. Fulko and Gordon are extremely smart guys.
Problem is that the revision is generated by a java script.
No it isn't (as someone else here has explained).
Also there is no Java involved. The Java programming language has nothing at all to do with the different programming language named Javascript.
I am not sure where you are coming from. I state (Brendan Eich's) "java script" in the Subject line and all over the place. I nowhere stated or implied that it was Java the programming language. I wish they had called them two different names.
And JSON is an extension of Java Script. The "JS" stands for "Java Script"
The JavaScript Object Notation (JSON) Data Interchange Format https://tools.ietf.org/html/rfc8259
I was guessing that it was Java Script. It was not the exact Java Script I was looking for, but it was Java Script. My mistake was thinking I needed to "push the button".
Have you seen the YUGE data file that JSON script points to? Yikes. Love regex!
And curl and wget have no way of running java scripts.
No, but as people on the curl list have explained before, you can fetch pages and parse out the relevant details and work out what to fetch next.
Not until Fulko and Gordon did I know what to look for. The guys on the curl group were not as explicit as Fulko and Gordon. I did what the mensches on the curl list told me to do and dug around a lot, but could not make heads or tails out of the page. They did not tell me exactly what to look for.
Thank you for the comments, -T
p.s. Brendan Eich seems to be doing wonderfully well over at Brave. Very nice code.
On 8/23/20 3:09 PM, ToddAndMargo via users wrote:
On 2020-08-23 06:33, Jeremy Nicoll - ml fedora wrote:
On 2020-08-23 02:42, ToddAndMargo via users wrote:
Problem is that the revision is generated by a java script.
No it isn't (as someone else here has explained).
Also there is no Java involved. The Java programming language has nothing at all to do with the different programming language named Javascript.
I am not sure where you are coming from. I state (Brendan Eich's) "java script" in the Subject line and all over the place. I nowhere stated or implied that it was Java the programming language. I wish they had called them two different names.
And JSON is an extension of Java Script. The "JS" stands for "Java Script"
JavaScript is one word. "A java script" as you wrote earlier could easily be misunderstood.
$RevSite ~~ s| .*? 'https://www.eset.com/us/business/endpoint-security/windows-security/download...' |https://www.eset.com/us/business/endpoint-security/windows-security/download...;
I cleaned that regex up:
if ( $RevSite ~~ s| .*? ('https://www.eset.com/us/home/internet-security/download/?type=13554') |$0| ) {
Whatever matches inside the () is captured into $0, so I don't have to repeat it. Makes it a lot easier to read.
On 2020-08-23 23:09, ToddAndMargo via users wrote:
On 2020-08-23 06:33, Jeremy Nicoll - ml fedora wrote:
On 2020-08-23 02:42, ToddAndMargo via users wrote:
Hi All,
This is a puzzle I have been working on for about two years.
Yes, and you've asked lots of questions on the curl mailing list and, it seems, not read (or not understood) what you've been told.
There are a lot of great guys on that list. They never were able to figure that one out for me.
Rubbish. People on that list are experts in using curl, and properly understand what it does and doesn't do.
The problem is that you don't seem to understand the huge difference between what curl (or wget) do, and what a browser does.
I posted back to that list yesterday with what Fulko and Gordon taught me. Fulko and Gordon are extremely smart guys.
You were told about (and indeed replied about) Firefox's developer tools as far back as Aug 2018. The problem is, you don't seem to have gone and read the Mozilla documentation on how to use them, far less explored their capabilities.
A while ago I wrote a description, for someone elsewhere, about what a browser typically does to fetch a web page. This is it:
------------------------------------------------------------------ When a browser fetches "a page" what happens (glossing over all the stuff that can make this even more complicated) is:
- it asks the server for basic page html
- the server returns page meta data (length, when last changed, etc and possibly things like redirects, if the website now lives somewhere else and should automatically go there instead)
- with a browser the user never sees this stuff, but it's visible in the browser console in developer tools; and with curl, if you code your request in the right way, curl will put the returned headers etc in a file for you, separately from the html (there's a short curl sketch after this description)
- if the metadata etc meant that html should actually be returned the server would send it. It might also send some "cookies" back to the user's browser.
- with curl you can have any returned cookies put in a file too
- the browser would then do a preliminary parse of the source html, finding all the embedded references to things like css files, image files, javascript files etc, and make separate requests for all of them.
- curl does not do any of that for you. You need to read the html returned by a previous stage, and decide if you want to fetch anything else and explicitly ask for it
For any of those requests that were to the original server, the browser would send back that server-specific cookie data, so the server can see the new requests are from the same user as the first one.
- curl would only send cookie data back if you explicitly tell it to do so, and you have to tell it which data to send back
- for every file apart from the first one that's fetched from anywhere, the metadata and cookie logic is done for them too. If they're not image files (ie they are css or scripts), they also will be parsed to dig out references to embedded files (for example scripts often use other people's scripts, which in turn use someone else's, and so on, and they all need to be fetched).
- eventually the browser will think it has all the parts that make up what it needs to display the page you wanted.
- at some point the browser does a detailed parse of the whole file assembled from the bits. In a modern webpage there is very likely to be Javascript code that needs to execute before anything is shown to the user. Sometimes some of that will generate more external file references (eg building names of required files from pieces of information that were not present in any one part of any of the files fetched so far).
- curl will of course not execute the Javascript, but you could in theory try to work out what it does. Eg when looking with Developer Tools in Firefox, you can run the JS under a debugger and follow what it does, so you could eg see that the URL for another file that has to be fetched is built up in a particular way from snippets of the JS. Then you could replicate that in future by extracting the contents of the snippets and joining them together in your own code. For example the JS might fetch something from a URL made up of some base value, a date, and a literal, all of which would need to be in the code somewhere.
In particular the use of cookies for successive fetches, allowing the server to see that the fetches were all from the same user, may eg mean that "deal of the week" info will somehow have been tailored to you. The server will also know not just what country you are in but also what region (if you're not using a VPN to fool it), as the ip address of your computer will correspond to one of the ranges of addresses used by your ISP.
Anyway the initial JS code might mean the browser has to fetch more files. So it will, repeating most of the above logic for them too.
Finally it works out what to display and shows it to you.
- after that, modern webpages are very JS intensive. There's often JS logic that executes as you move the mouse around. It's one of the ways that pages react to the mouse moving over certain bits of the page. Some of it is in html itself, but other parts are coded in essence to say eg "if the mouse drifts over this bit" or "if the mouse drifts away from here" then run such-and-such a bit of JS. Any of those little bits of JS can cause more data to be fetched from a server - that could be ads, or it could be something to do with the real site.
Finally things like "next screen" buttons might execute JS before actually requesting something. The JS might encapsulate data about you and your activity using the page, as well as just ask for more data. Certainly cookie info set by the initial page fetch will be returned to the server...
To replicate all of the above is difficult. To do it accurately you would need to write in your own scripts (that issue curl commands) a lot of extra logic.
An alternative to trying to write a whole browser (in essence) is to use "screen scraping" software. It is specifically designed to use the guts of a browser to fetch stuff and present it - in a machine- readable way - to logic that can, say, extract an image of part of a web page and then run OCR on it to work out what it says.
Another alternative is to use something like AutoIt or AutoHotKey to write a script that operates a browser by pretending to be a human using a computer - so eg it will send mouse clicks to the browser in the same way that (when a user is using a computer) the OS sends info about mouse clicks to a browser.
---------------------------------------------------------------
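For what it's worth, the header and cookie steps in the description above map onto standard curl options; example.com is just a stand-in here:

    # first fetch: dump the response headers to one file, save any cookies to another
    curl -s -D headers.txt -c cookies.txt -o page.html 'https://example.com/'

    # later fetch to the same site: send the saved cookies back, so the
    # server sees both requests as coming from the same user
    curl -s -b cookies.txt -o next.html 'https://example.com/next'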
Problem is that the revision is generated by a java script.
No it isn't (as someone else here has explained).
Also there is no Java involved. The Java programming language has nothing at all to do with the different programming language named Javascript.
I am not sure where you are coming from. I state (Brendan Eich's) "java script" in the Subject line and all over the place. I nowhere stated or implied that it was Java the programming language. I wish they had called them two different names.
Universally in computing if someone says
"python script, or lua script, or rexx script"
they mean "a script written in python, or a script written in lua or a script written in rexx"... so when you keep saying "java script" it looks like you think you're talking about a script written in Java.
Some webpages etc DO use Java, just not very many.
If you're ever googling for info about what a javascript script does then googling for "java script" is likely to show you info about how things are done in Java, not in Javascript.
Everything in programming requires one to be precise. Using the wrong terminology will not help you.
And JSON is an extension of Java Script.
No it isn't. It's a data exchange format invented for use in Javascript though nowadays you'll find it used elsewhere too. It has nothing to do with Java.
The "JS" stands for "Java Script"
Sort-of. It's two letters of the single word "Javascript", which is sometimes written as "JavaScript".
And curl and wget have no way of running java scripts.
No, but as people on the curl list have explained before, you can fetch pages and parse out the relevant details and work out what to fetch next.
Not until Fulko and Gordon did I know what to look for.
I think you need to understand far better what a browser does, and play with the browser developer tools (on simpler sites than the eset one) to see what they can do for you.
The people on the curl list expected you to go and do that.
The developer tools will show you, when you fetch "a" page, that the browser in fact fetches a whole load of different files, and if you poke around you will see how sets of those are fetched after earlier files have been fetched (and parsed) and seen to contain references to later ones. You can also see eg how long each of those fetches takes.
The tools also allow you to see the contents of scripts that are fetched individually as well as isolated sections of JS that are embedded in a page.
You can intercept what the bits of JS do and watch them execute in a debugger, and alter them. (To find out how, you need to read the Mozilla (for Firefox) or whoever else's docs AND you need to experiment with simple pages that you already fully understand - ideally your own - to see how the tools let you explore and alter what a page does.)
One of the problems with many modern websites is their programmers grab bits of JS from "toolkits" and "libraries" written by other people, eg to achieve some amazing visual effect on a page. They might embed 100 KB of someone-else's JS and CSS, just to make one tiny part of their site do something "clever". Often almost all of the embedded/associated JS on a page isn't actually used on that page, but the site designers neither know nor care.
Another issue is that (say) a JS library might exist in more than one form. Often sites embed "minimised" versions of commonly-used scripts - these have spaces and comments removed and variable names etc reduced to one or two characters. The script is then maybe a tenth of the size of an easily-readable-by-a-human version (so will download faster and waste less of the server's resources). Your browser will understand a minimised script just as easily as a verbose human-readable one ... but you won't. Some commonly used scripts (eg those for "jquery") exist in matching minimised and readable forms, so you could download a readable equivalent (and I think developer tools will sometimes do that for you).
But... to understand what a script that uses (eg) jquery is doing, you'll either need to look at it in fine detail (& understand what it is capable of doing - so eg know all about the "document object model" and how css and JS are commonly used on webpages) or at the very least have skim-read lots of jquery documentation.
It won't necessarily be easy to work out which bits of JS on a page are just for visual effects, and which bits are for necessary function.
The guys on the curl group were not as explicit as Fulko and Gordon.
You're expected when using a programmers' utility to read its documentation and anything else that people mention. You're especially expected to understand how the whole browser process works.
I did what the mensches on the curl list told me to do and dug around a lot, but could not make heads or tails out of the page.
Then you should have asked for more help. But it's important when doing that to show that you have made an effort to understand; to tell people what you read (so eg if you're going off on a wild goose chase people can point you back at the relevant things) and what you've tried.
You might eg explain how after fetching html page a you then discovered the names of scripts b and c on it, and fetched those, but didn't know what to do next.
No-one, on a voluntary support forum, is going to do the whole job for you. They might, if they have the time and the inclination, look at what you've managed to do so far (and ideally the code you used to do it) and suggest how to add to it to do the next stage.
The other aspect of that is that if you demonstrate some level of skill as a programmer, people will know at what level to pitch their replies. If the person asking how to do something cannot write working programs they have no chance of automating any process using scripts, parsing html or JS that they fetch etc. On the other hand if someone shows that they already understand all that, they're more likely to get appropriate help.
I've written sequences of curl requests interspersed with logic (in Regina REXX and ooREXX) that grabs successive pages of a website, finding on each one the information required to grab the next page in the sequence... but it took days & days to make the whole process work reliably.
One of my sets of these processes grabbed crosswords from the puzzle pages of a certain newspaper. The structure of the pages was different on Mon-Fri, Sat, and Sun. And at certain times of year, eg Easter, different again. Over time the editorial and presentational style of the site changed too, so code that worked perfectly for weeks could easily suddenly go wrong because the underlying html (or the css around it) would suddenly change. So code I wrote to extract data from returned pages and scripts needed to be 'aware' that the layout of content in the html etc might not be the same as any of the previously-seen layouts, and stop and tell me if something didn't seem right.
Writing reliable logic to do this is not straightforward.
My idea of what is straightforward might not match yours. (I've a computing degree and worked for years as, first, a programmer (on microcomputers and a small mainframe), then a systems programmer (installing & customising the OS for a bank's mainframe), then a programmer again, leading a programming team writing systemy automation programs for that bank (ie we weren't part of the teams that wrote code that moved money around).)
Even so, the websites I've explored using curl, parsing what comes back, and then issuing more curl requests tended to be less complex some years ago than the norm nowadays. It's not necessarily impossible to do it, but it gets harder and harder to understand the code on many sites, so working out the "glue" logic required to do this is more and more difficult.
Sometimes in the past, eg when smartphones were a whole lot less capable, websites existed in simpler forms for phone users. Sometimes also much simpler ones for eg blind users using screen-readers. When that was the case, making curl etc grab the blind-users' website pages considerably simplified the whole process.
Nowadays, it's more common for there only to be one version of a website, with much more complex code on it which might adapt to the needs of eg blind users. It's therefore sometimes worthwhile looking at the website help pages, if it has them, particularly any info about "accessibility", to see if there's any choice in what you get. (Though replicating that may need you to simulate a login and/or use cookies saved from a previous visit.)
And if all it does is eg make a page /look/ simpler, but the html & scripts sent to you are unchanged, there'll be no advantage unless the route through the page JS etc is simpler - ie if you're using a debugger to work out what it does, that process may be simpler. But probably it won't be.
Jeremy,
You seem very intent on, and have put a lot of work into, jumping to conclusions and picking fights.
This list is about folks sharing and assisting others, not attacking them. So from this point forward, I am placing you into my kill file.
-T
On 2020-08-24 18:38, Jeremy Nicoll - ml fedora wrote:
A while ago I wrote a description, for someone elsewhere, about what a browser typically does to fetch a web page. This is it:
Thanks for the tutorial (a.k.a. description) about what happens when a browser fetches a web page. It actually did answer a few of my "I wonder"s.
On 8/24/20 4:18 AM, ToddAndMargo via users wrote:
You seem very intent on, and have put a lot of work into, jumping to conclusions and picking fights.
I did not see that in his email.
This list is about folks sharing and assisting others, not attacking them. So from this point forward, I am placing you into my kill file.
There was a huge amount (somewhat excessive, a link would have been better if possible) of useful information there including how to interact with people you are hoping to get help from. If you think he came to the wrong conclusions, then you could try correcting that or just ignore it and continue on. But I didn't see any "attacking". It seems that there's a pretty low bar to get into your kill file and I don't see the point of that anyway.
On 2020-08-24 11:30, Samuel Sieb wrote:
... snip ...
Responding off list
On 2020-08-22 19:30, Fulko Hew wrote:
... snip ...
Hi Fulko,
I have now used your technique to extract revision numbers from two other web pages. AWESOME! Thank you!
-T