Hi.
A few weeks ago someone posted an off-topic thread about scraping javascript/dynamic sites. Sorry to say, I've got a similar off-topic post.
If this is unacceptable, let me know and I'll delete the thread.
I'm dealing with the results of a url/site that has JavaScript. I thought I could simply use Firefox, open the Developer Tools, and use the Inspector sub-window.
All of this seems to work. However, in the Inspector window, I can't figure out how to "expand" all the nodes to see the complete html of the generated page.
Been looking all over the net to figure this out. I know it's something subtle.
I can set the "mouse" to the "html" node at the top of the window. Using the "right mouse" click I can select the "Expand All" option, and it appears to expand the nodes within the html. However, I can't seem to figure out how to then do a "Select All" for all the html in the Inspector window so I can view the complete html in an external editor.
Any idea how this can be accomplished?
thanks!
On 2020-09-02 10:50, bruce wrote:
All of this seems to work. However, in the Inspector window, I can't figure out how to "expand" all the nodes to see the complete html of the generated page.
On the page itself - either the normal page view or the top part of the tools window - right-click the background & choose "View Source". A separate window will open showing the whole file.
Hi Jeremy.
Doing a "view source" only shows the static source. To get the dynamically generated content from the JavaScript you need to dive into the Developer Tools/Inspector tab.
-- Jeremy Nicoll - my opinions are my own _______________________________________________ users mailing list -- users@lists.fedoraproject.org To unsubscribe send an email to users-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
On 2020-09-02 17:12, bruce wrote:
Hi Jeremy.
Doing a "view source" only shows the static source. To get the dynamically generated content from the JavaScript you need to dive into the Developer Tools/Inspector tab.
Might not the problem be that what you're hoping to see is html corresponding to the DOM that's been altered by dynamic JS? But does that actually exist, other than in the browser's internal data/control structures?
That is, I expect the browser reads the original html, parses it, builds a data structure that represents the DOM, then possibly modifies that (once or many times) via JS. It'd then render the page by working from the internal data structure, not any part of the character-based html.
Do you see what I mean?
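To make that parse / modify / render idea concrete, here is a rough sketch in Python. It is only an analogy, not how Firefox works internally: the standard-library xml.etree.ElementTree stands in for the browser's DOM, the sample markup is made up for illustration and kept well-formed so an XML parser accepts it, and the "script" step is just ordinary Python code changing the tree.

```python
import xml.etree.ElementTree as ET

# Static markup as it might arrive from the server (what "View Source" shows).
# This sample document is hypothetical and deliberately well-formed XML.
SOURCE = "<html><body><p>Body text 1 enclosed by p-tags.</p></body></html>"

# Step 1: parse the text into an in-memory tree (a stand-in for the DOM).
root = ET.fromstring(SOURCE)

# Step 2: mutate the tree, much as page JavaScript might after loading.
first_p = root.find("./body/p")
first_p.set("style", "background-color: red;")
extra_p = ET.SubElement(root.find("body"), "p")
extra_p.text = "Added by script."

# Step 3: serialize the current state of the tree back to text on demand.
rendered = ET.tostring(root, encoding="unicode")

print(SOURCE)    # the original text is untouched
print(rendered)  # the serialized tree reflects both changes
```

The point of the sketch is that the modified markup does not exist as text until something asks the tree to serialize itself.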
On 2020-09-02 23:23, Jeremy Nicoll - ml fedora wrote:
Ah, forget that. I found from a test page here that right-clicking the "<html>" at the very top of the html tree structure at the bottom left of the tools window then using
Copy - Inner HTML or Copy - Outer HTML
allowed me to paste the whole of the dynamic html into another file.
If instead of choosing "<html>" I chose e.g. "<head>" (or presumably another smaller part of the tree), then I got just its corresponding smaller amount of html. For example in a test page here whose original source has a series of test paragraphs starting with just
<p> Body text 1 enclosed by p-tags. </p>
some JS colours that. Clicking on the leading "<p>" then choosing the Copy - Inner HTML option, on the tools page when it's showing the coloured results, gives me
Body text 1 enclosed by p-tags.
but Copy - Outer HTML gives
<p style="background-color: red;"> Body text 1 enclosed by p-tags. </p>
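The difference between the two Copy options can be mimicked outside the browser. The sketch below is an illustration only: inner_html and outer_html are hypothetical helper names, and xml.etree.ElementTree is again used as a stand-in for the DOM, so it only handles well-formed markup.

```python
import copy
import xml.etree.ElementTree as ET


def outer_html(element):
    """Serialize the element itself, its attributes, and everything inside it."""
    clone = copy.copy(element)  # shallow copy, so the original keeps its tail
    clone.tail = None           # text after the closing tag is not part of the element
    return ET.tostring(clone, encoding="unicode")


def inner_html(element):
    """Serialize only the element's contents, without its own tags."""
    parts = [element.text or ""]
    for child in element:
        # tostring() on a child also emits the text that follows it (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


# The test paragraph from above, with the style that the JS would have added.
p = ET.fromstring("<p> Body text 1 enclosed by p-tags. </p>")
p.set("style", "background-color: red;")

print(repr(inner_html(p)))
print(outer_html(p))
```

The style attribute lives on the <p> tag itself, which is why it only shows up in the outer form.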
Jeremy!!
As Homer Simpson says.. DOH!
thanks.
Now to figure out how to implement code with a headless browser to get the same content/html.
much appreciation.
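For the headless-browser step, one possible starting point is Selenium driving Firefox. This is a sketch under assumptions, not something tested against any particular site: it assumes the third-party selenium package (version 4 or later) plus Firefox and geckodriver are installed, fetch_rendered_html is a made-up name, and the wait only covers the initial page load, so content that scripts add later may need a more specific wait condition.

```python
def fetch_rendered_html(url, timeout=10.0):
    """Return the serialized DOM of `url` after its scripts have run."""
    # Imported inside the function so this file still loads on machines
    # where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")  # run Firefox without a visible window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Wait until the browser reports the document as fully loaded.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        # Ask the live DOM to serialize itself; this should correspond to
        # "Copy - Outer HTML" on the <html> node in the Inspector.
        return driver.execute_script(
            "return document.documentElement.outerHTML"
        )
    finally:
        driver.quit()  # always shut the browser down, even on errors
```

Calling fetch_rendered_html("https://example.org/") should return the markup as one string, which can then be written to a file or handed to whatever parser you already use.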