
How do I archive a website

  1. #1
    By any means, saving every page or mirroring it or whatever.

    Asking for myself.
  2. #2
    A College Professor victim of incest [your moreover breastless limestone]
    Get a laser printer and print all the pages, map out the links for each page, and lay it all out on the wall with branches and trees. Download all the sound effects and put them on cassette tapes, with a Walkman player for each tape.
  3. #3
    whoami Tuskegee Airman
    nigger
  4. #4
    gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Originally posted by whoami wget -mkEHpnp -e robots=off -U "Mozilla/5.0 (Windows NT x.y; rv:10.0) Gecko/20100101 Firefox/10.0" $url

    I love the succinctness of this line of code.

    No idea what like... any of those wget flags do, but the command gives off the impression of being legit.

    Originally posted by whoami write a custom scraper

    This is generally my approach.

    I mean, I have used HTTrack a fair bit for some static sites, but for anything more dynamic and CMS-driven, like a forum (much like this one), I prefer a custom script (Python with the Beautiful Soup and requests libraries) for maximum control and customization.
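For anyone else wondering what the flags in the quoted wget one-liner do, here is a rough sketch that spells the bundled `-mkEHpnp` out as wget's long options. The helper name `build_wget_command` and the example URL are made up for illustration; the option names are wget's own.

```python
import shlex

# Long-form equivalents of the bundled short flags in "-mkEHpnp".
WGET_FLAGS = {
    "--mirror": "recursion plus timestamping, with no depth limit (-m)",
    "--convert-links": "rewrite links so the local copy browses offline (-k)",
    "--adjust-extension": "add .html/.css extensions where missing (-E)",
    "--span-hosts": "allow recursion onto other hosts, e.g. CDNs (-H)",
    "--page-requisites": "fetch images, CSS, and scripts each page needs (-p)",
    "--no-parent": "never climb above the starting directory (-np)",
}


def build_wget_command(url, user_agent):
    """Return the argv list for the mirror command, using long options."""
    return [
        "wget",
        *WGET_FLAGS,                      # iterating the dict yields the option names
        "-e", "robots=off",               # ignore robots.txt, as in the quoted command
        f"--user-agent={user_agent}",     # same as -U in the quoted command
        url,
    ]


if __name__ == "__main__":
    argv = build_wget_command(
        "https://example.com/",  # placeholder URL (assumption)
        "Mozilla/5.0 (Windows NT x.y; rv:10.0) Gecko/20100101 Firefox/10.0",
    )
    print(shlex.join(argv))  # paste-ready shell command
```

One caveat: `--span-hosts` combined with `--mirror` can wander well beyond the target site, so it is often paired with wget's `--domains` option to limit which hosts get followed.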
  5. #5
    gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Of course, if they have a REST API, or even a Python API package for download, well, damn, that site's a sitting duck.
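To make the "sitting duck" point concrete, here is a minimal sketch of draining a paginated JSON API. Everything about the endpoint is an assumption: the `?page=N` parameter, the JSON list per page, and the empty list marking the end are hypothetical, so a real site's API docs would dictate the actual shape.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def dump_all_pages(base_url, fetch=fetch_json, max_pages=1000):
    """Collect every item from a hypothetical paginated endpoint.

    Assumes ?page=N pagination and a JSON list per page, with an
    empty list signalling that there is nothing left to fetch.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch(f"{base_url}?page={page}")
        if not batch:          # empty page: we have reached the end
            break
        items.extend(batch)
    return items
```

The `fetch` parameter is only there so the loop can be tried against canned data before pointing it at a live server.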
  6. #6
    gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    I haven't had to use Selenium for more complex JS framework rendering (like React and so on), YET.

    Beautiful Soup has always been sufficient (thus far, at least).
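As a rough illustration of the parsing step in a custom scraper, the sketch below pulls same-site links out of a page so a crawler knows what to fetch next. It swaps in the standard library's `html.parser` for Beautiful Soup so it runs without installing anything; the class and function names are made up for this example. Like Beautiful Soup, it only sees the HTML the server sends, which is why JS-rendered pages would need something like Selenium.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def same_site_links(html, base_url):
    """Return de-duplicated links that stay on base_url's host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    seen = []
    for link in collector.links:
        link = link.split("#", 1)[0]  # drop fragments like #post2
        if urlparse(link).netloc == host and link not in seen:
            seen.append(link)
    return seen
```

A crawler would call `same_site_links` on each downloaded page and queue whatever it has not visited yet.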
  7. #7
    filtration African Astronaut
    This post has been edited by a bot I made to preserve my privacy.
  8. #8
    filtration African Astronaut
    This post has been edited by a bot I made to preserve my privacy.
  9. #9
    gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Originally posted by filtration deleted, because I am retarded.

    I was watching some of your attempts there.

    You put in a solid effort.
  10. #10
    Erekshun Naturally Camouflaged
    Why not just archive an archive?
  11. #11
    gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Originally posted by Erekshun Why not just archive an archive?

    Not quite sure I follow...

  12. #12
    Erekshun Naturally Camouflaged
    Is that a question?
  13. #13
    -SpectraL coward [the spuriously bluish-lilac bushman]
    https://bigdata-madesimple.com/top-50-open-source-web-crawlers-for-data-mining/
  14. #14
    gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Originally posted by Erekshun Is that a question?

    I guess I just mean... could you elaborate?

    Your phrase sounds like a tautology.

    What do you mean by "archive, the verb" and "archive, the noun"?
  15. #15
    Erekshun Naturally Camouflaged
    Originally posted by gadzooks I guess I just mean… could you elaborate?

    Your phrase sounds like a tautology.

    What do you mean by "archive, the verb" and "archive, the noun"?

    I don't have an answer. I was actually asking a question myself.
  16. #16
    DietPiano
    ok, the thing is that i know nothing about computers. Can you send me like a program to download and run on it?
  17. #17
    gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Originally posted by DietPiano ok, the thing is that i know nothing about computers. Can you send me like a program to download and run on it?

    HTTrack.

    Before I even wrote a "print('Hello world!')" application, I was using HTTrack. No coding required, but it does have certain limitations regarding more contemporary web frameworks and technologies.

    What's the site you want to archive? Let's start with that.
  18. #18
    whoami Tuskegee Airman
    nigger