This site is dead
-
2019-03-10 at 8:40 PM UTC
I started becoming active again, and I am greeted with this?! Thanks a lot!
-
2019-03-10 at 9:12 PM UTC
-
2019-03-10 at 9:17 PM UTC
-
2019-03-10 at 10:21 PM UTC
-
2019-03-10 at 10:23 PM UTC
Why is the opposite of active just active with 'in' in front of it, but inflammable means something can be flambéed?
-
2019-03-10 at 10:25 PM UTC
-
2019-03-11 at 12:25 AM UTC
-
2019-03-11 at 12:28 AM UTC
-
2019-03-11 at 12:37 AM UTC
-
2019-03-11 at 12:57 AM UTC
I will write less words so it doesn't take up so much disc space
-
2019-03-11 at 2:23 AM UTC
Originally posted by 34nfi4w8g3wnfge4j93qrj309jg I will write less words so it doesn't take up so much disc space
War and Peace is a long ass book, a couple months worth of reading, and is 3.2 MB uncompressed.
http://www.gutenberg.org/ebooks/2600 -
2019-03-11 at 2:29 AM UTC
Originally posted by Lanny after gzip compression the (SQL) backups of the database are a little more than 800MB
Hey, do you mind if I do a full archive of the site?
You can name the throttle interval per request. Right now I'm thinking 1 thread a second, which would take about 10 hours if my math is right, although ideally 4 requests a second would be nice; I'll let you call the shots on this. I don't want to DDoS the site or some shit lol.
I have a script set up and ready to go that sequentially reads each thread (from thread 1 to thread 35213)...
It's ready to parse all the data relevant to each post (timestamp, user who posted it, thread it was posted in, etc).
It's a one time procedure, then I can reconstruct it however necessary/desired in the future from the archived data and won't have to burden the servers again...
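The fetch loop itself is nothing fancy; roughly this shape (the URL pattern and output paths here are just placeholders on my end, not the site's actual routes):

```python
# Rough sketch of the crawler; BASE_URL is a placeholder, not the real route.
import os
import time

import requests

BASE_URL = "https://forum.example/thread/{}"   # placeholder URL pattern
FIRST_THREAD, LAST_THREAD = 1, 35213
DELAY_SECONDS = 1.0  # ~35k requests at 1/sec is roughly 10 hours


def archive_threads(out_dir="archive"):
    os.makedirs(out_dir, exist_ok=True)
    for thread_id in range(FIRST_THREAD, LAST_THREAD + 1):
        try:
            resp = requests.get(BASE_URL.format(thread_id), timeout=30)
            resp.raise_for_status()
        except requests.RequestException as exc:
            # Deleted/missing threads just get logged and skipped.
            print(f"thread {thread_id}: {exc}")
        else:
            path = os.path.join(out_dir, f"thread {thread_id}.html")
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(resp.text)
        time.sleep(DELAY_SECONDS)  # throttle: one request per second


if __name__ == "__main__":
    archive_threads()
```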
What say you, Lanny? -
2019-03-11 at 2:48 AM UTC
-
2019-03-11 at 2:54 AM UTC
Originally posted by Jυicebox I've always wondered this about "disgusting"
That would make something pleasant "gusting"
Actually...
"gustatory" refers to the physiological sensation of "gustation".
When we taste things, we "gustate" those things.
So to say that something is "disgusting", we are saying it has taste, but it is not a desirable taste.
I lol'd when you said that tho, because this etymological connection had actually never occurred to me until you said it just now. -
2019-03-11 at 3:33 AM UTC
-
2019-03-11 at 3:40 AM UTC
-
2019-03-11 at 4:40 AM UTC
Originally posted by MORALLY SUPERIOR BEING V: A Cat-Girl/Boy Under Every Bed That's like 10x more than I would have guessed.
Well there are a lot of posts, 600k, and another ~100k in PMs, so that's like 1.5KB per post? I mean obviously not all that data is in the posts and PMs tables, and it's after compression, but I wouldn't say it's outlandish. I imagine a lot of that space is in the thread flags table, which is theoretically the Cartesian product of users and threads (although in actuality it's sparse, because not everyone has a flag against every thread, just the ones they've viewed, which is still quite a lot).
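Shape-wise that flags table is just one row per user/thread pair someone has actually opened, i.e. something like this (heavily simplified, not the exact models):

```python
# Heavily simplified illustration, not the actual schema: a sparse subset of
# the users x threads product, one row per thread a user has actually viewed.
from django.conf import settings
from django.db import models


class ThreadFlag(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    thread = models.ForeignKey("forum.Thread", on_delete=models.CASCADE)  # hypothetical app label
    last_viewed_post = models.IntegerField(null=True)

    class Meta:
        unique_together = ("user", "thread")
```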
Oh and there's houston data in there too which is probably significant.
Originally posted by gadzooks Hey, do you mind if I do a full archive of the site?
You can name the throttle interval per request. Right now I'm thinking 1 thread a second, which would take about 10 hours if my math is right, although ideally 4 requests a second would be nice; I'll let you call the shots on this. I don't want to DDoS the site or some shit lol.
I have a script set up and ready to go that sequentially reads each thread (from thread 1 to thread 35213)…
It's ready to parse all the data relevant to each post (timestamp, user who posted it, thread it was posted in, etc).
It's a one time procedure, then I can reconstruct it however necessary/desired in the future from the archived data and won't have to burden the servers again…
What say you, Lanny?
Yeah, go for it. If you want to write a management command (django's mechanism for scripts that don't happen as part of the request/response cycle) to pull it straight from the DB and dump it into some CSV files or something I wouldn't mind running it and just sending you the output instead of you having to scrape everything. Obviously it would have to only output publicly available data but that's probably cleaner than parsing the markup and trying to extract content that way.
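A sketch of what such a management command might look like (the model and field names below are stand-ins, not necessarily the site's actual schema):

```python
# Sketch only: model and field names are stand-ins, not the site's actual
# schema. Management commands live in <app>/management/commands/.
import csv

from django.core.management.base import BaseCommand

from forum.models import Post  # hypothetical app/model


class Command(BaseCommand):
    help = "Dump publicly visible posts to a CSV file."

    def add_arguments(self, parser):
        parser.add_argument("outfile", help="path of the CSV file to write")

    def handle(self, *args, **options):
        with open(options["outfile"], "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["post_id", "thread_id", "username", "created", "body"])
            # Public data only: PMs and hidden threads stay out of the dump.
            for post in Post.objects.filter(thread__is_public=True).iterator():
                writer.writerow([
                    post.id,
                    post.thread_id,
                    post.author.username,
                    post.created.isoformat(),
                    post.body,
                ])
```

Then it's just `python manage.py dump_posts posts.csv` on the server and sending the file over.
-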
2019-03-11 at 4:53 AM UTC
Hey. We're trying to bitch here. Do you two mind getting a room? You're being way too helpful and cooperative for this here board. Perhaps you could insert swear words in between sentences or something? Maybe type in all CAPS?
-
2019-03-11 at 4:57 AM UTC
Originally posted by Lanny Yeah, go for it. If you want to write a management command (django's mechanism for scripts that don't happen as part of the request/response cycle) to pull it straight from the DB and dump it into some CSV files or something I wouldn't mind running it and just sending you the output instead of you having to scrape everything. Obviously it would have to only output publicly available data but that's probably cleaner than parsing the markup and trying to extract content that way.
I appreciate the offer.
I've got a simple script just downloading the HTML response for each thread request in sequence (i.e. "thread 1.html", "thread 2.html", etc.), up until i == 10,000 for now. That would be close to a third of the entire content. I started running it with a half-second interval between requests.
I have some very simple exception handling (the bare essentials), and my PyCharm console is showing me that it's currently at thread number 511xx. It's also catching a very vaguely defined exception (on my part) and telling me that every few dozen threads there's a thread that isn't saving... I imagine some threads have been deleted over the years for one reason or another.
But it keeps on truckin'. It's looking like it should actually be done with the entire site before the night ends, which is pretty good. I'm just brute-saving full HTML files for each thread, and I'll use Beautiful Soup to parse the files locally in some fashion and make some kind of database.
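The parsing pass will probably look something like this (the CSS selectors are guesses until I actually look at the saved markup):

```python
# Local parsing pass over the saved HTML; the CSS selectors are guesses
# and will need to match the forum's real markup.
import csv
import glob

from bs4 import BeautifulSoup


def parse_archive(html_dir="archive", outfile="posts.csv"):
    with open(outfile, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["source_file", "username", "timestamp", "body"])
        for path in sorted(glob.glob(f"{html_dir}/thread *.html")):
            with open(path, encoding="utf-8") as fh:
                soup = BeautifulSoup(fh, "html.parser")
            for post in soup.select("div.post"):  # guessed class names below too
                user = post.select_one(".username")
                stamp = post.select_one(".timestamp")
                body = post.select_one(".post-body")
                writer.writerow([
                    path,
                    user.get_text(strip=True) if user else "",
                    stamp.get_text(strip=True) if stamp else "",
                    body.get_text(" ", strip=True) if body else "",
                ])


if __name__ == "__main__":
    parse_archive()
```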
Panny was on my ass the other day about a word cloud I promised I'd make him, so it kinda lit the fire under my ass to just archive all the publicly viewable posts in one fell swoop so that I can analyze and process the data locally.
The only other thing I might want to run a separate script for is extracting member stats (basically just username, reg date, post count, and thanks given and received) for each user.
I don't think it will necessitate downloading entire posts all over again. I'll optimize it as best I can. -
2019-03-11 at 4:59 AM UTC
i.e.:
Yeah, go for it, you fucker. If you want to write a management command (django's mechanism for scripts that don't happen as part of the request/response cycle), but I'll bet you're too dumb for it, to pull it straight from the DB and dump that god damned garbage into some CSV files or something I wouldn't mind running it and just sending your idiotic ass the output instead of you having to scrape everything using that shovel nose of yours. Obviously it would have to only output publicly available data for you to piss all over, but that's probably cleaner than fucking parsing the markup and trying to extract the son-of-a-bitch that way.