THis sit =e is dead

2019-03-10 at 8:40 PM UTC

#21

Michael Myers victim of incest [divide your nonresilient tucker]

I started becoming active again, and I am greeted with this?! Thanks a lot!

2019-03-10 at 9:12 PM UTC

#22

-SpectraL coward [the spuriously bluish-lilac bushman]

Originally posted by Michael Myers I started becoming active again, and I am greeted with this?! Thanks a lot!

Sexually active?

2019-03-10 at 9:17 PM UTC

#23

Grylls Cum Looking Faggot [abrade this vocal tread-softly]

Originally posted by -SpectraL Sexually active?

do you want him to be?

2019-03-10 at 10:21 PM UTC

#24

Instigator Naturally Camouflaged [the staring tame crusher]

Originally posted by Grylls do you want him to be?

Yes

2019-03-10 at 10:23 PM UTC

#25

34nfi4w8g3wnfge4j93qrj309jg Houston [my metonymically tentacled thales]

Why is the reverse word of active just active with 'in' in front of it but inflammable means something can be flambled

2019-03-10 at 10:25 PM UTC

#26

Grylls Cum Looking Faggot [abrade this vocal tread-softly]

Originally posted by 34nfi4w8g3wnfge4j93qrj309jg Why is the reverse word of active just active with 'in' in front of it but inflammable means something can be flambled

yeah never got that inflammable one

2019-03-11 at 12:25 AM UTC

#27

Lanny Bird of Courage

Originally posted by gadzooks Hey Lanny, while you're present… quick question about the site…

Do you have a ballpark estimate about how much data the entire forum (posts, threads, member profiles, etc) takes up?

after gzip compression the (SQL) backups of the database are a little more than 800MB

The following users say it would be alright if the author of this post didn't die in a fire!

gadzooks

2019-03-11 at 12:28 AM UTC

#28

Grylls Cum Looking Faggot [abrade this vocal tread-softly]

Originally posted by Lanny after gzip compression the (SQL) backups of the database are a little more than 800MB

what if everyone posted all their saved cp

2019-03-11 at 12:37 AM UTC

#29

Soyboy V: A Cat-Girl/Boy Under Every Bed African Astronaut [my no haunted nonbeing]

Originally posted by Lanny after gzip compression the (SQL) backups of the database are a little more than 800MB

That's like 10x more than I would have guessed.

2019-03-11 at 12:57 AM UTC

#30

34nfi4w8g3wnfge4j93qrj309jg Houston [my metonymically tentacled thales]

I will write less words so it doesn't take up so much disc space

2019-03-11 at 2:23 AM UTC

#31

Soyboy V: A Cat-Girl/Boy Under Every Bed African Astronaut [my no haunted nonbeing]

Originally posted by 34nfi4w8g3wnfge4j93qrj309jg I will write less words so it doesn't take up so much disc space

War and Peace is a long ass book, a couple months worth of reading, and is 3.2 MB uncompressed.

http://www.gutenberg.org/ebooks/2600

2019-03-11 at 2:29 AM UTC

#32

gadzooks Dark Matter [keratinize my mild-tasting blossoming]

Originally posted by Lanny after gzip compression the (SQL) backups of the database are a little more than 800MB

Hey, do you mind if I do a full archive of the site?

You can name the throttle interval per request (right now I'm thinking either 1 thread a second - which would take about 10 hours if my math is right... although ideally 4 requests a second would be nice, but I'll let you call the shots on this. I don't want to DDoS the site or some shit lol).

I have a script set up and ready to go that sequentially reads each thread (from thread 1 to thread 35213)...

It's ready to parse all the data relevant to each post (timestamp, user who posted it, thread it was posted in, etc).

It's a one time procedure, then I can reconstruct it however necessary/desired in the future from the archived data and won't have to burden the servers again...

What say you, Lanny?

2019-03-11 at 2:48 AM UTC

#33

The Self Taught Man Black Hole

Originally posted by 34nfi4w8g3wnfge4j93qrj309jg Why is the reverse word of active just active with 'in' in front of it but inflammable means something can be flambled

I've always wondered this about "disgusting"

That would make something pleasant "gusting"

The following users say it would be alright if the author of this post didn't die in a fire!

gadzooks

2019-03-11 at 2:54 AM UTC

#34

gadzooks Dark Matter [keratinize my mild-tasting blossoming]

Originally posted by Jυicebox I've always wondered this about "disgusting"

That would make something pleasant "gusting"

Actually...

"gustatory" refers to the physiological sensation of "gustation".

When we taste things, we "gustate" those things.

So to say that something is "disgusting", we are saying it has taste, but it is not a desirable taste.

I lol'd when you said that tho, because this etymological connection has actually never occurred to me until you said that just now.

The following users say it would be alright if the author of this post didn't die in a fire!

Nil

2019-03-11 at 3:33 AM UTC

#35

vindicktive vinny Black Hole

Originally posted by Grylls yeah never got that inflammable one

some dude claimed his products were inflammable and then they caught fire.

luckily the dude had a friend who worked in the thesaury.

2019-03-11 at 3:40 AM UTC

#36

gadzooks Dark Matter [keratinize my mild-tasting blossoming]

Originally posted by gadzooks Hey, do you mind if I do a full archive of the site?

…

What say you, Lanny?

Nevermind, I'm just gonna do it.

It shouldn't tax the servers too much.

It's a grand total of 35204 requests, and I'm using a half second throttling interval.

It'll be done pretty quick.

2019-03-11 at 4:40 AM UTC

#37

Lanny Bird of Courage

Originally posted by MORALLY SUPERIOR BEING V: A Cat-Girl/Boy Under Every Bed That's like 10x more than I would have guessed.

Well there's are lot of posts, 600k, and another ~100k in PMs so that's like 1.5KB per post? I mean obviously not all that data is in the posts and PMs table, and it's after compression, but I wouldn't say it's outlandish. I imagine a lot of that space is in the thread flags table which is theoretically the Cartesian product of users and threads (although in actuality it's sparse because not everyone has a thread against every thread, just the ones they've viewed, which is still quite a lot).

Oh and there's houston data in there too which is probably significant.

Originally posted by gadzooks Hey, do you mind if I do a full archive of the site?

You can name the throttle interval per request (right now I'm thinking either 1 thread a second - which would take about 10 hours if my math is right… although ideally 4 requests a second would be nice, but I'll let you call the shots on this. I don't want to DDoS the site or some shit lol).

I have a script set up and ready to go that sequentially reads each thread (from thread 1 to thread 35213)…

It's ready to parse all the data relevant to each post (timestamp, user who posted it, thread it was posted in, etc).

It's a one time procedure, then I can reconstruct it however necessary/desired in the future from the archived data and won't have to burden the servers again…

What say you, Lanny?

Yeah, go for it. If you want to write a management command (django's mechanism for scripts that don't happen as part of the request/response cycle) to pull it straight from the DB and dump it into some CSV files or something I wouldn't mind running it and just sending you the output instead of you having to scrape everything. Obviously it would have to only output publicly available data but that's probably cleaner than parsing the markup and trying to extract content that way.

The following users say it would be alright if the author of this post didn't die in a fire!

2019-03-11 at 4:53 AM UTC

#38

-SpectraL coward [the spuriously bluish-lilac bushman]

Hey. We're trying to bitch here. Do you two mind getting a room? You're being way too helpful and cooperative for this here board. Perhaps you could insert swear words in between sentences or something? Maybe type in all CAPS?

2019-03-11 at 4:57 AM UTC

#39

gadzooks Dark Matter [keratinize my mild-tasting blossoming]

Originally posted by Lanny Yeah, go for it. If you want to write a management command (django's mechanism for scripts that don't happen as part of the request/response cycle) to pull it straight from the DB and dump it into some CSV files or something I wouldn't mind running it and just sending you the output instead of you having to scrape everything. Obviously it would have to only output publicly available data but that's probably cleaner than parsing the markup and trying to extract content that way.

I appreciate the offer.

I've got a simple script just downloading each HTML response for every iterative thread request (i.e: "thread 1".html, "thread 2.html", etc) up until i == 10,000 for now). That would be close to a third of the entire content. I started running it with a half second interval between requests.

I have some very simple exception handling (the bare essentials), and so my PyCharm console is showing me that it's currently at thread number 511xx. And it's also catching a very vaguely defined exception (on my part) and telling me that every few dozen threads there is a thread not saving... I imagine some threads have been deleted over the course of the years for one reason or another.

But it keeps on truckin'. It's looking like it should actually be totally done the entire site before the night ends, which is pretty good. I'm just brute-saving full HTML files for each thread, and I'll use Beautiful Soup to parse the files locally in some fashion and make some kind of database.

Panny was on my ass the other day about a word cloud I promised I'd make him, so it kinda lit the fire under my ass to just archive all the publicly viewable posts in one fell swoop so that I can analyze and process the data locally.

The only other thing I might want to run a separate script for is extracting member post info stats (basically just username, reg date, post count, and thanks given and thanks received) for each user.

I don't think it will necessitate downloading entire posts all over again. I'll optimize it as best I can.

2019-03-11 at 4:59 AM UTC

#40

-SpectraL coward [the spuriously bluish-lilac bushman]

ie:

Yeah, go for it, you fucker. If you want to write a management command(django's mechanism for scripts that don't happen as part of the request/response cycle), but I'll bet you're too dumb for it, to pull it straight from the DB and dump that god damned garbage into some CSV files or something I wouldn't mind running it and just sending your idiotic ass the output instead of you having to scrape everything using that shovel nose of yours. Obviously it would have to only output publicly available data for you to piss all over, but that's probably cleaner than fucking parsing the markup and trying to extract the son-of-a-bitch that way.

User Controls

Navigation