Posts by gadzooks

  1. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Originally posted by WellHung Is Mozilla Firefox a browser?

    Yes.
  2. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    And a note about the actual artistic/visual representation...

    Full disclosure: I am using already existing online tools to generate the imagery.

    The bulk of the work, though, is in crawling, archiving, and ultimately parsing and collating all the relevant data.

    The next most time-consuming step is cleaning/pre-processing the text data for analysis.

    Running a word frequency analysis on a large chunk of text really only involves a couple of lines of code in most high-level languages (Python, JavaScript, etc.).

    Generating the visual, while also an interesting topic, is something I just "outsource".

    But, if I had to write my own, I'd probably leverage a pre-existing JavaScript visualization library (like D3).

    And then it would simply be a matter of importing the cleaned/pre-processed data, finding the most frequently occurring words, and adjusting the font size of each word as a function of its frequency.
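
A minimal Python sketch of the counting and font-scaling steps described above. This is not the code used for the images in this thread (those came from existing online tools); the function names, the linear scaling, and the 12 to 72 size range are all assumptions picked for illustration.

```python
from collections import Counter
import re


def word_frequencies(text, top_n=20):
    """Return the top_n most common lowercase words as (word, count) pairs."""
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words).most_common(top_n)


def font_sizes(frequencies, min_size=12, max_size=72):
    """Map each word's count linearly onto a size between min_size and max_size."""
    if not frequencies:
        return {}
    counts = [count for _, count in frequencies]
    low, high = min(counts), max(counts)
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {
        word: min_size + (count - low) * (max_size - min_size) / span
        for word, count in frequencies
    }


# Hypothetical sample text, just to exercise the two functions.
sample = "spam spam spam eggs eggs ham"
top = word_frequencies(sample, top_n=3)
sizes = font_sizes(top)
print(top)
print(sizes)
```

The resulting word-to-size mapping is what a visualization library would consume; a log or square-root scale might look better than a linear one when a few words dominate the counts.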
  3. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Months ago, I got super into doing all kinds of data analysis on NiS thread/post content for practice/self-education, as well as for sheer lulz.

    https://niggasin.space/thread/31288
    https://niggasin.space/thread/31496

    One of those areas attracted heightened interest, especially from a couple of homies in particular, mmQ and Grimace.

    (By the way, Grimace, I realize that for you the priority is the Totse dump, and I am working on that as well. Parsing HTML tables is brutal, though, and for ethical reasons I don't want to simply throw hundreds of threads back onto the Internet directly like the Wayback Machine does. So I'm pretty much stuck parsing the files, and Totse had two different HTML formats, while Zoklet has only the one. I'm working on it all in tandem, I promise.)

    Now, back to word clouds...

    The first one I did was for NiS's top candidate for most controversial figure. My motivation for choosing him at the time was not admiration of any kind, but rather that I figured his word cloud would likely be interesting and/or entertaining.

    Btw, for anyone who does not know exactly what a word cloud is: it takes a large body of text, determines the N most frequently occurring words within it, and produces a pretty, colorful image showing those top words, with the size of each word scaled to its frequency.

    For example, see the original infinityshock word cloud:



    But now, a bit more about the process...

    First off, you might be thinking... Won't words like "the", "a", "to", and so on, always be the top used words?

    Yes, they are the most frequently used words, of course, but any kind of linguistic analysis of a large corpus (body) of text involves a few steps to clean the data up a bit. Those super common words mentioned above are referred to as stop words. There are a few ways to remove them programmatically. I believe I used a publicly available list online to filter the large body of text for the above word cloud, but many NLP (Natural Language Processing) libraries, such as NLTK for Python (the one I typically use), have built-in stop word lists that you just select when you're preparing the data.

    There are MANY other ways in which textual data can be prepared for analysis, but, for a word cloud, which is actually an incredibly simple analysis compared to other NLP use cases, it's literally just about counting how often words occur. Nothing all that fancy, really.

    But, cleaning and preparing the data is always an important step.

    Case in point:



    That's a word cloud generated (just now) from the exact same text data, but before doing any fancy pre-processing or filtering (other than stop words).

    Notice how "Bill" and "Krosby" are among his 20 most frequently used words? I think that, when I made the original, I simply manually added those two words to the stop word list because they came up so much (kind of a quick and dirty brute force method).

    (Apparently infinityshock references, or quotes, kr0z, with some regularity).

    OH, and that reminds me...

    Quotes...

    One reflection I had about my original word cloud (much later on) was that I did not filter out quotes... So, it is technically including words the target poster didn't actually use themselves. This skews the data.

    Right now, as we speak, I am running a Python script on the data I have already archived to parse out quotes. I will elaborate on my specific method of doing so in a subsequent post in this thread.
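
A rough Python sketch of the two cleanup steps discussed in the post above: dropping quoted text, then filtering stop words plus a few hand-added ones before counting. It is not the script mentioned in the post, whose method is not described here. It assumes quotes are wrapped in `<blockquote>` tags, which may not match the real archive markup, and `BASIC_STOP_WORDS` is a tiny stand-in for a real list such as NLTK's. All names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Tiny stand-in stop word list (assumption); a real list is far longer.
BASIC_STOP_WORDS = {"the", "a", "to", "and", "of", "is", "in", "it", "i", "you"}


class QuoteStripper(HTMLParser):
    """Collect post text, skipping anything nested inside <blockquote>."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # current blockquote nesting level
        self.chunks = []  # text found outside of any quote

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def strip_quotes(post_html):
    """Return only the text the poster wrote themselves."""
    parser = QuoteStripper()
    parser.feed(post_html)
    parser.close()
    return " ".join(parser.chunks)


def top_words(text, extra_stop_words=(), top_n=20):
    """Count words, ignoring stop words and any manually added extras."""
    stop_words = BASIC_STOP_WORDS | {w.lower() for w in extra_stop_words}
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(w for w in words if w not in stop_words).most_common(top_n)


# Hypothetical post markup, just to exercise the functions.
post = (
    "<blockquote>Bill said the cat sat</blockquote>"
    "<p>The dog barked and the dog ran to Bill</p>"
)
own_text = strip_quotes(post)
print(top_words(own_text, extra_stop_words={"Bill", "Krosby"}, top_n=5))
```

Passing names through `extra_stop_words` mirrors the quick and dirty manual fix, while stripping quotes first addresses the skew at its source.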
  4. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Originally posted by Jesus Christ

  5. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Father's day is coming up...

    I like Amazon gift cards, or cash...

    Actually, cash is better.
  6. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Originally posted by Jesus Christ you couldn't attract a tin can

    That's precisely why I had to settle for your mom.
  7. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    I can't believe I fucked your mom ~28 years ago.
  8. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    I don't care about being unbanned.

    I almost never go into TC as it is.

    I really do need to pass out.
  9. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    lmao i think i got banned.

    Prolly for the best.

    I need to finish off this drink and pass the fuck out.
  10. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    tfw parsing HTML tables for hours...

    FML.

  11. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    I've been spending all night parsing my archived Totse threads...

    Largely because I promised panny...

    Nigga AWOL at the moment, but I'm carrying on anyway.
  12. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Topic: My Weekend, And Yours.

    panthrax
    Moderator

    posted 09-19-2005 14:31

    I don't remember a god damn thing about my weekend. As I stare onto the computer desk, the desk this very computer sits on, I admire the ovals, circles, and multi-colored entities we call "pills".

    Soma 350mg x 5 (chewed)
    Klonopin 1mg x 3
    Xanax .5mg x 2

    That was my day by day weekend. And I don't remember any of it.

    How was your weekend, if you remember it?

    [This message has been edited by panthrax (edited 09-19-2005).]
  13. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Warren or Jimmy?
  14. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    She told me she was single.

    Lying fucking slut.
  15. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    *googles it*

    Yep, it's Jenny.
  16. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Originally posted by Jiggaboo_Johnson 867 5309

    Is that Jenny?
  17. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Wait, how did this happen in TC?

    Does TC mean something different now?
  18. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Originally posted by A College Professor Some wild homo came in out of no where and maced Krauz

    Like... e-maced?
  19. gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    ^ I finally managed to upload it as a gif (took longer than editing the video in the first place).
  20. gadzooks Dark Matter [keratinize my mild-tasting blossoming]