
What is it called when

  1. #1
    SBTlauien African Astronaut
    What is it called when code goes through HTML and works out what a request to a server should be?

    For instance, if I click a Submit button that sends a POST request using several text fields as parameters, what is the code in a browser that determines the full request called?

    Is it parsing?
  2. #2
    Not sure what you mean. A POST can be sent via the built-in HTML form method or via JavaScript.
  3. #3
    SBTlauien African Astronaut
    So let's say a browser sends a GET request (we'll say / for simplicity), and it then receives an HTML document as a response. Within that document is a form that, when submitted, will send a POST request with multiple parameters that are either hidden or in text fields (all in the form). How does the browser take the raw HTML and know how to form the correct POST request?
  4. #4
    Well, an HTTP GET request results in an HTML document. That's basically a text document.

    The HTML document may reference more files, like images, JS, stylesheets, etc., that may need to be fetched.

    So the first thing the browser does is get the HTML document, then parse the document to generate the DOM.

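    POST would be the value of the form tag's method attribute. The browser reads that, the action URL, and the named fields inside the form to build the request.

    For instance, here's a page where you can switch from POST to GET.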
    https://www.w3schools.com/tags/tryit.asp?filename=tryhtml_form_method
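For simple forms with no scripting, the steps in this post (parse the HTML, read the form's method and action, collect the named fields) can be sketched with Python's standard library alone. This is a rough illustration, not how any browser is actually implemented: the names FormCollector and build_request are made up for this sketch, it only looks at the first form on the page, and it ignores select, textarea, checkbox, and radio elements.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormCollector(HTMLParser):
    """Collects the first <form> and its named <input> fields (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.done = False
        self.action = ""
        self.method = "GET"  # HTML default when no method attribute is given
        self.fields = []     # (name, value) pairs in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self.done:
            self.in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").upper()
        elif tag == "input" and self.in_form:
            name = attrs.get("name")
            kind = (attrs.get("type") or "text").lower()
            # Simplification: only text-like and hidden inputs are submitted here.
            if name and kind in ("text", "hidden", "password", "email"):
                self.fields.append((name, attrs.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "form" and self.in_form:
            self.in_form = False
            self.done = True


def build_request(page_url, html, user_values=None):
    """Return (method, url, body) for submitting the first form in html."""
    parser = FormCollector()
    parser.feed(html)
    overrides = user_values or {}
    # Values typed by the user replace the defaults found in the markup.
    pairs = [(name, overrides.get(name, value)) for name, value in parser.fields]
    target = urljoin(page_url, parser.action)
    encoded = urlencode(pairs)
    if parser.method == "POST":
        return "POST", target, encoded
    # GET forms carry the fields in the query string instead of the body.
    return "GET", target + "?" + encoded, ""
```

Calling build_request with the page URL, the page's HTML, and a dict of values for the visible fields gives back the method, the resolved target URL, and the URL-encoded body. Hidden fields are carried over unchanged, which is why they end up in the request without the user ever typing them.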
  5. #5
    SBTlauien African Astronaut
    So it's called parsing?

    I'm just looking for a library that can do this for me. Basically take HTML and determine the exact HTTP request.
  6. #6
    SBTlauien African Astronaut
    In other words, how can I go through HTML/JavaScript and determine what the exact request (with all parameters) would be for submitting different forms?
  7. #7
    Sophie Pedophile Tech Support
    Originally posted by SBTlauien In other words, how can I go through HTML/JavaScript and determine what the exact request (with all parameters) would be for submitting different forms?

    Open up the developer tools in your browser and go to the "Network" tab. Send the request you want to automate manually first. See which form elements and parameters are involved, then select those elements in your script.
  8. #8
    Lanny Bird of Courage
    Originally posted by SBTlauien So it's called parsing?

    Strictly speaking parsing is the process of transforming a token sequence into a tree structure. Often we include the lex phase (turning a byte or symbol stream into a token stream) in the parse phase. Parsing is one thing the browser has to do in order to handle an HTML form, but there's more as well (it has to render, handle input, etc.) and there's no real name for this whole process other than like "browser behavior" which covers everything a browser does.

    Originally posted by SBTlauien I'm just looking for a library that can do this for me. Basically take HTML and determine the exact HTTP request.

    This, unfortunately, isn't as straightforward as it seems like it should be. Browsers are really complicated pieces of software, and the more closely you want to imitate their behavior, the slower and more complicated your solution has to be. If you're dealing with simple HTML form submission (like you see here, for example) then you can use something like mechanize or twill, which implement a pseudo-browser with no rendering or scripting support. Essentially they wrap an HTTP library and an HTML parsing library and offer a little magic like cookie management and form interaction on top.

    Many sites these days will use some kind of scripting to handle form submissions. Whether or not this is a good idea is a somewhat controversial topic among web developers, but it's common enough. An example might be old vB, where submitting the login form ran some JavaScript that MD5 hashed your password and sent that instead of your plaintext password. That's a stupid idea for a number of reasons, but it meant that if you were writing scripts to interact with vB you had to either use something like PhantomJS to get JS execution working or do the MD5 hashing by hand.
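As a rough idea of what "doing the MD5 hashing by hand" could look like in a script, here is a minimal sketch using Python's hashlib. The field names username and password_md5 are placeholders invented for this example, not vB's real parameter names; the real ones would have to be read from the actual form or from the network tab.

```python
import hashlib
from urllib.parse import urlencode


def md5_hex(text):
    """Return the hex MD5 digest of text, as the page's script would compute it."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def login_body(username, password):
    """Build a URL-encoded body that carries the hash instead of the plaintext.

    The field names are hypothetical placeholders for illustration.
    """
    return urlencode({"username": username, "password_md5": md5_hex(password)})
```

The point is only that the script has to reproduce whatever transformation the page's JavaScript applies before the request goes out; the server never sees the plaintext in that scheme.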

    PhantomJS is pretty high fidelity. It's what's called a "headless browser": it's basically like starting up Chrome, only nothing gets rendered to the screen, so you save cycles on the geometry engine and you don't need a GPU to make it go reasonably fast. It's slower and bigger than mechanize or twill, though, and frankly a pain in the ass dependency to manage.

    The most realistic simulation of a browser you'll get is Selenium, which simulates user interaction with a browser. It's the heaviest of all, requiring a library, a driver, and an actual browser. But it's nearly indistinguishable from what happens when you're sitting in front of a browser clicking shit.

    So the moral of the story is you need to figure out the requirements of submitting your web form. Is there CSRF protection? Is JS involved? Is there UA detection? You'll have to decide on tooling based on how complicated the form you're trying to automate is.
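For a form that turns out to need only a CSRF token and a browser-like User-Agent (no JS), a hand-built request might be enough. The sketch below uses only urllib from the standard library and stops short of sending anything. The function name prepare_post and the field name csrf_token in the usage are assumptions for illustration; a real site's token field has to be read from the form it serves.

```python
from urllib.parse import urlencode
from urllib.request import Request


def prepare_post(url, fields, user_agent):
    """Build (but do not send) a form POST with an explicit User-Agent.

    fields is a list of (name, value) pairs, including any hidden CSRF
    token copied from the page that served the form.
    """
    body = urlencode(fields).encode("ascii")
    headers = {
        "User-Agent": user_agent,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(url, data=body, headers=headers, method="POST")
```

Passing the resulting object to urllib.request.urlopen would send it. In practice the token is usually tied to a session cookie, so the page fetch and the submission would need to share a cookie jar, which is the kind of bookkeeping mechanize and twill handle for you.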