Originally posted by SBTlauien
So its called parsing?
Strictly speaking parsing is the process of transforming a token sequence into a tree structure. Often we include the lex phase (turning a byte or symbol stream into a token stream) in the parse phase. Parsing is one thing the browser has to do in order to handle an HTML form, but there's more as well (it has to render, handle input, etc.) and there's no real name for this whole process other than like "browser behavior" which covers everything a browser does.
I'm just looking for a library that can do this for me. Basically take HTML and determine the exact HTTP request.
This, unfortunately, isn't as straightforward as it seems like it should be. Browsers are really complicated pieces of software; the more closely you want to imitate their behavior, the slower and more complicated your solution has to be. If you're dealing with simple HTML form submission (like you see here, for example) then you can use something like mechanize or twill, which implement a pseudo-browser with no rendering or scripting support. Essentially they wrap an HTTP library and an HTML parsing library and offer a little magic like cookie management and form interaction on top.
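To give a feel for what those libraries are doing under the hood, here's a minimal sketch using only Python's stdlib `html.parser` to pull a form's action, method, and fields out of some HTML. The form markup is made up for illustration; real pages will need more care (multiple forms, selects, textareas, etc.):

```python
from html.parser import HTMLParser

class FormExtractor(HTMLParser):
    """Collects the first form's action, method, and named input fields."""
    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "GET"
        self.fields = {}
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self._in_form = True
            self.action = attrs.get("action", "")
            self.method = attrs.get("method", "get").upper()
        elif tag == "input" and self._in_form:
            # Only named inputs get submitted with the form.
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False

html = """
<form action="/login.php" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="username">
  <input type="submit" value="Go">
</form>
"""
parser = FormExtractor()
parser.feed(html)
# parser.action == "/login.php", parser.method == "POST"
# parser.fields == {"token": "abc123", "username": ""}
```

That's basically the "determine the exact HTTP request" part of your question: POST those fields (with `username` filled in) to the form's action URL. mechanize and twill do this for you, plus cookies and redirects.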
Many sites these days will use some kind of scripting to handle form submissions. Whether or not this is a good idea is a somewhat controversial topic among web developers but it's common enough. An example might be old vB where when you submitted the login form some javascript would MD5 hash your password and send that instead of your plaintext password. That's a stupid idea for a number of reasons but it meant that if you were writing scripts to interact with vB you had to either use something like phantom to get JS execution working or do the MD5 hashing by hand.
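If you go the hash-it-by-hand route, all you're doing is reproducing what the page's JavaScript does before it POSTs. A sketch with Python's stdlib (the field names here are made up; read the real ones out of the actual form and its script):

```python
import hashlib
from urllib.parse import urlencode

password = "hunter2"
# Reproduce what the page's JS does: send the MD5 hex digest
# instead of the plaintext password.
hashed = hashlib.md5(password.encode("utf-8")).hexdigest()

# Hypothetical field names -- pull the real ones from the form.
payload = urlencode({
    "username": "alice",
    "password_md5": hashed,
})
# payload is now ready to POST with any HTTP client
```

Once you know what the script computes, you can skip JS execution entirely and stay with a lightweight HTTP client.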
Phantom is pretty high fidelity. It's what's called a "headless browser": basically like starting up Chrome except nothing gets rendered to the screen, so you save cycles on the layout engine and don't need a GPU to make it go reasonably fast. It's slower, bigger, and frankly a pain-in-the-ass dependency to manage, though.
The most realistic simulation of a browser you'll get is Selenium, which drives user interaction with a real browser. It's the heaviest of all, requiring a library, a driver, and an actual browser. But it's nearly indistinguishable from what happens when you're sitting in front of a browser clicking shit.
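For completeness, here's roughly what that looks like with the Selenium Python bindings. This assumes Selenium, a compatible driver, and Chrome are all installed, and the URL and field names are placeholders for whatever the real form uses:

```python
def submit_login(url, username, password):
    """Drive a real browser through a login form and return the result page.

    Assumes the selenium package and a Chrome/chromedriver install;
    the field names below are placeholders, not a real site's names.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        driver.find_element(By.NAME, "username").send_keys(username)
        driver.find_element(By.NAME, "password").send_keys(password)
        # submit() on any element submits its enclosing form
        driver.find_element(By.NAME, "password").submit()
        return driver.page_source
    finally:
        driver.quit()
```

Any JS the page runs (hashing, CSRF tokens, whatever) just happens, because it's a real browser. The cost is spinning up a whole browser process per session.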
So the moral of the story is you need to figure out the requirements of submitting your web form. Is there CSRF protection? Is JS involved? Is there UA detection? You'll have to choose your tooling based on how complicated the form you're trying to automate actually is.