Do you suck at regex?

  1. #1
    Sophie Pedophile Tech Support
    Because I do. I need to grab some data out of a JSON file, so I've got the file format going for me at least. Anyway, I need an address that is formatted in a peculiar way: Opts:map[Addr:0.tcp.domain.com:15273]. I guess I could do with just the domain and port, though. Oh, and it generally turns up twice in the JSON file I am reading in; the second time it looks like this: URL:tcp://0.tcp.domain.com:17380.

    I think I could figure the regex out for the second one on my own, and I will if I must, but if you have a regex in mind right off the bat, I'd appreciate it.
  2. #2
    -SpectraL coward [the spuriously bluish-lilac bushman]
    https://stackoverflow.com/questions/106179/regular-expression-to-match-dns-hostname-or-ip-address
  3. #3
    Sophie Pedophile Tech Support
    Originally posted by -SpectraL https://stackoverflow.com/questions/106179/regular-expression-to-match-dns-hostname-or-ip-address

    Wrong. I need the port too, this is essential.
  4. #4
    -SpectraL coward [the spuriously bluish-lilac bushman]
    https://stackoverflow.com/questions/19553003/need-a-regular-expression-to-validate-a-hostname-and-port-to-be-used-with-tcpcli
  5. #5
    -SpectraL coward [the spuriously bluish-lilac bushman]
    ^(http|https):\/\/(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])(:[0-9]+)?$
  6. #6
    esbity African Astronaut
    https://regexr.com/
  7. #7
    Lanny Bird of Courage
    What are you trying to match? The domain and port? Do you need the protocol as well? Is this for a specific domain or for all domains that look like this?

    Originally posted by -SpectraL ^(http|https):\/\/(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])(:[0-9]+)?$

    Wrong. Like quite wrong. It only supports the http(s) scheme, it doesn't match if there's any capitalization, and it doesn't match if there are any Unicode code points outside of the ASCII range.

    Stop copy pasting shit from the internet, for your own damn good. It may surprise you to learn that sometimes the code you're copying isn't very good.
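Those three complaints are easy to check. A minimal sketch, assuming the pattern from #5 is used as-is with Python's `re` module; the test URLs other than the tcp:// one are made up for illustration:

```python
import re

# The pattern exactly as posted in #5.
POSTED = r'^(http|https):\/\/(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])(:[0-9]+)?$'


def matches(url, flags=0):
    """Return True if the posted pattern matches the whole URL."""
    return re.match(POSTED, url, flags) is not None


results = {
    # Hypothetical URLs, except the tcp:// address from the first post.
    'lowercase http': matches('http://example.com:8080'),
    'tcp scheme': matches('tcp://0.tcp.domain.com:17380'),
    'uppercase host': matches('http://Example.com:8080'),
    'non-ASCII host': matches('http://b\u00fccher.example:8080'),
    'uppercase host, IGNORECASE': matches('http://Example.com:8080', re.IGNORECASE),
}

for label, ok in results.items():
    print(label, ok)
```

Only the plain lowercase http URL and the IGNORECASE run should match. The flag deals with capitalization, but the scheme and the ASCII-only character classes would still need to be changed in the pattern itself.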
  8. #8
    Sophie Pedophile Tech Support
    Originally posted by Lanny
    What are you trying to match? The domain and port? Do you need the protocol as well? Is this for a specific domain or for all domains that look like this?




    So here's the deal: the domain is static but the port is subject to change. I think I could do it without the protocol, but I don't remember exactly. Basically, I am sorting out these addresses in order to pass them to a function in my Python script that I use to connect to them.

    I am guessing this -> tcp://0.tcp.domain.com:17380 would be ideal for my situation. Also, do you think I should just read in the file as JSON and then regex my way through it to get to the bit I want? Since Python treats JSON files as a dictionary, IIRC, and regex works on strings, I am not sure whether the two would jibe that way.
  9. #9
    Lanny Bird of Courage
    If it's JSON then it's usually best to treat it like JSON, unless you can't fit it into memory or something. You can still parse large JSON files, but it gets more complicated; at that point it's possibly worth considering the kludgier "treat it as a string and use regex" approach.
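For the "treat it as a string" fallback, a rough sketch of what that could look like, assuming the file can be read line by line and that an address never straddles a line break. `ports_from_text` and the in-memory stand-in file are hypothetical names used only for illustration:

```python
import io
import re

# Optional tcp:// prefix, the fixed host, then the port as group 1.
ADDR_RE = re.compile(r'(?:tcp://)?0\.tcp\.domain\.com:(\d+)')


def ports_from_text(lines):
    """Scan an iterable of text lines and collect every port found."""
    ports = []
    for line in lines:
        # finditer picks up several addresses on one line, if present.
        for match in ADDR_RE.finditer(line):
            ports.append(int(match.group(1)))
    return ports


# Stand-in for open('some_file.json'); only one line is held in memory at a time.
fake_file = io.StringIO(
    '{"items": [\n'
    '{"addr": "Opts:map[Addr:0.tcp.domain.com:15273]"},\n'
    '{"url": "URL:tcp://0.tcp.domain.com:17380"}\n'
    ']}\n'
)

ports = ports_from_text(fake_file)
print(ports)
```

This never parses the JSON at all, so it cannot tell which key a port came from. That is the trade-off of the kludgier route.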

    You're right that you can't match a regular expression against a dictionary, only against strings, but what you probably want to do is iterate over the parts of the JSON where you know this address might exist and match each of those values. E.g. if your JSON looks like:


    {
      "items": [
        {
          "addr": "Opts:map[Addr:0.tcp.domain.com:15273]",
          "foo": 42
        },
        ...
      ]
    }


    Then you would want to do something like:


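    import re

    ports = []

    for item in loaded_json['items']:
        # re.search, not re.match: the address can sit mid-string, e.g. after "Opts:map[Addr:"
        match = re.search(r'(?:tcp://)?0\.tcp\.domain\.com:(\d+)', item['addr'])
        if match:
            # group(1) is the captured port; groups(1) would return a tuple of all groups
            ports.append(match.group(1))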


    The regular expression there assumes the domain will always be "0.tcp.domain.com", and the protocol will either be "tcp" or omitted. If that's not a reasonable assumption it can be generalized.
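Put together as something that runs end to end, a minimal sketch under the same assumptions (fixed domain, optional tcp:// prefix). The "url" key, the sample document and the `extract_endpoints` helper are hypothetical; the real file's layout may differ:

```python
import json
import re

# Assumed layout: the example structure above, plus a hypothetical "url" key
# holding the second form of the address mentioned in the first post.
SAMPLE = '''
{
  "items": [
    {"addr": "Opts:map[Addr:0.tcp.domain.com:15273]", "foo": 42},
    {"url": "URL:tcp://0.tcp.domain.com:17380", "foo": 7},
    {"addr": "no address here", "foo": 0}
  ]
}
'''

# Optional tcp:// prefix, the fixed host as group 1, the port as group 2.
ADDR_RE = re.compile(r'(?:tcp://)?(0\.tcp\.domain\.com):(\d+)')


def extract_endpoints(document):
    """Return (host, port) tuples found in the string values of each item."""
    endpoints = []
    for item in document['items']:
        for value in item.values():
            if not isinstance(value, str):
                continue  # skip numbers and other non-string values
            match = ADDR_RE.search(value)
            if match:
                endpoints.append((match.group(1), int(match.group(2))))
    return endpoints


endpoints = extract_endpoints(json.loads(SAMPLE))
# Rebuild the tcp://host:port form asked about earlier in the thread.
urls = ['tcp://%s:%d' % (host, port) for host, port in endpoints]
print(urls)
```

Keeping host and port as a tuple means they can go straight to a connect call, and the URL form is only built when it is actually needed.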
  10. #10
    esbity African Astronaut
    Originally posted by Sophie So here's the deal: the domain is static but the port is subject to change. I think I could do it without the protocol, but I don't remember exactly. Basically, I am sorting out these addresses in order to pass them to a function in my Python script that I use to connect to them.

    I am guessing this -> tcp://0.tcp.domain.com:17380 would be ideal for my situation. Also, do you think I should just read in the file as JSON and then regex my way through it to get to the bit I want? Since Python treats JSON files as a dictionary, IIRC, and regex works on strings, I am not sure whether the two would jibe that way.

    How often does the port change?

    What type of server is this?
  11. #11
    Sophie Pedophile Tech Support
    Greatly appreciated as always. If I come across anything unexpected, I'll let you know.

    Originally posted by esbity How often does the port change?

    What type of server is this?

    It depends on operational circumstances. And it's a server that handles raw TCP traffic.
  12. #12
    esbity African Astronaut
    Originally posted by Sophie Greatly appreciated as always. If I come across anything unexpected, I'll let you know.

    It depends on operational circumstances. And it's a server that handles raw TCP traffic.

    But what kind of server? Why is the port constantly changing?