How do I reformat a row from a CSV file to JSON?
-
2017-08-25 at 9:59 PM UTC
So I have a big CSV file that is formatted like this:
CVE-1999-0026,Entry,"root privileges via buffer overflow in pset command on SGI IRIX systems.","CERT:CA-97.21.sgi_buffer_overflow | AUSCERT:AA-97.20.IRIX.pset.buffer.overflow.vul
I want to use Python's JSON module to make it look more like this:
CVE-1999-0026,
Entry,"root privileges via buffer overflow in pset command on SGI IRIX systems.",
"CERT:CA-97.21.sgi_buffer_overflow | AUSCERT:AA-97.20.IRIX.pset.buffer.overflow.vul
If I can do that without the JSON module, that'd be OK too. As usual, we are talking Python here.
-
2017-08-25 at 10:07 PM UTC
Have you tried turning it off and turning it back on again?
-
2017-08-25 at 10:12 PM UTC
Don't encourage him
-
2017-08-25 at 10:12 PM UTC
pythons belong in the jungle, not in captivity
-
2017-08-25 at 10:39 PM UTC
^泥哥飞鸽
-
2017-08-25 at 11:24 PM UTC
Originally posted by Sophie:
So I have a big CSV file that is formatted like this:
CVE-1999-0026,Entry,"root privileges via buffer overflow in pset command on SGI IRIX systems.","CERT:CA-97.21.sgi_buffer_overflow | AUSCERT:AA-97.20.IRIX.pset.buffer.overflow.vul
I want to use Python's JSON module to make it look more like this:
CVE-1999-0026,
Entry,"root privileges via buffer overflow in pset command on SGI IRIX systems.",
"CERT:CA-97.21.sgi_buffer_overflow | AUSCERT:AA-97.20.IRIX.pset.buffer.overflow.vul
If I can do that without the JSON module, that'd be OK too. As usual, we are talking Python here.
The first thing you want to think about is the most natural way to represent these entries in JSON. Rather than representing each entry as a flat sequence of values, key/value pairs would make more sense. Usually CSV files come with column names in a header row at the top. Assuming that's the case, you could do something like this:
import csv
import json
import StringIO

# Sample CSV with a header row naming each column
sample_csv = """cve_id,type,description,whatever
CVE-1999-0026,Entry,"root privileges via buffer overflow in pset command on SGI IRIX systems.","CERT:CA-97.21.sgi_buffer_overflow | AUSCERT:AA-97.20.IRIX.pset.buffer.overflow.vul"
CVE-1999-0027,Entry,"root privileges via buffer overflow in pset command on SGI IRIX systems.","CERT:CA-97.21.sgi_buffer_overflow | AUSCERT:AA-97.20.IRIX.pset.buffer.overflow.vul"
"""

if __name__ == '__main__':
    # Wrap the string in a file-like object; DictReader wants something file-like
    pseudo_file = StringIO.StringIO(sample_csv)
    # DictReader uses the header row as keys, so each row comes back as a dict
    csv_reader = csv.DictReader(pseudo_file)
    rows = [row for row in csv_reader]
    print json.dumps({'vulns': rows})
which produces this (after formatting):
{
    "vulns": [
        {
            "whatever": "CERT:CA-97.21.sgi_buffer_overflow | AUSCERT:AA-97.20.IRIX.pset.buffer.overflow.vul",
            "type": "Entry",
            "description": "root privileges via buffer overflow in pset command on SGI IRIX systems.",
            "cve_id": "CVE-1999-0026"
        },
        {
            "whatever": "CERT:CA-97.21.sgi_buffer_overflow | AUSCERT:AA-97.20.IRIX.pset.buffer.overflow.vul",
            "type": "Entry",
            "description": "root privileges via buffer overflow in pset command on SGI IRIX systems.",
            "cve_id": "CVE-1999-0027"
        }
    ]
}
You can ignore the StringIO stuff; it's just a way to make a file-like object out of a string, since csv.DictReader expects a file-like object. In your case you'll be reading from a real file.
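For what it's worth, a minimal sketch of the same thing reading from an actual file on disk would look like this. I'm using cve_mitre.csv as a placeholder filename, and assuming the first line of the file is a header row like the sample above; swap in whatever your real file looks like:
import csv
import json

if __name__ == '__main__':
    # Placeholder filename -- use your real CSV here. Assumes the first line
    # is a header row (e.g. cve_id,type,description,whatever).
    with open('cve_mitre.csv', 'rb') as infile:
        csv_reader = csv.DictReader(infile)
        rows = [row for row in csv_reader]
    print json.dumps({'vulns': rows})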
Also worth pointing out that the `rows` list (a list of dictionaries built from the rows of the CSV file) gets wrapped in a dict before being passed to `json.dumps`. You don't _have_ to do this; json.dumps will happily accept a bare list, and the current JSON spec allows any value, including an array, at the top level. Wrapping everything in a single top-level object (a map from keys to values, equivalent to Python's "dictionary" type) is still a common convention though, and it leaves you room to add other keys later without breaking whatever consumes the output.
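Quick illustration in the REPL, this is just stock json module behavior: both of these work, the wrapped version just gives you an object at the top level instead of an array.
>>> import json
>>> json.dumps([{'cve_id': 'CVE-1999-0026'}])
'[{"cve_id": "CVE-1999-0026"}]'
>>> json.dumps({'vulns': [{'cve_id': 'CVE-1999-0026'}]})
'{"vulns": [{"cve_id": "CVE-1999-0026"}]}'
-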
2017-08-26 at 5:41 AM UTC
Thank you Lanny, when I return to coding in a bit I will try this out and report back with results.
-
2017-08-26 at 7:19 AM UTC
I can give you the code I used to do the same in Perl, not sure how helpful it'd be though.
It was just a matter of reading the CSV into an array and then using the JSON CPAN library to convert the array to JSON.
-
2017-08-27 at 5:42 PM UTC
Anyway, I want to search for specific CVEs in my CSV file and print them as JSON. Can I just do something like this?
query = raw_input("wut vuln m8?: ")

with open('cve_mitre.csv', 'rb') as infile:
    csv_reader = csv.DictReader(infile)
    rows = [row for row in csv_reader]
    for item in rows:
        if query in item:
            print json.dumps({'vulns': item})
Or can't I do "if this string is in this dictionary item, print the dictionary item"? Do I need to do some regex hocus pocus? Pls no, I suck at regex.
-
2017-08-27 at 9:00 PM UTC
There's no general way to ask "does my string occur in this dictionary" because it's not really obvious what "occurs in" would mean. It's like asking "what's the sum of this number and this string"; the operation isn't defined over those types. When you write `if 'foo' in my_dict:` what you're asking is "is 'foo' one of the keys in my_dict?".

You probably want to search within a specific field in each row (if someone types in "apache" there's not much point in looking at the CVE column, since the string "apache" is never going to appear there); there's a sketch of that version after the note below. If you really want to search every field you'd have to do something like this instead:
import csv
import json

query = raw_input("wut vuln m8?: ")

with open('cve_mitre.csv', 'rb') as infile:
    csv_reader = csv.DictReader(infile)
    rows = [row for row in csv_reader]

# Check every column of every row for the query string
for row in rows:
    for col_name in row:
        if query in row[col_name]:
            print json.dumps(row)
Note that what gets sent to json.dumps doesn't need to be wrapped in this case, since the vulnerability row is already a dictionary.
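And here's a rough sketch of the single-field version I mentioned, assuming your header row literally names the column "description" like in my sample above (adjust if yours is named something else):
import csv
import json

query = raw_input("wut vuln m8?: ")

with open('cve_mitre.csv', 'rb') as infile:
    for row in csv.DictReader(infile):
        # Only look at the description column, case-insensitively
        if query.lower() in row['description'].lower():
            print json.dumps(row)
-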
2017-08-27 at 11:32 PM UTC
Thanks, but wouldn't that just print the row that had 'apache' in it? So the output would just say:
"description": "something something apache, something something vuln"
Because that doesn't seem particularly useful.
-
2017-08-27 at 11:33 PM UTC
Oh wait, a row is the whole item, right? If so, don't mind me.
-
2017-08-28 at 6:27 AM UTC
Yeah, `row` is the dict with a key for each column. `col_name` would be "description" and `row[col_name]` would be the actual string describing that vulnerability.
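To make that concrete, with the sample data from earlier in the thread, each `row` that DictReader hands back is a dict roughly like this:
row = {
    'cve_id': 'CVE-1999-0026',
    'type': 'Entry',
    'description': 'root privileges via buffer overflow in pset command on SGI IRIX systems.',
    'whatever': 'CERT:CA-97.21.sgi_buffer_overflow | AUSCERT:AA-97.20.IRIX.pset.buffer.overflow.vul',
}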
Oh also as an aside, you can replace
rows = [row for row in csv_reader]
for row in rows:
with
for row in csv_reader:
They do the same thing, but the former reads the whole file into memory at once while the latter reads it one row at a time as you iterate, which can matter if you're working with large files. Just keep the loop inside the `with` block so the file is still open while you're reading from it.
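So the streamed version of the search would look something like this (same placeholder filename and header assumptions as before):
import csv
import json

query = raw_input("wut vuln m8?: ")

with open('cve_mitre.csv', 'rb') as infile:
    # Iterating the reader directly reads rows lazily, one at a time,
    # instead of collecting them all into a list first.
    for row in csv.DictReader(infile):
        for col_name in row:
            if query in row[col_name]:
                print json.dumps(row)
                break  # don't print the same row twice if it matches in more than one column
-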
2017-08-30 at 2:22 AM UTC
I think the whole thing is like 100 MB. I'll try it out and check out the difference.