2020-09-14 at 2:21 PM UTC
So, I'm currently scraping a bunch of news websites, and so far I have 100,000 articles. What are we looking at to get a semi-decent dataset for an NLG story? Thousands, millions, hundreds of millions of articles?
I'll release the dataset once I've scraped enough to make it worthwhile, and I'll share the byte-pair-encoded file too.
2020-09-14 at 5:58 PM UTC
The more the better, obviously, but you should scale to the infrastructure you have available. Which language are you using for this project? Python?