Creating a natural language generation news bot.... erm

  1. #1
    filtration African Astronaut
    So, I'm currently scraping a bunch of news websites, and so far I have 100,000 articles. What are we looking at to get a semi-decent dataset for an NLG story? Thousands, millions, hundreds of millions of articles?

    I'll release the dataset once I've scraped one that's worthwhile, and I'll share the byte-pair-encoded file too.
  2. #2
    Sophie Pedophile Tech Support
    The more the better, obviously, but you should scale to the infrastructure you have available. Which language are you using for this project? Python?
