From IMDb you want to get the list of the Top 250 movies.
During my research I found a post on SO. It suggests that one should download the official IMDb dump. The Top 250 list is in one of the dump files. However, that file doesn’t contain the IMDb IDs of the movies, so it’s good for nothing :(
There was only one solution left: let’s do some scraping. Here is the Python code that did the job for me. I didn’t use BeautifulSoup, just plain ol’ regular expressions:
    import re

    import requests

    top250_url = "http://akas.imdb.com/chart/top"


    def get_top250():
        """Return the IMDb IDs (digits only, no "tt") found on the chart page."""
        r = requests.get(top250_url)
        result = []
        for line in r.text.split("\n"):
            # Each chart entry carries its ID in a data-titleid attribute.
            m = re.search(r'data-titleid="tt(\d+?)">', line)
            if m:
                result.append(m.group(1))
        return result
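To try the regular expression without hitting the network, the parsing step can be pulled out into its own function and run on a saved page or a hand-written snippet. This is only a sketch: it assumes each entry is still marked with a `data-titleid="tt..."` attribute, and the name `extract_ids` and the sample markup are made up for illustration. Unlike the line-by-line `re.search` above, `findall` picks up every match, even several on one line.

```python
import re

# Same pattern as in get_top250(): capture the digits after "tt".
TITLE_ID_RE = re.compile(r'data-titleid="tt(\d+?)">')


def extract_ids(html):
    """Return the numeric IMDb IDs found in html, in page order."""
    return TITLE_ID_RE.findall(html)


# Hand-written sample for illustration, not real IMDb markup.
sample = (
    '<div data-titleid="tt0111161"></div>\n'
    '<div data-titleid="tt0068646"></div>\n'
)
print(extract_ids(sample))
```

With this split, `get_top250()` could shrink to a `requests.get` call followed by `extract_ids(r.text)`.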
It returns the IMDb IDs of the Top 250 movies as strings of digits, without the "tt" prefix. Then, using the imdbpy package, you can look up all the information about a movie, since you have its ID.
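A minimal sketch of such a lookup, assuming the IMDbPY package is installed and importable as `imdb`, and that network access is available. `IMDb()` and `get_movie()` are IMDbPY calls; `fetch_movie` and `summarize` are hypothetical helper names used only here.

```python
def fetch_movie(movie_id):
    """Fetch one movie by its numeric ID (no "tt" prefix), e.g. "0111161"."""
    # Imported here so the helpers below load even without IMDbPY installed.
    from imdb import IMDb

    ia = IMDb()
    return ia.get_movie(movie_id)


def summarize(movie):
    """Build a one-line summary from a dict-like movie object."""
    return "{} ({})".format(movie["title"], movie["year"])


# Usage (needs network and IMDbPY):
#     print(summarize(fetch_movie("0111161")))
```

The IDs returned by `get_top250()` can be passed to `fetch_movie` as they are, since both use the digits-only form.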
- IMDB -> JSON, if you want to work with the dump files