# Pagination

### Paged operations

Some GitHub API operations return their results one page at a time. For instance, there are many thousands of [gists](https://docs.github.com/github/writing-on-github/creating-gists), but if we call `list_public` we only see the first 30:

``` python
from ghapi.all import *  # provides GhApi, paged, pages and parse_link_hdr

api = GhApi()
```

``` python
gists = api.gists.list_public()
len(gists)
```

    30

That’s because this operation takes two optional parameters, `per_page` and `page`:

``` python
api.gists.list_public
```

[gists.list_public](https://docs.github.com/v3/gists/#list-public-gists)(since, per_page, page): *List public gists*

This is a common pattern for `list_*` operations in the GitHub API. One way to get more results is to increase `per_page`:

``` python
len(api.gists.list_public(per_page=100))
```

    100

However, `per_page` has a maximum of `100`, so if you want more, you’ll have to pass `page=` to get pages beyond the first. An easy way to iterate through all pages is to use [`paged`](https://ghapi.fast.ai/page.html#paged), which returns a generator.

------------------------------------------------------------------------

### paged

``` python
def paged(oper, *args, per_page:int=30, max_pages:int=9999, **kwargs):
```

*Convert operation `oper(*args,**kwargs)` into an iterator*

We’ll demonstrate this using the `repos.list_for_org` method:

``` python
api.repos.list_for_org
```

[repos.list_for_org](https://docs.github.com/v3/repos/#list-organization-repositories)(org, type, sort, direction, per_page, page): *List organization repositories*

``` python
repos = api.repos.list_for_org(org='fastai')
len(repos), repos[0].name
```

    (30, 'docs')

To convert this operation into a Python iterator, pass the operation itself, along with any arguments (either keyword or positional), to [`paged`](https://ghapi.fast.ai/page.html#paged).
Note how the function and arguments are passed separately:

``` python
repos = paged(api.repos.list_for_org, org='fastai')
```

The object returned from [`paged`](https://ghapi.fast.ai/page.html#paged) is a generator, so you can iterate through `repos` in the normal way. Each iteration yields one page of results:

``` python
for page in repos: print(len(page), page[0].name)
```

    30 docs
    30 fastscript
    25 wireguard-fast

### Link header (RFC 5988)

GitHub tells us how many pages are available using the [link header](https://tools.ietf.org/html/rfc5988). Unfortunately the PyPI [LinkHeader](https://pypi.org/project/LinkHeader/) library appears to no longer be maintained, so we’ve put a refactored version of it here.

------------------------------------------------------------------------

### parse_link_hdr

``` python
def parse_link_hdr(header):
```

*Parse an RFC 5988 link header, returning a `dict` from rels to a `tuple` of URL and attrs `dict`*

Here’s an example of a link header with just one link:

``` python
parse_link_hdr('<http://example.com>; rel="foo bar"; type=text/html')
```

    {'foo bar': ('http://example.com', {'type': 'text/html'})}

``` python
from fastcore.test import test_eq

links = parse_link_hdr('<http://example.com>; rel="foo bar"; type=text/html')
link = links['foo bar']
test_eq(link[0], 'http://example.com')   # first element is the URL
test_eq(link[1]['type'], 'text/html')    # second element is the attrs dict
```

Let’s test it on the headers we received on our last call to GitHub.
You can access the last call’s headers in `recv_hdrs`:

``` python
api.recv_hdrs['Link']
```

    '<https://api.github.com/organizations/20547620/repos?per_page=30&page=4>; rel="prev", <https://api.github.com/organizations/20547620/repos?per_page=30&page=4>; rel="last", <https://api.github.com/organizations/20547620/repos?per_page=30&page=1>; rel="first"'

Here’s what happens when we parse that:

``` python
parse_link_hdr(api.recv_hdrs['Link'])
```

    {'prev': ('https://api.github.com/organizations/20547620/repos?per_page=30&page=4', {}),
     'last': ('https://api.github.com/organizations/20547620/repos?per_page=30&page=4', {}),
     'first': ('https://api.github.com/organizations/20547620/repos?per_page=30&page=1', {})}

### Getting pages in parallel

Rather than requesting each page one at a time, we can save some time by getting all the pages we need in parallel.

------------------------------------------------------------------------

### GhApi.last_page

``` python
def last_page(self):
```

*Parse RFC 5988 link header from most recent operation, and extract the last page*

To find out how many pages we need, we can use `last_page`, which reads the link header we just looked at and extracts the number of the last page. We will need multiple pages to get all the repos in the `github` organization, even if we get 100 at a time:

``` python
api.repos.list_for_org('github', per_page=100)
api.last_page()
```

    4

------------------------------------------------------------------------

### pages

``` python
def pages(oper, n_pages, *args, n_workers=None, per_page:int=100, **kwargs):
```

*Get `n_pages` pages from `oper(*args,**kwargs)`*

[`pages`](https://ghapi.fast.ai/page.html#pages) by default passes `per_page=100` to the operation.

Let’s look at some examples. To get all the pages for the repos in the `github` organization in parallel, we can use this:

``` python
gh_repos = pages(api.repos.list_for_org, api.last_page(), 'github').concat()
len(gh_repos)
```

    367

If you already know ahead of time the number of pages required, there’s no need to call `last_page`.
For instance, the GitHub docs specify that we can get at most 3000 gists:

``` python
gists = pages(api.gists.list_public, 30).concat()
len(gists)
```

    3000

GitHub ignores the `per_page` parameter for some API calls, such as listing public events, which it limits to 8 pages of 30 items per page. To retrieve all pages in these cases, you need to explicitly pass the lower per-page limit:

``` python
api.activity.list_public_events()
api.last_page()
```

    8

``` python
evts = pages(api.activity.list_public_events, api.last_page(), per_page=30).concat()
len(evts)
```

    232
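
The two access patterns described on this page, lazy iteration and parallel fetching, can be illustrated without any network access. The following is a minimal sketch under stated assumptions, not ghapi's actual implementation: `fake_list_op`, `paged_sketch` and `pages_sketch` are hypothetical names, and the fake operation simply slices an in-memory list of 232 items to mimic an endpoint capped at 30 items per page.

``` python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

# Hypothetical stand-in for a paged API operation: 232 fake items,
# served `per_page` at a time, with pages numbered from 1.
ITEMS = list(range(232))

def fake_list_op(per_page=30, page=1):
    start = (page - 1) * per_page
    return ITEMS[start:start + per_page]

def paged_sketch(oper, *args, per_page=30, max_pages=9999, **kwargs):
    "Lazily yield one page at a time, stopping at the first empty page."
    for page in range(1, max_pages + 1):
        res = oper(*args, per_page=per_page, page=page, **kwargs)
        if not res:
            break
        yield res

def pages_sketch(oper, n_pages, *args, n_workers=None, per_page=100, **kwargs):
    "Fetch pages 1..n_pages concurrently; results are returned in page order."
    def fetch(page):
        return oper(*args, per_page=per_page, page=page, **kwargs)
    workers = n_workers or max(1, n_pages)
    with ThreadPoolExecutor(max_workers=workers) as ex:
        # Executor.map preserves the order of its inputs
        return list(ex.map(fetch, range(1, n_pages + 1)))

# Lazy iteration: page sizes are discovered as we go
lazy = [len(p) for p in paged_sketch(fake_list_op)]

# Parallel fetch: the page count must be known up front
parallel = pages_sketch(fake_list_op, 8, per_page=30)
flat = list(chain.from_iterable(parallel))

print(lazy)
print(len(flat))
```

The trade-off this sketch highlights is that the lazy version needs no page count but issues requests one after another, while the parallel version needs the page count in advance (which is what `last_page` provides in the real library) and must use a `per_page` value the endpoint actually honours.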