Pagination Perfected: Hypermedia and Deletion-resilience for Data Efficiency

Introduction

APIs (Application Programming Interfaces) belong to the backend, so they are not designed with user interface (UI) or user experience (UX) in mind. For this reason, even paginated APIs have limitations when it comes to providing a seamless experience for the clients that consume them.

However, HATEOAS (Hypermedia As The Engine Of Application State) promotes a more discoverable and interactive way of using RESTful services.

In this guide, you will learn how to paginate a dataset with hypermedia and implement a deletion-resilient strategy for accessing your dataset.

Prerequisites

To follow along with this guide, you’ll need:

  • Basic knowledge of working with APIs.

  • Good understanding of Python programming.

If you don’t already know Python programming, you can still implement the logic in your preferred language.

Understanding Hypermedia or HATEOAS

Hypermedia is a type of content or data format that includes hyperlinks or references to other resources, allowing for interaction with related information. When you visit a webpage, for example, you may see links to ‘explore’, ‘contact’, ‘tweet a thanks’, and so on. These links enable you to access resources within the website or an external source.

That is how hypermedia works: each response you receive contains additional information that allows you to take further actions.

HATEOAS, on the other hand, is an architectural constraint of RESTful API design. It suggests that a RESTful API should include hypermedia links in its responses to allow for better interactivity between the client and server. In short, HATEOAS is a way of using hypermedia in RESTful APIs.

Think of hypermedia as a way of telling clients to click links or buttons to access further content when they make API requests.

HATEOAS is vital in API design and for good reasons:

  • It makes APIs self-discoverable. Clients don’t have to know every single endpoint in the API to access the information they need. Developers can easily understand and use it.

  • It allows for flexibility. When an endpoint changes on the server, clients can adapt by following the new links.

  • It leads to more concise API documentation, because the responses themselves show how to use the API.

  • It simplifies the development process for clients by encouraging standard practices in API design.
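
To make the idea concrete, here is a minimal sketch of how a server might attach navigation links to a paginated response. The /names endpoint and the link names (self, first, last, prev, next) are assumptions chosen for illustration, not part of the dataset or of any particular standard.

```python
from typing import Dict, Optional


def build_links(base_url: str, page: int, total_pages: int) -> Dict[str, Optional[str]]:
    """Build navigation links for one page of a paginated collection."""
    def url(n: int) -> str:
        return f"{base_url}?page={n}"

    return {
        "self": url(page),
        "first": url(1),
        "last": url(total_pages),
        # No link is offered when there is nowhere to go.
        "prev": url(page - 1) if page > 1 else None,
        "next": url(page + 1) if page < total_pages else None,
    }


# "/names" is a hypothetical endpoint used only for this example.
links = build_links("/names", page=2, total_pages=5)
print(links["prev"])  # /names?page=1
print(links["next"])  # /names?page=3
```

A client that reads these links never has to build a URL itself; it only follows whatever the response offers.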

How to Paginate a Dataset with Hypermedia Pagination

As an example, you will work with this CSV file to apply the hypermedia pagination strategy. Save the file as Popular_Baby_Names.csv. The CSV file contains a list of popular baby names. Here is the logic we’ll be considering:

  1. Implement a helper function to return the start and end index of data based on pagination parameters (page and page_size).

  2. Define a class that handles operations on the CSV file.

  3. Read the CSV file and store the dataset in a list of lists.

  4. Define a method to return the paginated results of the data according to the specified parameters.

  5. Define a method to return hypermedia data from the dataset.

The steps will become clearer as you carry them out. First, open the main file (you can call it hypermedia.py if you wish) and copy this code into it:

def index_range(page: int, page_size: int) -> tuple:
    """Return the (start, end) indexes for a 1-indexed page"""
    start_index = (page - 1) * page_size
    end_index = start_index + page_size

    return (start_index, end_index)

This function returns the start and end index according to the pagination parameters. For example, if you want page 3 with 15 popular baby names per page, the function returns the indexes where that page starts and ends. Let’s test out the function:

res = index_range(page=3, page_size=15)
print(res)

Result:

(30, 45)

This means the requested page starts at index 30 and ends just before index 45. The end index is exclusive, so the page covers indexes 30 to 44.
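
Because the end index is exclusive, the pair can be passed straight to a list slice. The sketch below repeats the same arithmetic on a made-up list of 50 placeholder names (the names list is hypothetical stand-in data, not the CSV file).

```python
from typing import List, Tuple


def index_range(page: int, page_size: int) -> Tuple[int, int]:
    """Same arithmetic as above: pages are 1-indexed, indexes are 0-indexed."""
    start_index = (page - 1) * page_size
    return (start_index, start_index + page_size)


# Hypothetical stand-in data: name_0 ... name_49
names: List[str] = [f"name_{i}" for i in range(50)]

start, end = index_range(page=3, page_size=15)
page_items = names[start:end]

print(page_items[0], page_items[-1], len(page_items))  # name_30 name_44 15
```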

Next, let’s define a class to read and store the dataset in a list of lists so you can easily work with it.

# Place these imports at the top of the file
import csv
from typing import List


class Server:
    """Server class to paginate a database of popular baby names.
    """
    DATA_FILE = "Popular_Baby_Names.csv"

    def __init__(self):
        self.__dataset = None
        # Used later by indexed_dataset()
        self.__indexed_dataset = None

    def dataset(self) -> List[List]:
        """Cached dataset"""
        if self.__dataset is None:
            with open(self.DATA_FILE) as f:
                reader = csv.reader(f)
                dataset = [row for row in reader]

            # Skip the header row
            self.__dataset = dataset[1:]

        return self.__dataset

Let’s add a method to the class that will allow for retrieving pages of data from the dataset:

    def get_page(self, page: int = 1, page_size: int = 10) -> List[List]:
        """Get pages of popular baby names from dataset
        """
        assert type(page) == int
        assert type(page_size) == int
        assert page > 0
        assert page_size > 0

        start_index, end_index = index_range(page, page_size)
        # Slicing returns [] when start_index is past the end of the data,
        # and a shorter list on the last, partially filled page.
        return self.dataset()[start_index:end_index]

The method returns an empty list if the parameters specified are out of range.

Let’s try out the method and see it in action:

server = Server()
print(server.get_page(1, 3))
print(server.get_page(3, 2))
print(server.get_page(3000, 100))

Here's the result:

[['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Olivia', '172', '1'],
 ['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Chloe', '112', '2'],
 ['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Sophia', '104', '3']]

[['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Emily', '99', '4'],
 ['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Mia', '79', '5']]

[]

To add hypermedia to the paginated results, decide on the data you’d like to include in each response. The following will suffice:

page_size: the number of items returned on the current page.

page: the current page number.

data: the requested data. Same as the one returned by the get_page method.

next_page: the number of the next page, or None if there is no next page.

prev_page: the number of the previous page, or None if there is no previous page.

total_pages: the total number of pages in the dataset for the specified page size.

Add this method to the Server() class.

    def get_hyper(self, page: int = 1, page_size: int = 10):
        """Returns a dictionary of hypermedia key-value pairs"""

        _, end_index = index_range(page, page_size)
        total_items = len(self.dataset())
        # Fetch the page once and reuse it below
        data = self.get_page(page, page_size)

        prev_page = None
        if page > 1:
            prev_page = page - 1

        next_page = None
        if total_items > end_index:
            next_page = page + 1

        # Ceiling division, so a partially filled last page is counted
        total_pages = (total_items + page_size - 1) // page_size

        hyper_dict = {
            "page_size": len(data),
            "page": page,
            "data": data,
            "next_page": next_page,
            "prev_page": prev_page,
            "total_pages": total_pages
        }

        return hyper_dict

This method contains the logic needed for hypermedia pagination in this dataset.

Try it out:

print(server.get_hyper(1, 2))
print('----')
print(server.get_hyper(2, 2))
print('----')

Result:

{ 
  'page_size': 2,
  'page': 1,
  'data': [['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Olivia', '172', '1'], ['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Chloe', '112', '2']],
  'next_page': 2,
  'prev_page': None,
  'total_pages': 9709
}

----

{
  'page_size': 2,
  'page': 2,
  'data': [['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Sophia', '104', '3'], ['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Emma', '99', '4']],
  'next_page': 3,
  'prev_page': 1,
  'total_pages': 9709
}

This hypermedia data enables the user to further explore the dataset, because each response says which page comes next, which page came before, and how many pages there are in total.
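
As a rough illustration of how a client could use these fields, the sketch below walks through every page by following next_page until it is None. To stay self-contained it uses a small in-memory stand-in for get_hyper that takes a plain list; the walk helper and the five-name list are assumptions made for this example.

```python
from typing import Any, Dict, List, Optional


def get_hyper(items: List[str], page: int = 1, page_size: int = 10) -> Dict[str, Any]:
    """In-memory stand-in that returns the same keys as the method above."""
    start = (page - 1) * page_size
    end = start + page_size
    data = items[start:end]
    return {
        "page_size": len(data),
        "page": page,
        "data": data,
        "next_page": page + 1 if end < len(items) else None,
        "prev_page": page - 1 if page > 1 else None,
        "total_pages": -(-len(items) // page_size),  # ceiling division
    }


def walk(items: List[str], page_size: int) -> List[str]:
    """Collect every item by following next_page until it is None."""
    collected: List[str] = []
    page: Optional[int] = 1
    while page is not None:
        response = get_hyper(items, page, page_size)
        collected.extend(response["data"])
        page = response["next_page"]
    return collected


names = ["Olivia", "Chloe", "Sophia", "Emma", "Emily"]
print(walk(names, page_size=2))
# ['Olivia', 'Chloe', 'Sophia', 'Emma', 'Emily']
```

The client never computes a page number on its own; it only reads the one the response hands back.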

Understanding Deletion-resilient Hypermedia Pagination

Deletion-resilience is a strategy in API design that enables users to retrieve the requested number of items even when some data has been deleted. Removal of items from the dataset does not affect the request made by a user.

As an example, say you request the items at indexes 0-9. On the first request, you get exactly 10 items. Now, if the items at indexes 3, 4 and 5 get deleted and you make the request again, what do you think will happen? If the API is not deletion-resilient, you’ll get 7 items (because 3 were deleted). But if the API is deletion-resilient, you’ll still get 10 items as requested, because the API moves past the deleted rows and fills the page with the next available ones. You don’t care if there was a deletion or not.

This approach is particularly useful because clients keep receiving full pages even when the data changes between requests.
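
The difference can be sketched with a small dictionary of placeholder rows. The naive_page and resilient_page helpers and the row_N values below are hypothetical names used only to illustrate the idea; they are not the methods built later in this guide.

```python
from typing import Dict, List

# Hypothetical data: 15 rows keyed by their original position
rows: Dict[int, str] = {i: f"row_{i}" for i in range(15)}
for deleted in (3, 4, 5):
    del rows[deleted]


def naive_page(indexed: Dict[int, str], start: int, size: int) -> List[str]:
    """Read a fixed index window and drop whatever is missing."""
    return [indexed[i] for i in range(start, start + size) if i in indexed]


def resilient_page(indexed: Dict[int, str], start: int, size: int) -> List[str]:
    """Keep scanning forward until `size` surviving rows are collected."""
    page: List[str] = []
    i = start
    last = max(indexed, default=-1)  # highest surviving key
    while len(page) < size and i <= last:
        if i in indexed:
            page.append(indexed[i])
        i += 1
    return page


print(len(naive_page(rows, 0, 10)))      # 7
print(len(resilient_page(rows, 0, 10)))  # 10
```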

How to Implement Deletion-resilient Hypermedia Pagination

Before implementing deletion-resilience, it is necessary, in our case, to index the dataset. Add this method to the Server() class defined earlier:

    def indexed_dataset(self):
        """Dataset indexed by sorting position, starting at 0"""
        # Assumes __init__ also sets self.__indexed_dataset = None
        if self.__indexed_dataset is None:
            dataset = self.dataset()
            # Keep only the first 1000 rows
            truncated_dataset = dataset[:1000]
            self.__indexed_dataset = {
                i: truncated_dataset[i] for i in range(len(truncated_dataset))
            }
        return self.__indexed_dataset

This method truncates the dataset to its first 1,000 rows and keys each row by its position. This keeps the dataset at a manageable size for our current needs.

For the deletion-resilient hypermedia pagination, let’s say every response is a dictionary that includes the following information:

index: the index of the first item on the current page.

next_index: the next index to query with.

page_size: the current page size.

data: the actual page of the dataset.

Here’s the code to implement this strategy:

    def get_hyper_index(self, index: int = 0, page_size: int = 10):
        """Return a page that starts at index and skips deleted rows"""
        indexed = self.indexed_dataset()

        assert type(index) == int
        assert type(page_size) == int
        assert page_size > 0
        assert index >= 0
        assert index < len(indexed)

        data = []
        next_index = index
        # One past the highest surviving key, so the scan always terminates
        end_of_data = max(indexed) + 1

        while len(data) < page_size and next_index < end_of_data:
            # Positions whose rows were deleted are simply skipped
            if next_index in indexed:
                data.append(indexed[next_index])
            next_index += 1

        return {
            "index": index,
            "data": data,
            "page_size": len(data),
            "next_index": next_index
            }

Here the code first checks if the index is within the range of the dataset. The core logic is in the loop, which walks forward from the requested index. If an index position no longer exists, we keep incrementing the index to the next item until the requested number of items has been collected.

Let’s test this method with the following code:

server = Server()

index = 3
page_size = 2

# 1 - request first index
res = server.get_hyper_index(index, page_size)
print(res)

# 2 - request next index
print(server.get_hyper_index(res.get('next_index'), page_size))

# 3 - remove the first index
del server._Server__indexed_dataset[res.get('index')]

# 4 - request again the initial index -> the first data retrieved is not the
# same as the first request
print(server.get_hyper_index(index, page_size))

# 5 - request again the initial next index -> same data as the request 2
print(server.get_hyper_index(res.get('next_index'), page_size))

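After running the code, the result shows that the pagination is deletion-resilient. The third request still returns two rows even though index 3 was deleted, and the fourth request returns the same data as the second: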

{
    'index': 3,
    'data': [['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Emma', '99', '4'], ['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Emily', '99', '4']],
    'page_size': 2,
    'next_index': 5
}

{
    'index': 5,
    'data': [['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Mia', '79', '5'], ['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Charlotte', '59', '6']],
    'page_size': 2,
    'next_index': 7
}

{
    'index': 3,
    'data': [['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Emily', '99', '4'], ['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Mia', '79', '5']],
    'page_size': 2,
    'next_index': 6
}

{
    'index': 5,
    'data': [['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Mia', '79', '5'], ['2016', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Charlotte', '59', '6']],
    'page_size': 2,
    'next_index': 7
}
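
To see why this matters for a client that is midway through the data, the sketch below compares plain page-number pagination with index-based pagination when a row is deleted between two requests. The page_based and index_based helpers and the six-name list are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

names = ["Olivia", "Chloe", "Sophia", "Emma", "Emily", "Mia"]


def page_based(items: List[str], page: int, size: int) -> List[str]:
    """Page-number pagination over a list that shrinks on delete."""
    start = (page - 1) * size
    return items[start:start + size]


def index_based(indexed: Dict[int, str], index: int, size: int) -> Tuple[List[str], int]:
    """Index pagination over a dict whose keys never shift on delete."""
    data: List[str] = []
    last = max(indexed, default=-1)
    while len(data) < size and index <= last:
        if index in indexed:
            data.append(indexed[index])
        index += 1
    return data, index


# Page-number client: reads page 1, a row is deleted, then reads page 2.
as_list = list(names)
page_first = page_based(as_list, 1, 2)
as_list.remove("Olivia")
page_second = page_based(as_list, 2, 2)
print(page_first + page_second)    # Sophia is never seen

# Index client: same sequence, but it resumes from next_index.
as_dict = dict(enumerate(names))
index_first, next_index = index_based(as_dict, 0, 2)
del as_dict[0]
index_second, _ = index_based(as_dict, next_index, 2)
print(index_first + index_second)  # no row is skipped
```

With page numbers, the deletion shifts every later row up by one position, so the client silently misses one. With stable indexes, the client picks up exactly where it left off.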

Conclusion

In this guide, you implemented two essential strategies for API data efficiency: hypermedia pagination and deletion-resilient pagination. You worked with a CSV dataset to better understand deletion-resilient hypermedia pagination.

You can try executing the strategies with other datasets as well.