Dependency Inversion

In my career, I have done quite a lot of code refactoring. While refactoring has a reputation for being dull, I think it is one of the best learning devices out there. I find the lessons learnt during refactoring very valuable. Fixing a lot of lousy code teaches you a hell of a lot about good software. All you have to do is be willing to listen ;-).

What is dependency inversion? Dependency inversion is one of the SOLID principles. It says that:

High-level modules should not depend on low-level modules; both should depend on abstractions. Abstractions should not depend on details. Details should depend upon abstractions.

Why is this principle important?

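Dependency inversion offers a lovely decoupling mechanism: high-level code talks to an interface rather than to a concrete implementation, so the implementation can change without touching its callers. It helps you write more future-proof software.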

What is the essence of it? Let me illustrate this with a short example.

Imagine that we want to fetch users from the database. We could write something like the following block of code.



class PostgresGateway:

    def __init__(self):
        print("Initialising PostgreSQL connection")

    def fetch_user_by_id(self, user_id: str):
        print(f"Fetching user instance from PostgreSQL, user id: {user_id}")

        return dict(username="John12", first_name="John")


class UserManager:

    def __init__(self):
        # UserManager creates the gateway itself, so it is hard-wired to PostgreSQL.
        self.data_gateway = PostgresGateway()

    def fetch_user_by_id(self, user_id: str):
        return self.data_gateway.fetch_user_by_id(user_id=user_id)


if __name__ == '__main__':
    postgres_user_manager = UserManager()
    postgres_user_manager.fetch_user_by_id(user_id="32")

While this is a somewhat simplistic example, it is enough to show the problems you may run into later. Let's assume that, for some good reason, you want to swap PostgreSQL for MongoDB.

Well, that is not the end of the world, but every class that references PostgresGateway needs updating. It doesn't seem like a big deal in our contrived example, but in real life hundreds of classes may reference PostgresGateway, which means changes to tens or hundreds of files. And I only say this because I've been there.

An alternative approach would be the following:

Let's define a UserManager class and point it at an abstract base class.



from abc import ABC, abstractmethod


class DataGateway(ABC):
    @abstractmethod
    def fetch_user_by_id(self, user_id: str):
        pass


class UserManager:
    def __init__(self,
                 data_gateway: DataGateway):
        self.data_gateway = data_gateway

    def fetch_user_by_id(self, user_id: str):
        return self.data_gateway.fetch_user_by_id(user_id=user_id)

Then, let's implement all the abstract methods of the base class.


class PostgresGateway(DataGateway):

    def __init__(self):
        print("Initialising PostgreSQL connection")

    def fetch_user_by_id(self, user_id: str):
        print(f"Fetching user instance from PostgreSQL, user id: {user_id}")
        return dict(username="John12", first_name="John")

Now, let's pass the data source to the constructor of UserManager.

postgres_user_manager = UserManager(data_gateway=PostgresGateway())
postgres_user_manager.fetch_user_by_id(user_id="32")

Because UserManager now points at an abstract class, adding a new data source is as simple as implementing all the abstract methods of the base class (DataGateway). UserManager knows nothing about PostgresGateway or MongoGateway. All it knows is a common interface that EVERY new data source is expected to implement.
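The "expected to be implemented" part is more than a convention: Python's abc module refuses to instantiate a subclass that leaves an abstract method unimplemented. Here is a minimal sketch of that behaviour. IncompleteGateway and try_to_build_incomplete_gateway are hypothetical names made up for this illustration.

```python
from abc import ABC, abstractmethod


class DataGateway(ABC):
    @abstractmethod
    def fetch_user_by_id(self, user_id: str):
        """Return the user record for the given id."""


class IncompleteGateway(DataGateway):
    """Hypothetical gateway that forgets to implement fetch_user_by_id."""


def try_to_build_incomplete_gateway() -> str:
    # Instantiating a class that still has abstract methods raises TypeError.
    try:
        IncompleteGateway()
    except TypeError as error:
        return f"Rejected: {error}"
    return "Accepted"


result = try_to_build_incomplete_gateway()
print(result)
```

The error surfaces when the gateway is created, not later when UserManager first calls the missing method, so a half-finished data source is caught early.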

Mongo example


class MongoGateway(DataGateway):
    def __init__(self):
        print("Initialising MongoDB connection")

    def fetch_user_by_id(self, user_id: str):
        print(f"Fetching user instance from MongoDB, user id: {user_id}")
        return dict(username="John12", first_name="John")

mongo_user_manager = UserManager(data_gateway=MongoGateway())
mongo_user_manager.fetch_user_by_id(user_id="32")
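The same trick works for tests. As a rough sketch, assuming you are happy to keep test users in a plain dict, a throwaway gateway lets you exercise UserManager without any database at all. InMemoryGateway is a hypothetical class invented for this example.

```python
from abc import ABC, abstractmethod


class DataGateway(ABC):
    @abstractmethod
    def fetch_user_by_id(self, user_id: str):
        """Return the user record for the given id."""


class UserManager:
    def __init__(self, data_gateway: DataGateway):
        self.data_gateway = data_gateway

    def fetch_user_by_id(self, user_id: str):
        return self.data_gateway.fetch_user_by_id(user_id=user_id)


class InMemoryGateway(DataGateway):
    """Hypothetical test double: serves users from a plain dict."""

    def __init__(self, users: dict):
        self.users = users

    def fetch_user_by_id(self, user_id: str):
        # Look the user up in the dict instead of querying a database.
        return self.users[user_id]


test_users = {"32": dict(username="John12", first_name="John")}
test_user_manager = UserManager(data_gateway=InMemoryGateway(users=test_users))
user = test_user_manager.fetch_user_by_id(user_id="32")
```

UserManager is untouched here; only the object passed to its constructor changed.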

When to use it?

Always! Well, stating that you should always use it is not very precise, is it? Dependency inversion offers an excellent decoupling mechanism.

There are numerous occasions on which you should use it and, as always, it all depends on the context. From my experience, the most apparent case for using this principle is the need to decouple business logic from the data source. When do you want that? The answer is: always.

Every time you have any business logic that depends on the data source, you want to decouple those two things. It may be user management, plotting, reporting, or just about anything that uses data. By doing so, you allow yourself to swap a data source in the future without the pain of serious refactoring of your code. I only say this because I have seen this time and time again.
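To make the "swap a data source" point concrete, here is one possible sketch for a reporting use case. GreetingReport, GATEWAYS and build_report are hypothetical names invented for this example, and the gateways return hard-coded users just like the ones earlier in the post.

```python
from abc import ABC, abstractmethod


class DataGateway(ABC):
    @abstractmethod
    def fetch_user_by_id(self, user_id: str):
        """Return the user record for the given id."""


class PostgresGateway(DataGateway):
    def fetch_user_by_id(self, user_id: str):
        return dict(username="John12", first_name="John")


class MongoGateway(DataGateway):
    def fetch_user_by_id(self, user_id: str):
        return dict(username="John12", first_name="John")


class GreetingReport:
    """Hypothetical business logic: it only knows the DataGateway interface."""

    def __init__(self, data_gateway: DataGateway):
        self.data_gateway = data_gateway

    def build(self, user_id: str) -> str:
        user = self.data_gateway.fetch_user_by_id(user_id=user_id)
        return f"Hello, {user['first_name']} ({user['username']})"


# Hypothetical registry: the only place that names concrete gateways.
GATEWAYS = {"postgres": PostgresGateway, "mongo": MongoGateway}


def build_report(backend: str) -> GreetingReport:
    # Pick the concrete gateway once, then hand it to the business logic.
    return GreetingReport(data_gateway=GATEWAYS[backend]())


postgres_text = build_report("postgres").build(user_id="32")
mongo_text = build_report("mongo").build(user_id="32")
```

With this layout, moving from PostgreSQL to MongoDB means changing the value passed to build_report (or a single setting that feeds it), while GreetingReport stays as it is.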

Summary

My experience shows that Dependency Inversion is one of the most useful principles in software development because it provides one of the highest returns on investment. I admit that there is some cost associated with it, but this cost is negligible in comparison with the pain of refactoring tens of files and hoping that nothing broke in the process.