Sunday, August 18, 2013

Do I Really Need a Repository?

I use the Repository pattern in a number of my presentations, whether talking about Interfaces, Generics, or Dependency Injection. The primary reason I use this is because it's a very easy problem to describe: we need to access data from different data sources (Web Service, SQL, Oracle, NoSQL, XML, CSV), and we want to keep the complexity out of our core application code.

But the question often comes up: Do I really need a Repository in my application?

And I'll give my standard answer: It depends.

Let's take a closer look at the Repository pattern and how it fits (or doesn't fit) with some scenarios that I've dealt with.

What is a Repository?
We'll start with a definition of the Repository pattern:
Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects. 
[Fowler, et al. Patterns of Enterprise Application Architecture. Addison-Wesley, 2003.]
The idea behind having a repository is to insulate the application from the data access code. Here's a fairly typical interface that shows a CRUD (Create, Read, Update, Delete) repository:


This means that from our application code, we simply call "Repository.GetPeople()" to get a collection of Person objects from the data store. Our application code does not need to know what's on the other side of that "GetPeople" call. It could be a SQL query; it could be a service call; it could be reading from a local file. The application is insulated from that knowledge.

[Editor's Note: For more exploration of Repositories, see "Applying the Interface Segregation Principle to a Repository".]

But do we really need this in our application?

Add Abstraction as You Need It
Developers generally fall into two groups: over-abstractors and under-abstractors. Neither group is better than the other; they are just natural tendencies. But the challenge for both groups is to find just the right level of abstraction which is generally somewhere in-between.  (For a closer look, see Abstraction: The Goldilocks Principle.)

By nature, I am an under-abstractor. This is because early in my career, I supported an application that was severely over-abstracted. The application was extremely difficult to maintain, and even the senior devs had trouble getting expected behavior out of the code base. So, my first inclination is to limit abstraction as much as possible.

But I've learned some good lessons over the years, which included creating an under-abstracted application that was difficult to maintain. So now I'm always looking to find the balance. Here is the best advice that I've come across so far:
Add abstraction as you need it.
And this literally means "as you need it", not "because you think you might need it at some point in the future." This has helped keep me out of a lot of trouble, and it's also helped me when working with teammates who are over-abstractors by nature.

How does this apply to repositories?

Add a Repository When You Need It
A Repository is a layer of abstraction. So, let's look at some times when we might or might not need an explicit repository.

Scenario 1: No Repository
I worked for many years in corporate development in a fixed environment. All of our custom-built applications used Microsoft SQL Server for data. That was it. And having that consistency made it very easy to support our application portfolio.

But what this also meant is that we did not usually need an explicit Repository. It was very unlikely that we would swap out the data storage layer; it was always SQL Server. So, we did not have a repository.

Now, even though we didn't have a physical repository layer, we still kept all of our data access code nicely corralled. We did not have SQL queries scattered across the code base; we kept them in well-defined parts of our business layer.

Scenario 2: Repository
But there was a time when I needed to add a Repository. One of my applications imported data from a third-party application. That application was going through an upgrade and some changes. One of those changes was switching from Microsoft SQL Server to an Oracle Database.

So, for this application, I added a Repository layer. This allowed me to prepare the application for the switch-over once that third-party application went into production. Based on a configuration change, I could switch the application from using a SQL Server repository to an Oracle repository. This meant that I could develop, test, and deploy my updated code before the switch-over happened. And the upgrade process went very smoothly.

Scenario 3: Maybe Repository
Another reason we may want to add a repository is to facilitate automated testing. With a repository layer in place, it's easy to swap out a fake or mock repository that we can use for testing. But whether we need this will be based on how our objects are laid out.

For example, we may be working with smart-entities; a smart-entity is basically a collection of properties with retrieval and update capabilities. In this case, there really isn't anything to test. The data access methods populate the properties and update the data store. Since we don't really have any logic to test in these entities, we would not get much advantage by adding a repository layer.

On the other hand, we may be working with robust business objects; a robust business object is one which has business logic (business rules and validation) built in. In this case, we do have logic that we want to check in automated tests. A repository layer would make those tests easier, so we would get an advantage from that.

Wrap Up
Do we really need to have a repository in our application? It depends. A repository is a layer of abstraction. If we add abstraction where we don't need it, then we add unnecessary complexity to our application. So, let's add abstraction as we need it.

We've seen two main reasons why we may want to add a repository layer: (1) to swap-out data access code for different data stores, and (2) to facilitate unit testing. If either of these apply, then we should consider adding a repository.

I use the Repository pattern in my presentations because it is an easily-understandable abstraction. But whether we actually need it in our own applications depends on our environment. In most of the applications I've written, I have not needed it. But in the few where I have needed it, it has been a great asset.

Happy Coding!

No comments:

Post a Comment