Data Structures vs. Objects in OOP - notes on Coupling, Cohesion, and Interfaces

Software Development | Feb 2, 2023 - Updated May 15, 2024




Intro to Data Structures in Object-Oriented Programming

The easiest way to define a data structure is “a purposeful organization of data”. Some examples of data structures are lists, arrays, queues, stacks, heaps, tries, and more. Data structures are concerned with exposing data and with the best logical arrangement of that data for a given purpose, e.g. normalized SQL database tables, JSON or XML text, data tables, sequences, groupings, and lists. The thing to keep in mind here is accessibility to that data. Data structures are typically mutable (subject to change or alteration - think 'mutate'), and are usually only encapsulated when implemented as a member of a class. A data structure is data organized in a way that implies a set of functions that will operate on it.

It’s important to distinguish data structures from classes and objects. Classes contain data structures (organized data) but are not themselves data structures. An object, from the point of view of its client, is a set of public, visible functions operating on private (but implied) data structures: the stored values of the data members defined within its class. The important thing to remember about objects is that they are intended to encapsulate data; that is, they explicitly control how that data mutates (changes). A class definition shows its data (data structures), but an instantiated object does not expose that data to its clients; clients simply call functions that operate on the implied data. An object calling another object's method generally can’t see anything about that object beyond its type, and can’t see its data structures.

In that way, data structures and objects take opposite views of data: if data structures are data organized so that they imply functions, objects are functions/methods whose behavior implies data. Without data structures, the algorithms inside functions have little to operate on beyond simple primitive values (note: an algorithm is a set of statements operating on data, usually contained within a function/method).

If you think about this from an ontological perspective, as if we were preparing a conceptual model, it makes sense. Data structures are simply a way for us to purposefully organize the values of some entity's attributes. The class itself is designed to constrain how the values of those attributes mutate, so that they remain a proper representation of the entity they are meant to describe. For example, you might define an employee class. Let's say that in some organization, an employee can only hold one of the titles the organization has defined. Let's also say that, instead of creating an enumeration to organize those titles, our employee class tracks the role itself. We then create an instance of that class representing an actual employee, and set this employee's private title attribute to the title of their role. Next, we provide either a public property or a method that controls how other objects access and mutate that data. With a property, we might define a 'set' accessor (colloquially, a 'setter'); alternatively, we might define a method that checks that the new title is one the organization has defined before it overwrites the old one. In a more functional approach, you may not allow the title to be overwritten at all (known as 'immutability') and instead create a new instance (a new 'version' of the object) with the updated value.
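The employee example can be sketched in Python roughly as follows. This is a minimal sketch under assumptions: `ALLOWED_TITLES`, the titles in it, and both class names are hypothetical, and Python's `@property` stands in for a C#-style property with a setter. The second class shows the immutable alternative, where a change produces a new version of the object instead of overwriting the old one.

```python
from dataclasses import dataclass, replace

# Hypothetical set of titles the organization has defined.
ALLOWED_TITLES = frozenset({"Engineer", "Analyst", "Manager"})


class Employee:
    """Mutable employee whose title can only change through a validating setter."""

    def __init__(self, name: str, title: str) -> None:
        self._name = name
        self.title = title  # goes through the setter below, so it is validated too

    @property
    def name(self) -> str:
        return self._name

    @property
    def title(self) -> str:
        return self._title

    @title.setter
    def title(self, value: str) -> None:
        # Reject any title the organization has not defined.
        if value not in ALLOWED_TITLES:
            raise ValueError(f"unknown title: {value!r}")
        self._title = value


@dataclass(frozen=True)
class ImmutableEmployee:
    """Immutable variant: a title change creates a new 'version' of the object."""

    name: str
    title: str

    def __post_init__(self) -> None:
        if self.title not in ALLOWED_TITLES:
            raise ValueError(f"unknown title: {self.title!r}")

    def with_title(self, new_title: str) -> "ImmutableEmployee":
        # dataclasses.replace builds a new instance, re-running __post_init__.
        return replace(self, title=new_title)
```

For example, after `emp = Employee("Ada", "Engineer")`, assigning `emp.title = "Wizard"` raises `ValueError`, while `ImmutableEmployee("Ada", "Engineer").with_title("Manager")` returns a new object and leaves the original untouched.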

This dichotomy is important because it has design implications for your software. Because data structures expose data and imply functions, client-facing data structures make it easy to add new functions (none of the existing ones need to change) but harder to add new data structures (every existing function that handles them may need to change). Client-facing objects have the opposite trade-off: they make it easier to add new data structures (a new class) but harder to add new methods/functions (every existing class may need the new method).
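A short Python sketch of this trade-off, using the familiar shapes illustration (the shape names and functions are illustrative assumptions). In the first half, shapes are plain data structures and each function switches on the shape's type: adding `perimeter` touches nothing else, but a new shape would need a new branch in every function. In the second half, each shape is an object with its own `area` method: adding `Rectangle` is one new class, but adding a perimeter behavior would mean editing every class.

```python
import math
from dataclasses import dataclass

# --- Procedural style: exposed data structures plus functions that operate on them ---


@dataclass
class SquareData:
    side: float


@dataclass
class CircleData:
    radius: float


def area(shape) -> float:
    # Every new shape type needs a new branch here (and in every similar function).
    if isinstance(shape, SquareData):
        return shape.side ** 2
    if isinstance(shape, CircleData):
        return math.pi * shape.radius ** 2
    raise TypeError(f"unsupported shape: {type(shape).__name__}")


def perimeter(shape) -> float:
    # Adding a new *function* is easy: no existing code changes.
    if isinstance(shape, SquareData):
        return 4 * shape.side
    if isinstance(shape, CircleData):
        return 2 * math.pi * shape.radius
    raise TypeError(f"unsupported shape: {type(shape).__name__}")


# --- Object style: data hidden behind behavior ---


class Square:
    def __init__(self, side: float) -> None:
        self._side = side

    def area(self) -> float:
        return self._side ** 2


class Circle:
    def __init__(self, radius: float) -> None:
        self._radius = radius

    def area(self) -> float:
        return math.pi * self._radius ** 2


class Rectangle:
    # Adding a new *shape* is easy: one new class, no existing code changes.
    # Adding a new behavior (e.g. perimeter) would mean editing every class.
    def __init__(self, width: float, height: float) -> None:
        self._width = width
        self._height = height

    def area(self) -> float:
        return self._width * self._height
```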

This is a big difference between object-oriented and non-object-oriented programming. Languages such as R and Haskell, for example, are strongly focused on publicly accessible data structures and on making it easy to write functions that operate on them, and may not use objects at all. This data-structure- and function-focused paradigm is called functional programming; rather than controlling access to data structures, it often relies on making all structures and objects immutable. One of the main benefits of object-oriented programming over functional programming is that it makes it easy to add new data structures and to create many instances of objects. A downside is that you must plan the design and purpose of your objects' functions/methods well ahead of time, because those functions/methods are harder to change later due to dependencies and coupling between objects and their clients.




Data Flow Reduction - Coupling and Cohesion

It should be noted that the data structures used in an OOP program should be hidden from the users of our code/software. Because our programming is object-oriented, we are seeking to abstract away or conceal data and data structures, not to expose them explicitly. For this reason, OOP programs place an architectural focus on controlling access to, and mutation of, data structures. The clients of an object don't need to understand its data structures; they only need to see the functions operating on them, including their inputs and outputs, and any access-control abstractions (like properties with 'getters' and 'setters').
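A minimal Python sketch of hiding a data structure behind an object's functions. The `Inventory` class and its methods are hypothetical. Clients call `add`, `remove`, and `quantity` and never touch the dictionary underneath, so that dictionary could later be swapped for another structure without changing client code. One caveat: Python only marks privacy by convention (the leading underscore), so unlike C#'s `private` this relies on clients respecting it.

```python
from types import MappingProxyType
from typing import Mapping


class Inventory:
    """Tracks item counts; the dict that stores them is an internal detail."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}  # leading underscore: private by convention

    def add(self, sku: str, amount: int = 1) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._counts[sku] = self._counts.get(sku, 0) + amount

    def remove(self, sku: str, amount: int = 1) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        current = self._counts.get(sku, 0)
        if amount > current:
            raise ValueError(f"only {current} of {sku!r} in stock")
        if amount == current:
            del self._counts[sku]  # keep the internal dict free of zero entries
        else:
            self._counts[sku] = current - amount

    def quantity(self, sku: str) -> int:
        return self._counts.get(sku, 0)

    def counts(self) -> Mapping[str, int]:
        # Live, read-only view: clients can look but cannot mutate the internal dict.
        return MappingProxyType(self._counts)
```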

The degree to which an object’s client depends on that object is referred to as coupling. Why is tight coupling bad? Imagine that an object’s clients need to access many of its functions for most of their work; we could say there is a high data flow between them. Now imagine you need to change the algorithms (sets of statements) within that object’s class. If the change alters how clients must access the object, much of each client class needs to be rewritten as well. In this way, the classes are tightly coupled instead of modular, and modularity is the whole point of classes.
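A small Python sketch of how coupling shows up in code. The two readings classes and the two report functions are hypothetical. `report_coupled` reaches into the internal list, so it breaks as soon as the internal representation changes. `report` depends only on the public `average()` method, so it keeps working when the class is rewritten to keep a running total instead of a list.

```python
class ListReadings:
    """Version 1: stores every reading in a list."""

    def __init__(self) -> None:
        self.readings: list[float] = []  # publicly visible, so clients may come to rely on it

    def record(self, value: float) -> None:
        self.readings.append(value)

    def average(self) -> float:
        return sum(self.readings) / len(self.readings)


class RunningReadings:
    """Version 2: keeps only a running total and count; there is no list any more."""

    def __init__(self) -> None:
        self._total = 0.0
        self._count = 0

    def record(self, value: float) -> None:
        self._total += value
        self._count += 1

    def average(self) -> float:
        return self._total / self._count


def report_coupled(r) -> str:
    # Tightly coupled: depends on the internal list and re-implements the algorithm.
    return f"avg={sum(r.readings) / len(r.readings):.1f}"


def report(r) -> str:
    # Loosely coupled: depends only on the public behavior.
    return f"avg={r.average():.1f}"
```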

The degree to which an object’s methods/functions use its own internal data to produce their output is referred to as cohesion. Why is high cohesion good? High cohesion reduces data flow between objects, which means looser coupling. High cohesion is achieved by putting functions in the classes that house the majority of the data and data structures they operate on. As my former instructor Liping Liu once said, “Always put the functions where the data is.” This is important for good object-oriented systems design. Low coupling, in turn, is achieved by managing dependencies between objects, often through constructor or method dependency injection, where the parameter’s type is an interface rather than a concrete class.
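A minimal Python sketch of constructor dependency injection against an interface. Python has no `interface` keyword, so `typing.Protocol` stands in for a C# interface here; `Notifier`, `EmailNotifier`, `RecordingNotifier`, and `OrderService` are hypothetical names. `OrderService` depends only on the `Notifier` protocol, so any object with a matching `send` method can be injected, including a simple test double.

```python
from typing import Protocol


class Notifier(Protocol):
    """The interface: what OrderService needs, not how it is done."""

    def send(self, recipient: str, message: str) -> None: ...


class EmailNotifier:
    """One implementation (hypothetical; a real one would talk to a mail server)."""

    def send(self, recipient: str, message: str) -> None:
        print(f"email to {recipient}: {message}")


class RecordingNotifier:
    """Another implementation that just records messages, handy in tests."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, recipient: str, message: str) -> None:
        self.sent.append((recipient, message))


class OrderService:
    def __init__(self, notifier: Notifier) -> None:
        # The dependency is supplied from outside (constructor injection)
        # and typed as the interface, not as a concrete class.
        self._notifier = notifier

    def place_order(self, customer: str, item: str) -> None:
        # ... order-handling logic would go here ...
        self._notifier.send(customer, f"Your order for {item} was placed.")
```

Method injection would instead pass the notifier as a parameter of `place_order`. Either way, swapping `EmailNotifier` for `RecordingNotifier` requires no change to `OrderService`. Note that `Protocol` conformance is checked by a static type checker such as mypy, not at runtime.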

Let’s illustrate this:


Source: https://upload.wikimedia.org/wikipedia/commons/thumb/0/09/CouplingVsCohesion.svg/1024px-CouplingVsCohesion.svg.png

In the above illustration, the boxes represent objects and the dots represent class members. Lines between dots within the same box represent internal relationships; lines between dots in separate boxes represent dependencies. Objects with fewer dependencies between them are more loosely coupled and exhibit data flow reduction. Because their data can be accessed easily internally (more lines within each box), those objects also have high cohesion.

Those objects do not need to access data in other objects frequently because functions are housed where most of the data is held, so less data flows between objects. Essentially, we want to write functions that help us maintain modularity between our objects. When implementing algorithms within our functions and methods, we need to be careful not to call other objects needlessly to retrieve data. If we have to choose where to put a function/method for a given use case, it should be a behavioral member of the class where it needs the fewest input parameters (functions do not need input parameters for data housed within the same class).
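A small Python sketch of “put the functions where the data is”. The `Order` and `LineItem` classes and both total functions are hypothetical. The method needs no parameters because the data already lives in the object. The free function does the same calculation but needs the line items and tax rate passed in, so every caller must first pull that data out of an order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    unit_price: float
    quantity: int


class Order:
    def __init__(self, tax_rate: float) -> None:
        self._tax_rate = tax_rate
        self._items: list[LineItem] = []

    def add(self, unit_price: float, quantity: int) -> None:
        self._items.append(LineItem(unit_price, quantity))

    def total(self) -> float:
        # High cohesion: uses only the order's own data, so it takes no parameters.
        subtotal = sum(item.unit_price * item.quantity for item in self._items)
        return round(subtotal * (1 + self._tax_rate), 2)


def total_outside(items: list[LineItem], tax_rate: float) -> float:
    # Lower cohesion: the same logic placed outside Order needs the order's data
    # passed in, which increases data flow between objects.
    subtotal = sum(item.unit_price * item.quantity for item in items)
    return round(subtotal * (1 + tax_rate), 2)
```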




Interfaces, Interactions, and Implementations

Understanding Interface-based Programming | Microsoft
Interfaces | OpenDataStructures.org

Learning how to design with interfaces usually takes months or years. But by exploring and discussing the concepts of interface and implementation, we can build a conceptual basis for explaining the tools an OOP developer will need, such as encapsulation, inheritance, and polymorphism, and a way to implement them with low coupling and high cohesion (these concepts will be explained later).

Let’s define what an interface and an implementation are. An interface is a syntactic description of how a data structure interacts with a client (its implied functions), while an implementation describes how that interaction actually occurs (the actual functions, containing algorithms). An interaction is what actually happens over the interface. In simpler terms, an interface is the definition of how a data structure or object should be interacted with, and an implementation is how it actually does it. An interface is an abstraction of a data structure’s implied functions. An implementation isn’t implied; it’s the actual code executed by a function call. What does this mean for us? An interface can be used to define which functions a data structure or object must provide to make use of its internal organization.


Let’s take a generic data structure, for example:

{ 0, 1, 2, 3, 2, 5, 8, 1, 2, 9, 4, 5 }


What is the structure of the data here? A contiguous, unsorted sequence, where every value has a known position and all the data contained in it is of the same type. What sort of functions does this imply? Perhaps an Add() function, to add data to the sequence, and a Remove() function, to remove data from the sequence at a given position. These functions, without regard to how they actually work, are part of the interface of this data structure. When we add values to this structure, maybe we prepend them; maybe we append them. The implementation could vary, but the data structure itself implies an interface through which values may be added or removed. Do all data structures in OOP have an interface? No. But considering the role of interfaces can help us better modularize our classes and provide a set of functions that will be useful for operating on their data structures.
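The sequence example can be sketched in Python as one interface with two implementations. Python’s `abc.ABC` and `@abstractmethod` stand in for a C# interface here, and `IntSequence`, `AppendingSequence`, `PrependingSequence`, and the `load` client are hypothetical names. Both classes satisfy the same add/remove interface; they differ only in where `add` places new values.

```python
from abc import ABC, abstractmethod


class IntSequence(ABC):
    """Interface: the functions the sequence implies, with no say on how they work."""

    @abstractmethod
    def add(self, value: int) -> None: ...

    @abstractmethod
    def remove(self, position: int) -> int: ...

    @abstractmethod
    def values(self) -> tuple[int, ...]: ...


class AppendingSequence(IntSequence):
    """Implementation 1: new values go at the end."""

    def __init__(self) -> None:
        self._items: list[int] = []

    def add(self, value: int) -> None:
        self._items.append(value)

    def remove(self, position: int) -> int:
        return self._items.pop(position)  # raises IndexError for an invalid position

    def values(self) -> tuple[int, ...]:
        return tuple(self._items)


class PrependingSequence(IntSequence):
    """Implementation 2: new values go at the front."""

    def __init__(self) -> None:
        self._items: list[int] = []

    def add(self, value: int) -> None:
        self._items.insert(0, value)

    def remove(self, position: int) -> int:
        return self._items.pop(position)

    def values(self) -> tuple[int, ...]:
        return tuple(self._items)


def load(seq: IntSequence, data: list[int]) -> IntSequence:
    # A client written against the interface works with either implementation.
    for v in data:
        seq.add(v)
    return seq
```

Loading `[0, 1, 2, 3, 2, 5, 8, 1, 2, 9, 4, 5]` into each gives the same values in opposite orders, and the client code in `load` does not change.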

In C#, a class can implement any number of interfaces but can inherit from only one base class. While interfaces don’t guarantee it, defining and using them helps classes achieve better data flow reduction, i.e. looser coupling and higher cohesion. Interfaces also let us “version” object behaviors, avoiding the OOP pitfall of having to rewrite coupled objects, by allowing each client to access an older or newer version of an object’s behavior. This favors using interfaces in place of abstract and base classes, rather than leaning too heavily on inheritance to implement changes in classes, especially since inheritance should really only be used for genuine 'specialization' relationships.

Microsoft explained this interface versioning scheme wonderfully in their technical documentation article “Understanding Interface-based Programming” (linked above):

“The key observation to make about this versioning scheme is that you can introduce new clients and new objects into an application without breaking older clients and older objects. A new object can accommodate older clients by continuing to support the interfaces from earlier versions. New clients deal with older objects by using the older interface when required. In a world without interfaces, extending objects often requires modifying all the clients. Modifying clients often requires modifying all the objects. The versioning scheme made possible by interface-based programming allows you to make small changes to an application with little or no impact on code that is already in production.”
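A minimal Python sketch of the versioning scheme described in the quote, with `typing.Protocol` (marked `@runtime_checkable` so it can be used in `isinstance` checks) standing in for interfaces. `GreeterV1`, `GreeterV2`, `OldGreeter`, `NewGreeter`, and both client functions are hypothetical. The newer object keeps supporting the V1 interface, so an old client works with it unchanged, and the new client falls back to V1 when it meets an older object.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class GreeterV1(Protocol):
    """Original interface, already used by clients in production."""

    def greet(self, name: str) -> str: ...


@runtime_checkable
class GreeterV2(GreeterV1, Protocol):
    """New version: extends V1 with an added method instead of changing greet()."""

    def greet_formally(self, title: str, name: str) -> str: ...


class OldGreeter:
    """An older object that only supports V1."""

    def greet(self, name: str) -> str:
        return f"Hello, {name}!"


class NewGreeter:
    """A newer object that keeps supporting V1 while adding V2."""

    def greet(self, name: str) -> str:
        return f"Hello, {name}!"

    def greet_formally(self, title: str, name: str) -> str:
        return f"Good day, {title} {name}."


def old_client(greeter: GreeterV1) -> str:
    # Written against V1; works with old and new objects alike.
    return greeter.greet("Ada")


def new_client(greeter: GreeterV1) -> str:
    # Uses V2 when the object supports it, and falls back to V1 otherwise.
    if isinstance(greeter, GreeterV2):
        return greeter.greet_formally("Dr.", "Ada")
    return greeter.greet("Ada")
```

A runtime-checkable protocol only checks that the methods exist, not their signatures, so this is a lightweight approximation of interface versioning rather than the compiler-enforced contracts C# provides.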

What is the lesson here about data structures? We may have data structures in OOP, but in business applications we should be considering only business use cases and behaviors. That means keeping only the data structures needed to store the data those use cases and behaviors require, and designing classes with cohesion and coupling in mind.

© Hillier Engineering | 2024