Challenge (part one): Simple ORM

This post is the first post of a two-part challenge in which I am showing how to build a simple web scraper in PHP, and store the results in a database.

In order to translate objects into database table row data, and vice versa, we should use an Object Relational Mapper.
This allows us to basically ignore the translation mechanism between objects and the tables in our project. One of the more popular libraries for providing this functionality is Doctrine, but I fancy writing one myself, in order to demonstrate how to write a simple ORM facility for a small project, using only core PHP functionality.

Entity Class Design

Let’s create 3 simple entity classes, TPage, TType and TCount. These classes will extend the abstract class AbstractEntity and will be read from and written to the database using a class named EntityHandler – this will take care of the heavy lifting and database interactions.

The AbstractEntity class will give us an interface that we can rely on, to ensure that objects being handled by the EntityHandler class are in fact instances of this base class, and so are at least intended to be compatible with storing in the database.

In order to make this a simple setup, we need to be careful and strict about the naming conventions used when designing the database and entities and their properties. In this challenge, I have decided that we won’t be using a mapping of field names to object property names, we will be assuming the tables in the database will match the class names of the objects, and that the properties of these entities will also have names that match the fields for their respective tables.
In the case that we have an ID field in the database that refers to a foreign object, the field will have the name of the foreign table, with the letters ‘Id’ suffixed. This means that we can detect when an entity is storing a link (via it’s ID) to a foreign table, and we can automatically deduce the foreign table name from the property name – which means we can automatically handle the relationship in PHP without any further metadata or setup config for each entity class.
If we stick to a naming convention, we can also use an Array to show when there are multiple entities referencing the given entity – we will say that if the property type is an Array, then we can determine the name of the source entity from property name, and suffix ‘List’ to the name, for example.

If we put the above ideas into practice, we end up with something like this:

A little note about Accessors and Mutators: It is common for objects in the object-oriented world to use setters and getters (formally known as accessors and mutators) to protect access to what would normally be considered private properties of objects. There are many reasons why it is good to use them, but there is never one design pattern that suits every situation. In the above code I have chosen to not use them, simply because these objects are intended to be simple, straightforward data structures that have no validation, no error checking, nothing to prevent silly things being done to them, just like in the good ol’ days of programming in C. Structs in C represent the simple idea of collating several items of data together into a single unit so it can be passed from one routine to the next, or stored for use later on.
Adding setters and getters to these objects specifically, in my opinion, would be an example of premature optimisation – trying to cater for a scenario that doesn’t yet exist, and may never exist, and in doing so, you’ve just created superfluous code.
If, later on, you decide that you want to have a method that gives you the result of two of your entity fields joined together, fine. You can create that method within that entity if you like, but you have to ask yourself, is that good design? Are you breaking the design that the rest of the entities have stuck to rigidly? OR you can use a separate class that specifically contains methods for manipulating your entities. Alternatively if you want to add some validation, you could use the __set and __get magic methods to ensure that the way the entities are being used is as originally intended.

Database Design

And for the above class structure, we must use the following SQL schema, with some referential integrity thrown in for good measure:

Don’t forget we need to escape with backticks any table or field names that are reserved words in MySQL, like ‘count’ as shown above.

So we have our entities. We’ve made sure that the implementation is simple, but not too simple that it needs massive amounts of work to add small features. It is dynamic to the point that our ORM doesn’t have any understanding of the database, nor the entities themselves.
This is why our naming convention is important.

Database Connectivity

But what about the database connection itself, I hear you ask? We can create a class to handle that too – if our application is to accept multiple databases of generic types, then we should have an interface that defines the types of interactions we need from our database adapter, so we can create the adapter to fit the interface and act as a mediator between a raw database connection object and the system accessing it. This way we can swap out adapters as necessary, and write new adapters for different database types. (Although we may have to do some further abstraction to allow us to use non-SQL databases at a later date.)
We shall have an interface named DatabaseInterface and an adapter named MysqliDbConnection that will implement this interface, and extend the built-in Mysqli object.

Here is our Database Interface, to allow us to abstract away the fact that we are using Mysqli, and instead provide our system with a generic database adapter that it can throw generic SQL queries at.

And here is our implementation of this interface, wrapping a standard Mysqli object:

So now we have a database connection object that we can pass to our ORM class, but we haven’t yet built our ORM class…. So here it is.

Our ORM Class

Essentially, there’s only three parts you need to understand from the outside: the constructor, which accepts our wrapped database adapter, the get method, which accepts an entity class name and an ID, (or array of parameters to search for), and the set method – which accepts an entity that will be saved to the database.

The get method must allow for finding objects in two different ways – either you know the ID of the object you want, or you know a field property value that it has that you wish to search for. So if you pass in an associative array into the id field like ['name'=>'hello'] then our ORM class will go to the database and search for an entity (of the type provided) that matches the field parameter described. You can specify multiple parameters in the passed array to narrow the search if desired, and the ORM class will return the first object that matches the searched pattern.
Currently we don’t have need for a ‘find all matches’ functionality in our ORM, so we haven’t implemented one.
The first parameter of the get method accepts the string class name of the entity type to be searched for, in this instance, it can one of ‘TPage’, ‘TType’ and ‘TCount’ – any string passed other than one of these will cause an exception to be thrown.
Whenever an object is fetched from the database, it will be added to the local class cache, which maintains a list of all the objects it is currently managing. The reason for this is simple – if we try to fetch the same row twice, we don’t want to have two instances of the same “object” floating around in the same running instance of our application. Aside from being poor management memory-wise, it can also lead to write conflicts and much confusion.

The set method allows for two different object statuses – ones that do not exist in the database already, and ones that do. It’s dead easy to figure out if an object is already in there – as it will have its ID property set. Of course there is nothing stopping us manually setting the ID and trying to update a row that doesnt exist, or removing the ID from one that does exist and getting us throwing database constraint exceptions…. But we could be here all night trying to prevent programmer errors, so let’s just keep things simple for now.
If the id is set, then we use the UPDATE method, if it is not set, then we use the INSERT method, and set the ID.
Like with the get method, whenever we write a new object to the database, we add it to our local class cache, just in case we refer to it again later, perhaps from a different context within our application.

Trying it out

So there it is, we’ve put together a simple ORM system that we can adapt to our needs, which is lightweight and needs no up-front configuration, just common sense design of our database and class setup enables us to quickly get our stuff saved to our database.
Of course there are limitations in some of the approaches we’ve taken here, but the simplicity of our design is the focus at this point.

Here’s some code to test the above setup.

Enjoy :)

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.