Tuesday 6 November 2007

When is the right time to changes the database?

At the start of the project a lot of changes where made to the database. During the decoupling of WebPA from all of the Loughborough University systems the database was partially refactored. This allowed us to bring together all the tables in to one single schema. Once complete I hoped that we would not have to constantly be refactoring the schema as the project developed. Yes I know that this is a very old school way of approaching databases, but there where other reasons for this too.

Some of the reasons for trying not to change the schema where not due to me being old school about it, but for WebPA adoption issues. It seems that some of the problems that open source projects generate for themselves, stem from how complex the installation process can be. There also in some cases seems to be a lack of understanding about the need for backwards compatibility as well. Some of this can be blamed on the adoption of the agile methodology for programming. (I will try and address this later, but back to the reasons for not changing the database.)

Firstly we want to ensure that the system that is being offered (WebPA) can be installed and maintained with relatively little technical experience. This ensures that we will obtain early adopters who work in education. Secondly, by limiting the need to update the database we are ensuring that some potential user fears are never realised. For example, asking a none DBA to migrate a database is unfair and in most cases would scare people off updating to the new release. I also understand that our target audience are under time, facility and resource constraints that must also be considered. Once a system is installed, there may not be the resources to then apply anything other than urgent patches to the system, therefore making the upgrade may beyond their reach. There are also the issues of down-time for the updating of the database at the adopting institutions. Although the development can be completed quite easily there are issues with when institutions can afford to have any down time for WebPA. As an example an institution would be reluctant to update and migrate databases in the middle of a semester, so timing needs to also be factored in.

All this aside there are going to be times when the database does need to be altered. But when is the best time to do this and what will the cost be to the project? At the moment I am not in a position to answer this, but I am in a dilemma about it.

I have gathered information about the current user needs for WebPA and the development of some of the features may require a tweak or two to the schema. All the information [1-3] that is out there on database refactoring, sometimes called "Evolutionary Database Design" (EDD) are predominately centered around enterprise systems used in single companies / institutions. There is very little about how you approach this as a opens source project (I maybe proved wrong though!). However, in the modern world of agile programming and database refactoring this is now a fact of life, and a realistic strategy is needed for young project like WebPA.

In refactoring any element of the database, there is a need to be mindful of the three distinct steps that need to be completed. These steps are; the change to the schema, the migration of any data already existing and changing the application code to access these changes. Other elements that can be taken on board as we make these changes to the database schema is to ensure that the development of "destructive changes" are kept to a minimum. Hopefully, being realistic, there will be no major issues, but as a project there is a need to be realistic that during this process we will loose some early adopters along the way, but with the improvements to the software, more users may be gained.

[1] http://www.bottleit.com.au/blog/post/Continuous-database-integration.aspx
[2] http://www.regdeveloper.co.uk/2007/07/04/evolutionary_database_design/
[3] http://martinfowler.com/articles/evodb.html

1 comment:

Ross Gardler said...

As you seem to be concluding there is no "right" time.

Backward compatibility is, as you point out, critical to users. At the same time it should not be a restraint on developers.

This is where version numbers are important. Generally, if you intend to break backward compatibility you should user your version numbers to indicate it.

OSS Watch has a very early stage document describing the most common release numbering approaches in their development wiki. Of course, using numbers to indicate status requires that you educate your users as to what the numbers mean.

In that document we see how version numbers are used to indicate if users should expect backward compatibility with previous releases (major version number) and future releases (aplpha postfix or odd minor version numbers). This allows users to only adopt versions that are expected to be stable, or to adopt versions that may require manual installation steps in the next upgrade.

In this context, a new feature will always be added to an unstable release and will not migrate to a stable releases until it has been tested and the design (including the DB schema) has undergone review.

Of course, the best solution, regardless of your version numbering, is to provide upgrade scripts that all schema changes etc. are done for you. However, this takes effort (although it is something an early adopter can write for you).