Monday, June 20, 2005

The truth about what "integration" means

Integration? What does integration mean to you? Does integration mean the constant and immediate transfer of data between tables of different modules? Does it mean that there is no transfer of data between tables because tables are perfectly shared? Does it mean that you run a process to move data from one module to the next?

I have heard many words used interchangeably. Integration, synchronization, interfaced… What do these mean and what is the affect on your working life? The problem is that every vendor will tell you that their product is integrated. So if you are purchasing an HRMS with payroll and benefit administration components, and you are told that the product modules are integrated, chances are that you believe you will only have to enter an employee address once and the system takes care of itself after that.

Scenario 1: In some systems, many base employee tables are actually duplicated. Let’s stick with the example of employee address. If the employee address is entered into HR, and then you run a process to “move” the data over to payroll, and then you run another process to “move” the data over to benefits, then you don’t really have an integrated system – do you? It might be better in that you might have a system that automatically initiates the process of “movement” for you so you never know it’s happening. The fact of the matter is as database design goes, it’s not really integrated. Why is this a problem? Simply because the constant movement of identical data to database tables will eventually cause integrity issues. You data will get out of sync and then you have bad data to clean up.

Scenario 2: In order to create your payroll and benefits file, you actually have to run a process at the end of each day (or hour, or quarter hour) to move data around. Now the problem is that it’s not real time. This may not be critical in most functional areas, but most large organizations have a problem with not being able to see stuff in benefits when a change has been made in HR.

Scenario 3: What if a database was designed from the ground up? Well, that address table would be located in one and only one place. Then HR, payroll and benefits would all look at and pull data from that single table as other tables and processes needed said data. Therefore, everything is always real time, and you never have data integrity issues. It’s much cleaner.

So why does anyone do what’s in realities 1 and 2? Simply put, many of today’s vendors – even the larger ones – purchased their software as separate components and then melded them into a single application so you could not tell the difference. This adds yet another complexity called normalization (which is problematic in other ways not discussed here). A perfectly normalized database should have appropriate unique identifiers. For example, an employee ID should be the same ID for HR, payroll and benefits. However, if you purchase disparate softwares and lump them all together, I guarantee you the components have different employee ID parameters. Smart vendors fixed this problem. However (and one large HR vendor comes to mind) actually has different ID’s for HR, PR and benefits!!! This might be transparent on the UI, but try a couple of outer joins in an ad hoc and you’re dead.

I can sit around and lecture you for hours regarding the negatives of each software vendor. The key is to be aware of the technology you purchase and not be fooled by the sales spin.