Risk management is a term that people usually associate with various forms of insurance. I pay this company X amount of dollars each month and they insure (with an “I”) that in the event of “calamity A” they then pay me “amount B” so that I might afford “care C.” That is basically what insurance is. There are forms of insurance that apply to businesses to be sure, but in this article I am going to focus on what you should do with your I.T. systems to ensure (with an “E”) business continuity and data integrity.
The first need is to focus on data.
Most of the time, data processing systems can be replaced more easily than the data itself. So the data is paramount. At aCOUPLEofGURUS, we ensure data integrity through backups. Multiple backups. We have a copy onsite available for quick restore of files or even entire servers, and we have a copy offsite in a different geographical region for use in the event of disaster.
We also guard against data corruption. Sometimes files are still available and accessible, but something has happened to the data in them. Maybe Excel crashed and trashed a file, or maybe a user errantly entered data that was invalid and threw off the whole spreadsheet. In either case, a solution is needed where we can revert to a previous (but not ancient) version without too much trouble. Our Guru Protect and Recover system utilizes hourly incremental backups in our standard configuration, and we can recover from those with little effort.
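To make the idea of reverting to a previous (but not ancient) version concrete, here is a minimal sketch of picking a restore point. It is not how Guru Protect and Recover works internally; the function name `latest_restore_point`, the dates, and the hourly schedule below are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta


def latest_restore_point(restore_points, corrupted_at):
    """Return the newest restore point taken strictly before the corruption time.

    Returns None when no restore point predates the corruption.
    """
    candidates = [point for point in restore_points if point < corrupted_at]
    return max(candidates) if candidates else None


# Hypothetical data: hourly restore points from 08:00 through 17:00.
start = datetime(2024, 1, 15, 8, 0)
restore_points = [start + timedelta(hours=h) for h in range(10)]

# Hypothetical incident: a spreadsheet was damaged at 14:20.
corrupted_at = datetime(2024, 1, 15, 14, 20)

chosen = latest_restore_point(restore_points, corrupted_at)
work_lost = corrupted_at - chosen  # edits made after the chosen restore point

print("Restore from:", chosen)
print("Work to redo:", work_lost)
```

With hourly restore points, the work lost to a corrupted file is bounded by roughly an hour, which is why the backup interval matters as much as the backup itself.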
Secondly, we need to ensure our data processing systems are adequately cared for.
This involves everything from the servers themselves to the workstations, everything in between, as well as the business applications that run on top of the hardware. There can be a vast array of systems that need to be cared for. However, we can ensure we care for all of those systems with relatively simple principles.
An old server is more likely to fail than a new server. Hard drives have moving parts that wear out. Capacitors age and fail. Fans get clogged with dust and fail, which leads to overheating. The key to managing hardware is to weigh the risk of imminent failure against the cost of replacing it.
While a piece of hardware is used in production, we want it to have a warranty or support agreement. This gives us access to prompt technical support from the makers of the hardware, the people who best know what to look for. It also ensures that if hardware fails, we can get it replaced at no additional cost to the client.
By being attentive to the life cycle of the hardware, we can ensure that the processing systems remain online more of the time. Even newer hardware can fail at times. We have a warranty, but replacement is not instant. For systems that require frequent, critical access to data and cannot afford to have downtime, we can build in redundancies from the data storage to the networking setup for workstations. We place our hard drives in RAID arrays that mirror data or at least have parity. We virtualize our servers and place them in clusters to become tolerant of server failure. We can even cluster our file or database servers to ensure uninterrupted access to data. We can also implement redundant switching networks to ensure we can re-route data should one switch go down.
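For readers curious about what "parity" means in a RAID array, the sketch below shows the underlying idea in a heavily simplified form: a parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the survivors. The helper `xor_blocks` and the sample data are assumptions for illustration only; real RAID controllers also handle striping, rebuild scheduling, and much more.

```python
def xor_blocks(*blocks):
    """XOR equal-length byte blocks together and return the result."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Hypothetical data blocks stored on two separate drives.
disk_a = b"INVOICES"
disk_b = b"PAYROLL!"

# A third drive stores the parity of the two data blocks.
parity = xor_blocks(disk_a, disk_b)

# Simulate losing disk_b: rebuild its contents from disk_a and the parity.
rebuilt_b = xor_blocks(disk_a, parity)

print(rebuilt_b == disk_b)
```

Mirroring keeps a full second copy instead, which costs more disk space but makes recovery simpler.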
Taking these steps can go a long way toward improving uptime, but they generally come at a cost. It is therefore critical to weigh the cost of 24 to 36 hours of downtime in the event of server failure against the cost of purchasing these redundancies outright. Much of the time it is acceptable to simply follow the principles of keeping the systems young and ensuring there is a warranty on all critical assets. Redundancy, however, represents a higher level of assurance we can implement should the situation call for it.
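One rough way to frame that trade-off is to compare the expected yearly cost of downtime with the yearly cost of redundancy. The sketch below is only a back-of-the-envelope model: the function names and every figure in it (failure probability, hourly cost, redundancy cost) are hypothetical, and it assumes for simplicity that redundancy removes the downtime entirely.

```python
def expected_annual_downtime_cost(failure_probability, downtime_hours, cost_per_hour):
    """Expected yearly loss: chance of a failure times what the outage would cost."""
    return failure_probability * downtime_hours * cost_per_hour


def redundancy_pays_off(failure_probability, downtime_hours, cost_per_hour,
                        annual_redundancy_cost):
    """True when the expected downtime loss exceeds the yearly cost of redundancy."""
    expected_loss = expected_annual_downtime_cost(
        failure_probability, downtime_hours, cost_per_hour
    )
    return expected_loss > annual_redundancy_cost


# Hypothetical small office: 5% yearly chance of server failure,
# 36 hours of downtime, $1,000 lost per hour, $5,000 per year for redundancy.
print(redundancy_pays_off(0.05, 36, 1_000, 5_000))

# Hypothetical business where each hour offline costs $5,000 instead.
print(redundancy_pays_off(0.05, 36, 5_000, 5_000))
```

The same hardware and the same odds of failure can lead to opposite answers; what changes the decision is how much an hour of downtime costs the business.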
At aCOUPLEofGURUS we regularly maintain a listing of hardware and software assets and compare them against our known baselines for system lifecycles. We use our IT Resource and Compliance documentation to assist our clients in planning their I.T. budget in the near and not-so-near future. Although we use special calculations to assess the level of vulnerability of assets, it really boils down to the principles discussed above. How old is it? Am I covered by the vendor if it fails? Is there a redundant system available? We have seen a vast improvement in the quality of our clients’ networks by holding to these principles, and it is an important part of our service offering.
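Those three questions are simple enough to sketch as a checklist in code. This is not the calculation we actually use; the `Asset` fields, the `review` function, and the five-year lifecycle baseline are assumptions chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Asset:
    name: str
    purchased: date
    warranty_expires: date
    has_redundancy: bool


def review(asset, today, max_age_years=5):
    """Return a list of concerns for one asset (empty list means no concerns).

    max_age_years is a hypothetical lifecycle baseline, not a vendor figure.
    """
    concerns = []
    age_years = (today - asset.purchased).days / 365.25
    if age_years > max_age_years:
        concerns.append("past lifecycle baseline")    # How old is it?
    if asset.warranty_expires < today:
        concerns.append("out of warranty")            # Am I covered by the vendor?
    if not asset.has_redundancy:
        concerns.append("no redundant system")        # Is there a redundant system?
    return concerns


# Hypothetical inventory entries.
today = date(2024, 1, 15)
old_server = Asset("file-server", date(2017, 3, 1), date(2020, 3, 1), False)
new_server = Asset("db-cluster-node", date(2023, 1, 1), date(2026, 1, 1), True)

for asset in (old_server, new_server):
    print(asset.name, review(asset, today))
```

An asset that raises all three concerns is the first candidate for the next budget cycle; one that raises none can usually be left alone.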