Disaster Recovery

Dfigtree

Well-Known Member
What would happen if one of our major data centers blew up or had a plane crash into it? Has a Disaster Recovery plan ever been tested end to end? Could we recover? How long would it take to recover? Would UPS survive?
 

sendagain

Well-Known Member
We had the scanning system go down for a while at our hub. Guess what? The loaders can't load the trucks without the data labels on the box. You need your drivers big time when that happens. Just one more reason why the package driver is the vital link to a successful UPS.
 

retiredone

Well-Known Member
I worked in Paramus in the late 80s. At that point we had a subscription to an offsite data processing center where we could resume processing if our primary site went down. Since then, I understand that UPS has distributed the facilities and backed up the data in such a way that we can quickly resume operations if a disaster strikes. They actually tested this process by restoring data and seeing how things went at the backup site. As an aside, our primary site in Mahwah has generators and battery capacity to run the system without outside power.

Obviously any serious disaster would probably impact our operations for a few hours... but I believe we are well covered to restore processing and resume operations quickly.
 

an anonymous guest

Guest
I would be up the dfigtree without a paddle if there were no continuity plans for UPS in the event of a catastrophic event. Good thing our IT folks have a constantly enhanced DR plan that aligns with UPS corporate contingency planning and meets UPS and Federal sustainability and SOX requirements. Nothing is perfect, but I know plenty of companies that are so screwed up every day that creating and testing a disaster plan would be a disaster in itself. UPS is pretty good.

Go UPS!
P71
 

CFLBrown

Well-Known Member
Our facility doesn't have its computers on any type of UPS. I've seen a power outage of just a few seconds bring down the whole sort until the computers could reboot. This facility is on PAS. Hahaha. They don't need massive battery backups, just something that would keep the computers running until the power was restored. An outage of more than a few minutes is going to screw things up anyway, but they could avoid the minor blips.
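Even a little watchdog would cover it. Rough sketch below, just to show the idea -- it assumes a NUT (Network UPS Tools) managed unit that I'm calling "sort-ups", and every name and number in it is made up, not anything our facility actually runs:

```python
# Sketch only: ride out short power blips on battery, shut down cleanly on a real outage.
# Assumes a NUT daemon is running and the unit is registered as "sort-ups" (hypothetical).
import subprocess
import time

UPS_NAME = "sort-ups@localhost"   # hypothetical NUT identifier
GRACE_SECONDS = 120               # anything shorter than this is treated as a blip

def on_battery() -> bool:
    """True if the UPS reports it is running on battery ("OB" in ups.status)."""
    out = subprocess.run(
        ["upsc", UPS_NAME, "ups.status"],
        capture_output=True, text=True, check=True,
    ).stdout
    return "OB" in out.split()

def main() -> None:
    battery_since = None
    while True:
        if on_battery():
            battery_since = battery_since or time.time()
            if time.time() - battery_since > GRACE_SECONDS:
                # Longer than a blip: shut the machine down cleanly instead of crashing.
                subprocess.run(["shutdown", "-h", "+1"], check=False)
                return
        else:
            battery_since = None   # power is back, reset the timer
        time.sleep(5)

if __name__ == "__main__":
    main()
```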
 

scratch

Least Best Moderator
Staff member
retiredone said:
I worked in Paramus in the late 80s. At that point we had a subscription to an offsite data processing center where we could resume processing if our primary site went down. Since then, I understand that UPS has distributed the facilities and backed up the data in such a way that we can quickly resume operations if a disaster strikes. They actually tested this process by restoring data and seeing how things went at the backup site. As an aside, our primary site in Mahwah has generators and battery capacity to run the system without outside power.

Obviously any serious disaster would probably impact our operations for a few hours... but I believe we are well covered to restore processing and resume operations quickly.


Besides the original facility in New Jersey, UPS also has another such facility in Alpharetta, Ga., a few miles north of corporate headquarters. It is in an unmarked building and can run under its own power for several weeks. It houses several mainframe computers. All data is backed up and kept in storage by another company which I won't name. The company has spent large amounts of money to make sure everything can be retrieved.
 

Dfigtree

Well-Known Member
<<The company has spent large amounts of money to make sure everything can be retrieved.>>

That's assuming a lot. How do you know that the correct data is being backed up? How do you know that when it is restored there will be enough storage to hold the backed-up data? How do you know that the mainframe computers you reference have the capacity to run their normal daily workload and the other data center's normal daily workload, too? Has there ever been an end-to-end test of our disaster recovery system? Can anyone answer that? It is required of financial institutions, and on 9/11 it worked well. Have we ever completely tested our system?
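Even a trivial check along these lines would answer the first two questions -- this is only a sketch, and the manifest layout, paths, and names are made up, not anything UPS actually runs:

```python
# Sketch only: confirm the backup copy matches what left the primary site and that
# the restore target has room for it. Manifest format and paths are hypothetical.
import hashlib
import json
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest_file: Path, backup_root: Path, restore_root: Path) -> bool:
    # manifest: {"relative/path": {"sha256": "...", "bytes": 12345}, ...}
    manifest = json.loads(manifest_file.read_text())
    total_bytes = 0
    ok = True
    for rel, meta in manifest.items():
        copy = backup_root / rel
        total_bytes += meta["bytes"]
        if not copy.exists() or sha256(copy) != meta["sha256"]:
            print(f"MISSING OR CORRUPT: {rel}")
            ok = False
    free = shutil.disk_usage(restore_root).free
    if free < total_bytes:
        print(f"Restore target too small: need {total_bytes} bytes, have {free}")
        ok = False
    return ok
```

And even that says nothing about mainframe capacity or an end-to-end test, which is exactly the point.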
(Quite a rant. Eh, P71.)
 

upscorpis

Well-Known Member
Dfigtree,

I do know the answer to all of your questions. You can choose to believe me or not but as I said before, fear not. A BCP is in place.
 

retiredone

Well-Known Member
Dfigtree said:
<<The company has spent large amounts of money to make sure everything can be retrieved.>>

That's assuming a lot. Have we ever completely tested our system?
(Quite a rant. Eh, P71.)

I observed a complete test several years ago. Yes, it worked. They tested end to end. Please be aware that they identified applications which were critical to the survival of the business. There are some applications (can't recall an example now) which would be nice to have, but are not included in the emergency restore. If we face a real emergency where the survival of the business is at stake, would we really care about TLAs (example only)?

They restored the system, ran billing, and compared the results to the actual system to be sure it was OK (again, an example). There are tapes of data stored off site. There are personnel on calling lists assigned to go to the backup location on a moment's notice. There are backups for these people for vacation coverage. Even backups to backups. The system was well thought out and tested.
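Conceptually the billing comparison was no fancier than something like this -- a sketch from memory only, with made-up file names and layout, not the real tooling:

```python
# Sketch only: compare a billing extract produced at the recovery site against the one
# produced in production. The two-column layout (account_id, amount_cents) is hypothetical.
import csv
from collections import Counter

def load_totals(path: str) -> Counter:
    """Sum billed cents per account from a CSV of (account_id, amount_cents) rows."""
    totals = Counter()
    with open(path, newline="") as f:
        for account_id, amount_cents in csv.reader(f):
            totals[account_id] += int(amount_cents)
    return totals

def matches(prod_file: str, recovered_file: str) -> bool:
    prod, recovered = load_totals(prod_file), load_totals(recovered_file)
    diffs = {k for k in prod.keys() | recovered.keys() if prod[k] != recovered[k]}
    for k in sorted(diffs):
        print(f"{k}: production={prod[k]} recovered={recovered[k]}")
    return not diffs

if __name__ == "__main__":
    print("MATCH" if matches("billing_prod.csv", "billing_recovered.csv") else "MISMATCH")
```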
 

pasfailure

Active Member
Hope it works as well as when they reboot our local system and the next day you are missing pickups in your board, or have some come back that closed months ago. And we're in the middle of PAS, and it's a bigger disaster than I thought possible.
 

an anonymous guest

Guest
Disaster Recovery is a lot different from system reliability. PAS basically requires the network and computers to be up and running all the time, a stretch even in this day and age, but it will get better.

D/R is something that recovers critical core systems in the event of a major catastrophe of some type, and it includes Management Issues, Public Relations, regular testing, and lots of training. But even then, the levees can break...

Go UPS!
P71
 

Dfigtree

Well-Known Member
The only time I will believe that there has been a successful end-to-end disaster recovery test (actually two tests) will be when the news is available to shareholders in the annual report. You haven't seen it, and I fear you never will. Send an email to Mike or Dave and prove me wrong. I am not wrong. I am only in the minority.

The silence of the lambs.

Retiredone. You saw Billing recovered (maybe). But the test was most likely scheduled well in advance. People had time to prepare. Almost certainly Billing was not recovered on a "live" system. Lost files were probably recreated. Forgotten files were probably copied from the site that was supposed to be destroyed. Where did all the extra capacity to hold Billing come from two years ago?
 

retiredone

Well-Known Member
Dfigtree said:
Retiredone. You saw Billing recovered (maybe). But the test was most likely scheduled well in advance. People had time to prepare.

My observation was about 20 years ago. Things change and you may be right. UPS does have a fairly high level manager and department assigned to disaster preparedness. I believe that the test was very realistic.

Having said that, I use Quicken on my home PC for all my financial data. I have many years of data and back up fairly frequently. I could recover if my hard drive failed. Even so, a successful recovery would lose a few days' data and be difficult for me to reconstruct. Data is not backed up in real time, so some data is always lost. Obviously, it would be millions of times worse if UPS had to go through a disaster. I believe that UPS was prepared as well as could be expected. If your point is that even the best plans would have an impact on the business, I agree.
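The arithmetic behind that is as plain as this toy sketch (the dates are invented): whatever was written after the last completed backup is what you lose.

```python
# Sketch only: with periodic backups, the worst-case data loss window is simply the
# time since the last completed backup. Dates below are invented for illustration.
from datetime import datetime

last_completed_backup = datetime(2006, 3, 1, 2, 0)   # nightly backup finished at 2 a.m.
failure_time = datetime(2006, 3, 1, 17, 30)          # disk dies that afternoon

print("Data at risk:", failure_time - last_completed_backup)   # 15:30:00 of work lost
```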

Since your location is Fairlawn, perhaps you actually work at the data site and have more insight than I do into current conditions.
 

Dfigtree

Well-Known Member
<<Since your location is Fairlawn, perhaps you actually work at the data site and have more insight than I do into current conditions.>>

I have the insight, and IMHO we would be in deep trouble if we completely lost a data center. I want to see the CFO (not the CIO) call the department manager you referred to and say "recover Windward" or "recover Ramapo" NOW! and see what happens. ALL SHAREHOLDERS are at risk and they don't know it, and those who know are not being forthcoming. There is a major problem here and no one will admit it. It's a political thing. I could name names, but what would that do? THE CFO SHOULD DECLARE AN UNANNOUNCED DISASTER AT ONE OF OUR MAJOR DATACENTERS. It's that easy to see if we can recover.
 

upscorpis

Well-Known Member
Dfigtree,

Tell me, what systems do you think are at risk and why? Are there any systems that aren't at risk? Also, why do you feel the CFO should be the one calling for the test? Please regale us with your insight.
 

air_upser

Well-Known Member
Tree, I'm not sure where you are getting this information, but we have had several planned and unplanned instances over the years where all communications were switched from datacenter to datacenter. The biggest unplanned instance I can remember was the hurricane in the late 90s. Mahwah was down for a week because the telecoms were under water. Any application that is critical to the operation resides in both datacenters... and the data is replicated between the datacenters and offsite storage facilities... not sure of the frequency.

Same thing goes for the airline. If we lose "operational control" which could mean loss of phones and/or data connectivity, the FAA can ground all of our planes around the world...immediately. Around here, the disaster recovery plan is called ABC. We can run the airline out of another location, and it's tested yearly in both planned and unplanned exercises.

Facilities are different. All costs are absorbed locally, so some centers pay for dual routers, diverse paths, UPSs, etc., while others don't want to pay for any of that.
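The switching idea itself is simple enough -- something along these lines, though this is only a sketch with made-up hostnames; the real cutover happens in the routing and DNS layer, not in a little script:

```python
# Sketch only: prefer the primary datacenter and point traffic at the standby when the
# primary stops answering. Hostnames are invented; real failover lives in routing/DNS.
import socket

PRIMARY = ("apps.mahwah.example.com", 443)     # hypothetical primary site endpoint
STANDBY = ("apps.windward.example.com", 443)   # hypothetical standby site endpoint

def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def active_site() -> str:
    """Return the hostname traffic should be sent to right now."""
    host, _port = PRIMARY if reachable(*PRIMARY) else STANDBY
    return host

if __name__ == "__main__":
    print("Routing traffic to:", active_site())
```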
 
dfigtree - I'm assuming that with your vast knowledge you can pinpoint the cracks in the D/R plan - or are you just blowing smoke out of your butt?

As someone who works closely with D/R, I know for a fact that within 6 hours the "mission critical" systems would be up and running on spare LPARs on the other site's m/fs, and decisions would be made about which non-critical apps should be brought up. In the event that either WW or RR can't be brought back up within 24 hours, procurement would bring in new T-Rex boxes, and within 2 weeks all non-critical systems would be brought up.
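Laid out as a timeline, that's basically the following -- a sketch only, restating the numbers above, with illustrative tier names:

```python
# Sketch only: the recovery timeline above expressed as a simple escalation table.
# The hour targets restate the post; the tier names and wording are illustrative.
from dataclasses import dataclass

@dataclass
class RecoveryTier:
    name: str
    rto_hours: float   # target hours until this step is complete
    where: str

PLAN = [
    RecoveryTier("mission-critical systems", 6, "spare LPARs on the surviving site's mainframes"),
    RecoveryTier("rebuild decision for the lost site", 24, "procurement orders replacement T-Rex boxes if needed"),
    RecoveryTier("non-critical systems", 14 * 24, "brought up on the replacement hardware"),
]

def status_at(hours_since_disaster: float) -> None:
    """Print which steps should be finished a given number of hours into the disaster."""
    for tier in PLAN:
        state = "should be done" if hours_since_disaster >= tier.rto_hours else "still in progress"
        print(f"{tier.name:40s} target {tier.rto_hours:6.0f}h  ({tier.where}) -- {state}")

if __name__ == "__main__":
    status_at(12)
```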

Generators at both facilities have fuel to run for 4 to 5 days, and the cooling towers will keep the data centers at the appropriate temps and the m/fs cooled.

The most complicated process would be switching the airline from its primary system to the backup, and that is practiced every year.

Data is backed up between datacenters as well as at an offsite location.

Computer Ops is split between sites, MidRange is split between sites, NOC is split between sites, as are all other vital components needed to run the business.

Not sure if you're looking to play devil's advocate or just be argumentative, but at least have a clue as to what you're talking about.
 