Disaster Recovery

Discussion in 'UPS Discussions' started by Dfigtree, May 9, 2006.

  1. Dfigtree

    Dfigtree New Member

    What would happen if one of our major data centers blew up or had a plane crash into it? Has a Disaster Recovery plan ever been tested end to end? Could we recover? How long would it take to recover? Would UPS survive?
     
  2. sendagain

    sendagain Member

    We had the scanning system go down for a while at our hub. Guess what? The loaders can't load the trucks without the data labels on the boxes. You need your drivers big time when that happens. Just one more reason why the package driver is the vital link to a successful UPS.
     
  3. retiredone

    retiredone New Member

    I worked in Paramus in the late 80s. At that point we had a subscription to an offsite data processing center where we could resume processing if our primary site went down. Since then, I understand that UPS has distributed the facilities and backed up the data in such a way that we can quickly resume operations if a disaster strikes. They actually tested this process by restoring data and seeing how things went at the backup site. As an aside, our primary site in Mahwah has generators and battery capacity to run the system without outside power.

    Obviously any serious disaster would probably impact our operations for a few hours....but I believe we are well covered to restore processing and resume operations quickly.
     
  4. I would be up the dfigtree without a paddle if there were no continuity plans for UPS in the event of a catastrophic event. Good thing our IT folks have a constantly enhanced DR plan that aligns with UPS corporate contingency planning and meets UPS and Federal sustainability and SOX requirements. Nothing is perfect, but I know plenty of companies that are so screwed up every day that creating and testing a disaster plan would be a disaster in itself. UPS is pretty good.

    Go UPS!
    P71
     
  5. Dfigtree

    Dfigtree New Member

    Really, P71. Who do you think you are BSing?
     
  6. Dfigtree

    Dfigtree New Member

    And Iran just wants nuclear energy for peaceful purposes.
     
    Last edited: May 9, 2006
  7. upscorpis

    upscorpis New Member

    Fear not....
     
  8. CFLBrown

    CFLBrown New Member

    Our facility doesn't have its computers on any type of UPS. I've seen a power outage of just a few seconds bring down the whole sort until the computers could reboot. This facility is on PAS. Hahaha. They don't need massive battery backups, just something that would keep the computers running until the power was restored. An outage of more than a few minutes is going to screw things up anyway, but they could avoid the minor blips.
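
    Rough math backs that up. As a quick sketch (the wattage, ride-through time, and efficiency below are made-up assumptions, not figures from any facility), a little Python shows how small a battery it takes to ride out a blip:

    # Back-of-the-envelope UPS sizing (illustrative only; all numbers are assumptions).

    def required_battery_wh(load_watts: float, ride_through_minutes: float,
                            inverter_efficiency: float = 0.9) -> float:
        """Watt-hours of battery needed to carry a load for a given time."""
        return load_watts * (ride_through_minutes / 60.0) / inverter_efficiency

    if __name__ == "__main__":
        sort_computers_watts = 800   # assumed combined draw of the sort computers
        blip_minutes = 5             # enough to cover a short outage, not a real failure
        wh = required_battery_wh(sort_computers_watts, blip_minutes)
        print(f"~{wh:.0f} Wh of battery rides out a {blip_minutes}-minute blip at {sort_computers_watts} W")

    That works out to well under 100 Wh of battery — a small unit, nothing like the generator-backed datacenters discussed above.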
     
  9. scratch

    scratch Least Best Moderator Staff Member


    Besides the original facility in New Jersey, UPS also has another such facility in Alpharetta, GA, a few miles north of Corporate headquarters. It is housed in an unmarked building and it also can run under its own power for several weeks. It has several main frame computers in it. All data is backed up and kept in storage by another company which I won't name. The company has spent large amounts of money to make sure everything can be retrieved.
     
  10. Dfigtree

    Dfigtree New Member

    <<The company has spent large amounts of money to make sure everything can be retrieved.>>

    That's assuming a lot. How do you know that the correct data is being backed up? How do you know that when it is restored there will be enough storage to hold the backed-up data? How do you know that the main frame (sic) computers that you reference have the capacity to run their normal daily workload and the other data center's normal daily workload, too? Has there ever been an end-to-end test of our disaster recovery system? Can anyone answer that? It is required of financial institutions. And on 9/11 it worked well. Have we ever completely tested our system?
    (Quite a rant. Eh, P71.)
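
    For what it's worth, the first two questions aren't even hard to check. A toy sketch of what I mean (the manifest format and paths here are hypothetical, not anything UPS actually runs):

    import hashlib
    import os
    import shutil

    def sha256_of(path: str) -> str:
        """Checksum a file so the backup copy can be compared to the source."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify_backup(source_manifest: dict[str, str], backup_dir: str) -> list[str]:
        """Return files whose backup copy is missing or doesn't match the source checksum."""
        problems = []
        for rel_path, expected_digest in source_manifest.items():
            copy = os.path.join(backup_dir, rel_path)
            if not os.path.exists(copy) or sha256_of(copy) != expected_digest:
                problems.append(rel_path)
        return problems

    def restore_will_fit(backup_bytes: int, restore_target: str) -> bool:
        """Check that the restore site has enough free storage before it is needed."""
        return shutil.disk_usage(restore_target).free > backup_bytes

    The capacity question — whether one site can carry both sites' daily workload — is the one only a real end-to-end test answers.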
     
  11. upscorpis

    upscorpis New Member

    Dfigtree,

    I do know the answer to all of your questions. You can choose to believe me or not, but as I said before, fear not. A BCP (business continuity plan) is in place.
     
  12. retiredone

    retiredone New Member

    I observed a complete test several years ago. Yes. It worked. They tested end to end. Please be aware that they identified applications which were critical to the survival of the business. There are some applications (can't recall an example now) which would be nice to have, but are not included in the emergency restore. If we face a real emergency where the survival of the business is at stake, would we really care about TLAs (example only)?

    They restored the system and ran billing and compared the results to the actual system to be sure it was OK (again, an example). There are tapes of data stored off site. There are personnel on calling lists assigned to go to the backup location at a moment's notice. There are backups for these people for vacation coverage. Even backups to backups. The system was well thought out and tested.
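
    The comparison itself is mechanical — something along these lines (file names and the key field are just examples, not what was actually used):

    import csv

    def load_records(path: str, key_field: str = "invoice_id") -> dict[str, dict]:
        """Index a run's output by its key so the two runs can be lined up."""
        with open(path, newline="") as f:
            return {row[key_field]: row for row in csv.DictReader(f)}

    def compare_runs(production_csv: str, restored_csv: str) -> list[str]:
        """Report records that are missing or differ between the production and restored runs."""
        prod = load_records(production_csv)
        restored = load_records(restored_csv)
        issues = []
        for key, prod_row in prod.items():
            if key not in restored:
                issues.append(f"{key}: missing from restored run")
            elif restored[key] != prod_row:
                issues.append(f"{key}: values differ")
        issues.extend(f"{key}: extra in restored run" for key in restored if key not in prod)
        return issues

    If the two runs line up, the restore held; mismatches tell you exactly what to chase.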
     
  13. pasfailure

    pasfailure Member

    Hope it works as well as when they reboot our local system and the next day you're missing pickups in your board, or have some back in it that closed months ago. And we're in the middle of PAS, and it's a bigger disaster than I thought possible.
     
  14. Disaster Recovery is a lot different from system reliability. PAS basically requires the network and computers to be up and running all the time, a stretch even in this day and age, but it will get better.

    D/R is something that recovers critical core systems in the event of a major catastrophe of some type and includes Management Issues, Public Relations, regular testing, and lots of training. But even then, the levees can break....

    Go UPS!
    P71
     
  15. Dfigtree

    Dfigtree New Member

    The only time I will believe that there has been a successful end-to-end disaster recovery test (actually two tests) will be when the news is available to shareholders in the annual report. You haven't seen it and I fear you never will. Send an email to Mike or Dave and prove me wrong. I am not wrong. I am only in the minority.

    The silence of the lambs.

    Retiredone, you saw Billing recovered (maybe). But the test was most likely scheduled well in advance. People had time to prepare. Almost certainly Billing was not recovered on a "live" system. Lost files were probably recreated. Forgotten files were probably copied from the site that was supposed to be destroyed. Where did all the extra capacity to hold Billing come from two years ago?
     
  16. retiredone

    retiredone New Member

    My observation was about 20 years ago. Things change and you may be right. UPS does have a fairly high level manager and department assigned to disaster preparedness. I believe that the test was very realistic.

    Having said that, I use Quicken on my home PC for all my financial data. I have many years of data and back up fairly frequently. I could recover if my hard drive failed. That said, even a successful recovery would lose a few days' data and be difficult for me to reconstruct. Data is not backed up in real time, so some data is always lost. Obviously, it would be millions of times worse if UPS had to go through a disaster. I believe that UPS was prepared as well as could be expected. If your point is that even the best plans would have an impact on the business: I agree.

    Since your location is Fairlawn, perhaps you actually work at the data site and have more insight than I do on current conditions.
     
    Last edited: May 14, 2006
  17. Dfigtree

    Dfigtree New Member

    <<Since your location is Fairlawn, perhaps you actually work at the data site and have more insight than I do on current conditions.>>

    I have the insight, and IMHO we would be in deep trouble if we completely lost a data center. I want to see the CFO (not the CIO) call the department manager you referred to and say "Recover Windward" or "Recover Ramapo. NOW!" and see what happens. ALL SHAREHOLDERS are at risk and they don't know it, and those who know are not being forthcoming. There is a major problem here and no one will admit it. It's a political thing. I could name names but what would that do? THE CFO SHOULD DECLARE AN UNANNOUNCED DISASTER AT ONE OF OUR MAJOR DATACENTERS. It's that easy to see if we can recover.
     
  18. upscorpis

    upscorpis New Member

    Dfigtree,

    Tell me, what systems do you think are at risk and why? Are there any systems that aren't at risk? Also, why do you feel the CFO should be the one calling for the test? Please regale us with your insight.
     
  19. air_upser

    air_upser New Member

    Tree, I'm not sure where you are getting this information, but we have had several planned and unplanned instances over the years where all communications are switched from datacenter to datacenter. The biggest unplanned instance I can remember was the hurricane in the late 90s. Mahwah was down for a week because the telecoms were under water. Any application that is critical to the operation resides in both datacenters....and the data is replicated between the datacenters and offsite storage facilities....not sure of the frequency.

    Same thing goes for the airline. If we lose "operational control," which could mean loss of phones and/or data connectivity, the FAA can ground all of our planes around the world...immediately. Around here, the disaster recovery plan is called ABC. We can run the airline out of another location, and it's tested yearly in both planned and unplanned exercises.

    Facilities are different. All costs are absorbed locally, so some centers pay for dual routers, diverse paths, UPSs, etc., while others don't want to pay for any of that.
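
    In spirit, the replication piece is no more exotic than keeping two mirrors current. A toy version (hosts, paths, and the hourly interval are placeholders — as I said, I don't know the real frequency):

    import subprocess
    import time

    SOURCE = "/data/critical/"
    TARGETS = [
        "backup-dc:/data/critical/",      # the other datacenter (hypothetical host)
        "offsite-vault:/data/critical/",  # offsite storage (hypothetical host)
    ]

    def replicate_once() -> None:
        """Push the current data set to every target; -a preserves metadata, --delete keeps mirrors exact."""
        for target in TARGETS:
            subprocess.run(["rsync", "-a", "--delete", SOURCE, target], check=True)

    if __name__ == "__main__":
        while True:
            replicate_once()
            time.sleep(3600)  # assumed hourly cycle; a shorter interval means a smaller loss window

    The shorter the cycle, the less data you stand to lose when you cut over.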
     
  20. fireman000

    fireman000 New Member

    dfigtree - I'm assuming that with your vast knowledge you've shown that you can pinpoint the cracks in the D/R plan - or are you just blowing smoke out of your butt?

    As someone who works closely with D/R, I know for a fact that within 6 hours "mission critical" systems would be up and running, spare LPARs on the other site's mainframes would be up and running, and decisions would be made about which non-critical apps should be brought up. In the event that either WW or RR can't be brought back up within 24 hours, procurement would bring in new T-REX boxes and within 2 weeks all non-critical systems would be brought up.

    Generators at both facilities have fuel to run for 4 to 5 days, and cooling towers will keep the data centers at the appropriate temps and the mainframes cooled.

    The most complicated process will be switching the airline from its primary system to the backup, and that is practiced every year.

    Data is backed up between datacenters as well as at an offsite location.

    Computer Ops is split between sites, MidRange is split between sites, NOC is split between sites, as are all other vital components needed to run the business.

    Not sure if you're looking to play devil's advocate or just be argumentative, but at least have a clue as to what you're talking about.
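
    To put the tiers in concrete terms — the application names below are placeholders, and the hour figures are the ones above — the plan is basically a prioritized list with deadlines, and a drill is just checking actuals against it:

    from dataclasses import dataclass

    @dataclass
    class RecoveryItem:
        name: str
        tier: int          # 1 = mission critical, 2 = non-critical
        rto_hours: float   # recovery time objective for this item

    PLAN = [
        RecoveryItem("package-tracking", tier=1, rto_hours=6),
        RecoveryItem("billing", tier=1, rto_hours=6),
        RecoveryItem("internal-reporting", tier=2, rto_hours=24 * 14),
    ]

    def recovery_order(plan: list[RecoveryItem]) -> list[RecoveryItem]:
        """Bring up the tightest deadlines first, tier 1 ahead of tier 2."""
        return sorted(plan, key=lambda item: (item.tier, item.rto_hours))

    def check_drill(plan: list[RecoveryItem], actual_hours: dict[str, float]) -> list[str]:
        """After a test, flag anything that came up later than its RTO (or not at all)."""
        misses = []
        for item in recovery_order(plan):
            took = actual_hours.get(item.name, float("inf"))
            if took > item.rto_hours:
                misses.append(f"{item.name}: {took:.1f}h vs {item.rto_hours:.0f}h target")
        return misses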