
 

 

February 27 - The Pain of Patching - The Joys of Clustering – Russ Kaufmann

 

Patching. Patching. More patching. I am sick of patching production servers once a month or more. You have to contact the end users and let them know. You have to contact the application owners and let them know. You have to contact multiple layers of management and let them know. For each group, you have to explain why you need to patch these servers (sometimes it is easy to explain, other times it is not so easy). You then have to negotiate an outage, and then you have to pray that everything goes well. Every once in a long while, it just doesn't go very well.

 

Sunday morning right after midnight found me at my keyboard (yes, I used my monitor and mouse, too) patching away. It went fast and there were no problems. In fact, there was almost zero application downtime even though reboots were required for certain patches. No downtime (yeah, it was about 1 minute per application if that much) and reboots were required? How is that, you say? Well, if you read the title, it would make sense.

 

Each round of patching was simple. All passive nodes of server clusters were patched and rebooted. The applications were happily running without any interference. Half of the nodes for the NLB clusters were also patched and restarted as needed. The applications on the NLB clusters also continued to run happily and were unaware of the patching. The second round of patching was a little more complex. I moved all of the server clusters to their passive nodes and verified they were running properly, then I patched and restarted them as necessary. Yep, I had to work real hard and use the cluster administrator to move the cluster groups before I did the patching. The NLB clusters were easy. Once the first half came back online, then I patched and restarted the second half. *Whew* All patching completed in less than 15 minutes. Zero end user complaints.
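The two rounds described above can be sketched as a small plan generator. This is only an illustration of the ordering, not a patching tool: the function name, the input shapes, and the node names are all hypothetical, and it assumes every server cluster has at least one passive node.

```python
def rolling_patch_plan(failover_clusters, nlb_clusters):
    """Return (round_one, round_two) as ordered lists of action strings.

    failover_clusters: dict of cluster name -> {"active": node, "passive": [nodes]}
    nlb_clusters: dict of cluster name -> list of member node names
    Both shapes are assumptions made for this sketch.
    """
    round_one, round_two = [], []
    for name, nodes in failover_clusters.items():
        # Round one: passive nodes carry no workload, so they go first.
        for node in nodes["passive"]:
            round_one.append(f"patch and reboot {node} (passive, {name})")
        # Round two: move the groups, verify them, then patch the old active node.
        target = nodes["passive"][0]
        round_two.append(f"move groups of {name} from {nodes['active']} to {target}")
        round_two.append(f"verify {name} is healthy on {target}")
        round_two.append(f"patch and reboot {nodes['active']} (now passive, {name})")
    for name, members in nlb_clusters.items():
        half = len(members) // 2
        # The first half is patched while the second half keeps serving traffic.
        for node in members[:half]:
            round_one.append(f"patch and reboot {node} (NLB, {name})")
        # The second half waits until the first half is back online.
        for node in members[half:]:
            round_two.append(f"patch and reboot {node} (NLB, {name})")
    return round_one, round_two
```

Calling it with one two-node server cluster and a four-node NLB cluster gives three actions in round one (the passive node plus two NLB members) and five in round two (move, verify, patch the former active node, then the remaining two NLB members).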

 

Have I ever mentioned how much I love server and NLB clustering? If you haven't heard me before, let me say it now. I love MSCS and NLB clusters.

 

 

February 21 - NLB Unicast vs. Multicast – Russ Kaufmann

 

As usual, confusion motivates me to blog some more. In this case, I have blogged this because I was confused, and I am pretty sure that I have it straight now. Comments may prove me wrong.

 

When designing, planning, testing, and implementing Network Load Balancing (NLB) Clustering, a choice has to be made regarding unicast vs. multicast. There are a few differences, but the main difference is in the way MAC addresses are implemented.

 

Unicast - Each NLB cluster node replaces its real (hard coded) MAC address with a new one (generated by the NLB software) and each node in the NLB cluster uses the same (virtual) MAC. Because of this virtual MAC being used by multiple computers, a switch is not able to learn the port for the virtual NLB cluster MAC and is forced to send the packets destined for the NLB MAC to all ports of a switch to make sure packets get to the right destination.

 

So, basically, the way NLB traffic is handled is kind of like this:

 

1. An inbound packet for IP address w.x.y.z (NLB Virtual IP) arrives

2. The ARP request is generated and is sent across all ports of the switch since there is no mapping at this point

3. All of the NLB cluster nodes respond with the same MAC

4. The switch sends the traffic to all ports because it is not able to tell which is the proper port and this leads to switch flooding

 

If an NLB cluster node is using unicast, NLB isn't able to tell each node apart as they all have the same MAC. Since each NLB cluster node has the same MAC, communication between NLB cluster nodes is not possible unless each NLB cluster node has an additional NIC with a unique MAC.

 

Multicast - NLB adds a layer 2 MAC address to the NIC of each node. Each NLB cluster node basically has two MAC addresses, its real one and its NLB generated address. With multicast, you can create static entries in the switch so that it sends the packets only to members of the NLB cluster. Mapping the address to the ports being used by the NLB cluster stops all ports from being flooded. Only the mapped ports will receive the packets for the NLB cluster instead of all ports in the switch. If you don't create the static entries, it will cause switch flooding just like in unicast.

 

Flooding Solutions:

1. Hook all NLB devices to a hub and then connect it to a port on the switch. Since all NLB nodes with the same MAC come through the same port, there is no switch port flooding.

2. Configure a VLAN for all NLB cluster nodes to contain all NLB cluster traffic to just the VLAN and not run it over the entire switch.

3. Use multicast and configure static mapping for the NLB cluster nodes in the switch so it only floods the mapped ports instead of the entire switch.
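
To make the difference between flooding and static mapping concrete, here is a toy model of a layer 2 switch. It is a deliberately simplified sketch, not how NLB or any real switch is implemented: the `Switch` class, the port numbers, and the MAC strings are all made up for illustration. The only behavior modeled is "forward to the known ports for a MAC, otherwise flood".

```python
class Switch:
    """Toy layer 2 switch: learned or static MAC table, flood on unknown."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # MAC address -> set of ports

    def learn(self, src_mac, port):
        # A learning switch associates a source MAC with the port it arrived on.
        self.table[src_mac] = {port}

    def add_static(self, mac, ports):
        # Static entry: the administrator pins a MAC to specific ports.
        self.table[mac] = set(ports)

    def forward(self, dst_mac, in_port):
        # Unknown destination: flood every port except the one it came in on.
        out = self.table.get(dst_mac, self.ports)
        return sorted(out - {in_port})


switch = Switch(range(1, 9))
CLUSTER_MAC = "cluster-mac"  # placeholder, not a real NLB address

# No entry for the cluster MAC, so the frame goes out every other port.
flooded = switch.forward(CLUSTER_MAC, in_port=1)

# With a static mapping, only the NLB member ports receive the frame.
switch.add_static(CLUSTER_MAC, [2, 3])
mapped = switch.forward(CLUSTER_MAC, in_port=1)
```

In this model `flooded` covers ports 2 through 8, while `mapped` covers only ports 2 and 3, which is the effect the static entries in solution 3 are meant to achieve.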

 

 

February 11 - My High Availability Definition – Russ Kaufmann

 

No, really, I am not trying to piss off all of my Microsoft friends. I just don't like the Microsoft definition of HA.

 

I gave a lot of thought before I decided to blog on this one because I know I am going against the established grain with my method of explaining HA. However, I am used to being spanked in public, so I can take some more spankings if needed. Actually, I think I like being spanked.

 

High Availability is the combination of well defined, planned, tested, and implemented processes, software, and fault tolerant hardware focused on supplying and maintaining application availability.

 

For Example: As a high level example, consider messaging in an organization.

 

BAD - A poor implementation of Exchange is usually slapped together by purchasing a server that the administrator feels is about the right size and installing Exchange Server 2003 on it. Messaging clients are installed on network connected desktops and profiles are created. The Exchange server might even be successfully configured to connect to the Internet. I have seen Exchange environments installed in organizations over a short business week and even overnight in some cases. It is easy to do it fast and get it done, but lots of important details are missed.

 

GOOD - In an HA environment, the deployment is well designed. Administrators research organizational messaging requirements. Users are brought into discussions along with admins and managers. Messaging is considered as a possible solution to many company ills. Research may go on for an extended period as consultants are brought in to help build a design and review the design of others. Vendors are brought in to discuss how their products (antivirus and content management, for example) are going to help keep the messaging environment available and not waste messaging resources processing spam and spreading viruses (or is that virii?). Potential 3rd party software is tested and approved after a large investment of administrator and end user time. Hardware is sized and evaluated based on performance requirements and expected loads. Hardware is also sized and tested for disaster recovery and to meet service level agreements for both performance and time to recovery in the case of a disaster. Hardware selected will often contain fault tolerant components such as redundant memory, drives, network connections, cooling fans, power supplies, and so on. An HA environment will incorporate lots of design, planning, and testing. An HA environment will often, but not always, include additional features such as server clustering, which decreases downtime by allowing for rolling upgrades and allowing a preplanned response to failures. A top-notch HA messaging environment will also consider the messaging client and its potential configurations that lead to increased availability for users. For example, Outlook 2003 offers a cache mode configuration allowing users to create new messages, respond to existing mail in their in-box, and manage their calendars (amongst many other tasks) without having to maintain a constant connection to the Exchange server. Cache mode allows users to continue working even though the Exchange server might be down. It also allows for more efficient use of bandwidth.

 

The Goal - Now this is where many people disagree. I consider the goal of all HA environments to really be continuous availability (CA) of applications and resources for employees. Doesn't everyone want email to always be available, processing messaging traffic and helping the people in the organization collaborate? Of course that is what we want. We want applications and their entire environment to continue running forever.

 

In my opinion, we strive for CA and we settle for HA.

 

"In information technology, high availability refers to a system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to "100% operational" or "never failing." A widely-held but difficult-to-achieve standard of availability for a system or product is known as "five 9s" (99.999 percent) availability."

 

Source:

http://searchcio.techtarget.com/sDefinition/0,,sid19_gci761219,00.html

 

Obviously, "continuously operational" just isn't possible over extremely long periods of time. Hardware will always fail; it is just a matter of when. Software becomes obsolete over time, too. We all need to understand that HA includes not just the hardware and software solution, but also the backup/restore solution and failover processing. Most HA experts will also add that a true HA environment includes a well documented development, test, and production migration process for any and all changes to be made in production environments. There is much to achieving HA; however, it simply comes down to application availability through well designed, planned, tested, and implemented processes, software, and hardware.
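
To put the "five 9s" figure from the quoted definition in concrete terms, an availability percentage can be converted into a yearly downtime budget. The short sketch below does that arithmetic; the function name is made up for illustration, and it assumes a 365-day year with availability measured around the clock.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Allowed downtime per year, in minutes, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.1f} minutes per year")
```

Under these assumptions, 99.999 percent leaves a little over five minutes of downtime a year, which is why a single failover lasting several minutes can use up the whole budget.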

 

Another Example would be if you use NLB to provide application availability to your users over the Internet for your web based app. NLB helps keep the application available to your users. The same can be said for server clustering, however, you need to take into account the non-availability during the actual failover of your application in the event of hardware or software failures. Sometimes, failover is a matter of seconds, in other cases it can be several minutes. In all cases, a clustering solution will significantly drive down non-availability and increase the uptime of your application as run on your servers. Many experts state that, for any application or system to be highly available, the parts need to be designed around availability and the individual parts need to be tested before being put into production. As an example, if you are using 3rd party products with your Exchange environment that have not been properly tested, you may find that they are a weak link that results in loss of availability. Implementing a cluster will not necessarily result in HA if there are problems with the software.

 

I could and maybe should ramble on some more, but I need to focus on some other things right now. To summarize this entire discussion:

 

HA is so much more than just slapping a couple of servers together in a cluster. Please keep in mind all of the details behind a top-notch HA environment.

 

 

February 10 - Heartbeat Network – Russ Kaufmann

 

This is a pet peeve of mine. I have run into several instances where clusters have been installed and the heartbeat network is not properly configured. The heartbeat network is a private network shared just by the nodes of a cluster and is not accessible to other systems. It is not supposed to be routable at all. When it is built, you should select the option for "Internal cluster communications only (private network)" for the private heartbeat network. Selecting anything else can be a problem since this network is not routable and can't connect to other networks. Make sure that you use IP addresses for your heartbeat network that do not exist anywhere else in your network or on the Internet.
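
One way to check the last point is to compare the planned heartbeat subnet against the subnets already in use. The sketch below uses Python's standard `ipaddress` module; the function name and the example subnets are hypothetical, and the list of existing subnets is something you would have to supply from your own network documentation.

```python
import ipaddress


def heartbeat_conflicts(heartbeat_cidr, other_cidrs):
    """Return the networks from other_cidrs that overlap the heartbeat subnet."""
    heartbeat = ipaddress.ip_network(heartbeat_cidr)
    return [c for c in other_cidrs if heartbeat.overlaps(ipaddress.ip_network(c))]


# Example values only: a planned heartbeat subnet and two existing subnets.
conflicts = heartbeat_conflicts("10.10.10.0/24", ["10.10.0.0/16", "192.168.1.0/24"])
```

Here `conflicts` contains `"10.10.0.0/16"`, because the planned heartbeat range sits inside it, so a different range would be the safer choice.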

 

You basically have a few choices when it comes to building the private heartbeat network:

  1. Use a cross over cable between the nodes (only valid for two node clusters)
  2. Use a VLAN on a switch
  3. Use a dedicated switch
  4. Use a dedicated hub

 

I highly recommend using a dedicated hub over a switch for a couple of very good reasons.

 

Some other steps that you should take with the heartbeat network involve configuring the network connection in the operating system. Remove the "Client for Microsoft Networks" and the "File and Printer Sharing for Microsoft Networks" on the General tab of the connection properties. You should also go into the "Internet Protocol (TCP/IP)" properties and, in the advanced properties, unselect "Register this connection's addresses in DNS" on the DNS tab and select the radio button for "Disable NetBIOS over TCP/IP" on the WINS tab, since the private network does not need them.

 

Even after configuring the private heartbeat network, don't forget to also configure the public network connections so that they are set for "All communications (mixed network)" so that the public network connections can be used to run the heartbeat if the private network fails for some reason.

 

 

February 8 - Clustering is not the solution all the time – Russ Kaufmann

 

Everyone repeat after me, "I will not waste money trying to cluster every freaking service offered by Microsoft servers."

 

I just feel the need to scream this out loud today: there are often simpler and easier ways to provide redundancy for some services than using Microsoft server clusters. A couple of quick examples (please write these down) include:

 

You don't need multiple nodes and a SAN to provide redundancy all the time. Remember, your company has other needs than having 8-way SMP WINS clusters with 4 TB of shared storage. Please, save a little money and use some common sense.

 

 

February 5 – Exchange Cluster – Is Active/Active worth it? - Russ Kaufmann

This is kind of a regurgitation of a couple of threads on the microsoft.public.exchange.clustering newsgroup. In the threads, there are questions regarding the whole Active/Active issue. Several people, including a couple of good friends and a couple of top-notch Microsofties, pointed out the evils of Active/Active. To be clear, Microsoft supports A/A for Exchange but does not recommend it. Best practices are developed based on the experiences of Microsoft's internal usage (often referred to as eating their own dog food), the early deployment programs, and through trouble reports and the experiences of many customers as reported and tracked through PSS.

Over the years, I have explained to my students that Active/Passive is the best practice when it comes to clustering Exchange. Almost always, a student will protest, stating that their managers and others don't want a wasted node, so they want to know why A/A is such a problem. I point out that the store.exe is well known for sucking up all the RAM it can get. So, if you have two servers (node1 and node2) both running store.exe and consuming a very large amount of RAM on each node, then you can expect problems with the failover of one resource hog to a node where another resource hog lives. According to all of the literature, the store.exe on the surviving node should give up enough memory for the store.exe on the failing node to exist along with it as both store.exe's will basically drain down (this is a real high level summary and not the term normally used, but I think it helps to understand what is happening) so they will both have smaller memory footprints and can coexist. In practice, this process is less than smooth. Another concern that is well documented is that if both Exchange Virtual Servers (EVSs) live on the same node, their stores and storage groups add together and apply to constraints in Exchange. For example, if EVS1 has three storage groups and EVS2 has three storage groups, when you combine them, they exceed the limits for Exchange (a max of 4 storage groups) and they both will not function on the same node.
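
The storage group example is simple addition, but it is easy to forget when sizing each EVS separately. A minimal sketch of the check, using the limit of 4 quoted above (the function and constant names are made up for illustration):

```python
MAX_STORAGE_GROUPS_PER_NODE = 4  # the limit quoted in the paragraph above


def can_coexist(storage_groups_per_evs, limit=MAX_STORAGE_GROUPS_PER_NODE):
    """True if all EVSs could run on one node without exceeding the limit."""
    return sum(storage_groups_per_evs) <= limit


three_and_three = can_coexist([3, 3])  # the example from the text
two_and_two = can_coexist([2, 2])
```

With three storage groups on each EVS the check fails, since 3 + 3 exceeds 4, while two on each would still fit on a single surviving node.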

Anyways, the issue in this discussion was around performance. With two active nodes, their memory, their CPUs, and their disk spindles should (according to some basic logic) provide better overall performance than one active node with the same resources. At first glance this makes a great deal of sense.

When you dig deeper, this common sense stops making sense. Wow, did I just type that? Try to follow me here (it should be easy, I am a pretty big guy).

According to Microsoft, if you use Exchange in a best practice configuration, you should manage the resource consumption so that you don't exceed 80% of CPU. This is for a single server. If you consider that two nodes are active in an A/A cluster, and since there is a need to failover to a single node, then in order to maintain best practice configurations each node should be only utilized up to 40% of CPU utilization. This is basic math in that 40+40=80. This is discussed in 815180 here http://support.microsoft.com/default.aspx?scid=kb;en-us;815180&product=exch2003. This article also discusses the limit of 1,900 concurrent users per node. This article, however, doesn't address the added scalability of multiple server backplanes, multiple fiber adapters, and multiple spindles. So, the argument then becomes, do you really get enough benefit out of the additional I/O provided with an A/A cluster while still strictly limiting CPU? I would probably go so far as to say yes, but only because it is very clear from all of the Exchange work that I have done that the disk I/O is the limiting factor for higher performance.
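
The 40+40=80 reasoning generalizes to other node counts. The sketch below is only that arithmetic, under the assumption that load is spread evenly across the active nodes and simply adds up on the survivors after a failover; the function name and parameters are hypothetical and are not taken from the KB article.

```python
def max_cpu_per_active_node(active_nodes, surviving_nodes, ceiling=80.0):
    """Highest per-node CPU % that still fits under the ceiling after failover.

    Assumes evenly spread load that adds linearly when nodes are lost.
    """
    if not 0 < surviving_nodes <= active_nodes:
        raise ValueError("need 0 < surviving_nodes <= active_nodes")
    return ceiling * surviving_nodes / active_nodes


two_node_active_active = max_cpu_per_active_node(active_nodes=2, surviving_nodes=1)
```

For the two-node A/A case this gives 40 percent per node, matching the figure above; a single active node (A/P) keeps the full 80 percent.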

So to summarize the arguments/discussions:

Pro A/A

Pro A/P

There are other issues to consider, for example:

The reason I bring up this whole topic in this blog entry is that the A/A vs A/P issue really isn't as cut and dried as many of us would like to believe.

 

February 1 - MSDTC Issue Part 2 – Russ Kaufmann

 

I got brave and decided to fix the problem. The issue, for those who missed the first show, was that I needed to reconfigure the MSDTC on a SQL cluster. The MSDTC needed to be moved to a different drive because we are retiring one of the EMC frames.

 

The more I read on this the more complex it seemed. I read through Q 301600 and Q 294209, and reviewed several other sources. These articles made it sound like I was going to have to rip out DTC on each node and then rebuild it on each node and restart SQL on the cluster. I just refused to believe it was so complex.

 

The MSDTC resource was configured as part of the initial cluster group with the cluster name and Q: as dependencies. Previously, the quorum was moved from Q: to I:, but the old Q: could not be removed until the MSDTC was reconfigured.

 

So, the more I thought about it, the more a simpler approach made sense to me, so I:

  1. Stopped the MSDTC resource
  2. Copied the MSDTC folder from Q: to I:
  3. Stopped the Q: resource
  4. Deleted the MSDTC resource
  5. Created a new MSDTC resource with the cluster name and the new I: as dependencies
  6. Brought the MSDTC resource online

 

I don't see any problems with it so far.