Understanding Quorum Configurations in a Failover Cluster

Applies To: Windows Server 2008 R2

For information about how to configure quorum options, see Select Quorum Options for a Failover Cluster.

How the quorum configuration affects the cluster

The quorum configuration in a failover cluster determines the number of failures that the cluster can sustain. If an additional failure occurs, the cluster must stop running. The relevant failures in this context are failures of nodes or, in some cases, of a disk witness (which contains a copy of the cluster configuration) or file share witness. It is essential that the cluster stop running if too many failures occur or if there is a problem with communication between the cluster nodes. For a more detailed explanation, see Why quorum is necessary later in this topic.

Important
In most situations, use the quorum configuration that the cluster software identifies as appropriate for your cluster. Change the quorum configuration only if you have determined that the change is appropriate for your cluster.

Note that full function of a cluster depends not just on quorum, but on the capacity of each node to support the services and applications that fail over to that node. For example, a cluster that has five nodes could still have quorum after two nodes fail, but the level of service provided by each remaining cluster node would depend on the capacity of that node to support the services and applications that failed over to it.

Quorum configuration choices

You can choose from among four possible quorum configurations; a short illustrative sketch of the failure-tolerance arithmetic follows the list:

  • Node Majority (recommended for clusters with an odd number of nodes)

    Can sustain failures of half the nodes (rounding up) minus one. For example, a seven-node cluster can sustain three node failures.

  • Node and Disk Majority (recommended for clusters with an even number of nodes)

    Can sustain failures of half the nodes (rounding up) if the disk witness remains online. For example, a six-node cluster in which the disk witness is online could sustain three node failures.

    Can sustain failures of half the nodes (rounding up) minus one if the disk witness goes offline or fails. For example, a six-node cluster with a failed disk witness could sustain two (3-1=2) node failures.

  • Node and File Share Majority (for clusters with special configurations)

    Works in a similar way to Node and Disk Majority, but instead of a disk witness, this cluster uses a file share witness.

    Note that if you use Node and File Share Majority, at least one of the available cluster nodes must contain a current copy of the cluster configuration before you can start the cluster. Otherwise, you must force the starting of the cluster through a particular node. For more information, see “Additional considerations” in Start or Stop the Cluster Service on a Cluster Node.

  • No Majority: Disk Only (not recommended)

    Can sustain failures of all nodes except one (if the disk is online). However, this configuration is not recommended because the disk might be a single point of failure.
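The failure-tolerance rules above all follow from one requirement: a strict majority of the total votes must remain in contact. The following Python snippet is a purely illustrative sketch (the function name and parameters are invented for this example; nothing here is part of the cluster software) that reproduces the worked numbers from the list:

    # Illustrative sketch only; not part of any clustering API.
    def sustainable_node_failures(nodes, witness=False, witness_online=False):
        # Total votes: one per node, plus one if a disk or file share witness is configured.
        total_votes = nodes + (1 if witness else 0)
        # Quorum requires strictly more than half of all votes.
        majority = total_votes // 2 + 1
        # Votes still available from the witness, if it is configured and online.
        witness_votes = 1 if (witness and witness_online) else 0
        # Nodes that must stay up so the surviving votes still reach the majority.
        nodes_required = majority - witness_votes
        return nodes - nodes_required

    # Worked examples from the list above:
    print(sustainable_node_failures(7))                                     # Node Majority, 7 nodes -> 3
    print(sustainable_node_failures(6, witness=True, witness_online=True))  # 6 nodes, witness online -> 3
    print(sustainable_node_failures(6, witness=True, witness_online=False)) # 6 nodes, witness failed -> 2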

Illustrations of quorum configurations

The following illustrations show how three of the quorum configurations work. A fourth configuration is described in words, because it is similar to the Node and Disk Majority configuration illustration.

Note
In the illustrations, for all configurations other than Disk Only, notice whether a majority of the relevant elements are in communication (regardless of the number of elements). When they are, the cluster continues to function. When they are not, the cluster stops functioning.

Illustration: Cluster with Node Majority quorum configuration

In a cluster with the Node Majority configuration, only nodes are counted when calculating a majority.

Illustration: Cluster with Node and Disk Majority quorum configuration

In a cluster with the Node and Disk Majority configuration, the nodes and the disk witness are counted when calculating a majority.

Node and File Share Majority Quorum Configuration

In a cluster with the Node and File Share Majority configuration, the nodes and the file share witness are counted when calculating a majority. This is similar to the Node and Disk Majority quorum configuration shown in the previous illustration, except that the witness is a file share that all nodes in the cluster can access instead of a disk in cluster storage.

Illustration: Cluster with Disk Only quorum configuration

In a cluster with the Disk Only configuration (No Majority: Disk Only), the number of nodes does not affect how quorum is achieved. The disk is the quorum. However, if communication with the disk is lost, the cluster becomes unavailable.

Why quorum is necessary

When network problems occur, they can interfere with communication between cluster nodes. A small set of nodes might be able to communicate together across a functioning part of a network but not be able to communicate with a different set of nodes in another part of the network. This can cause serious issues. In this “split” situation, at least one of the sets of nodes must stop running as a cluster.

To prevent the issues that are caused by a split in the cluster, the cluster software requires that any set of nodes running as a cluster must use a voting algorithm to determine whether, at a given time, that set has quorum. Because a given cluster has a specific set of nodes and a specific quorum configuration, the cluster will know how many “votes” constitutes a majority (that is, a quorum). If the number drops below the majority, the cluster stops running. Nodes will still listen for the presence of other nodes, in case another node appears again on the network, but the nodes will not begin to function as a cluster until the quorum exists again.

For example, in a five-node cluster that is using Node Majority, consider what happens if nodes 1, 2, and 3 can communicate with each other but not with nodes 4 and 5. Nodes 1, 2, and 3 constitute a majority, and they continue running as a cluster. Nodes 4 and 5, being a minority, stop running as a cluster. If node 3 loses communication with other nodes, all nodes stop running as a cluster. However, all functioning nodes will continue to listen for communication, so that when the network begins working again, the cluster can form and begin to run.
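As a rough illustration of this arbitration (a sketch of the vote counting only; a real cluster uses heartbeats and its internal membership algorithm, and the function below is invented for this example), the decision each partition makes in the five-node scenario can be expressed as:

    # Illustrative sketch only; not part of any clustering API.
    def partition_keeps_quorum(partition_votes, total_votes):
        # A partition keeps running as a cluster only with a strict majority of all votes.
        return partition_votes > total_votes // 2

    TOTAL_VOTES = 5                                # five nodes, one vote each (Node Majority)
    print(partition_keeps_quorum(3, TOTAL_VOTES))  # nodes 1, 2, 3 -> True: continue running as a cluster
    print(partition_keeps_quorum(2, TOTAL_VOTES))  # nodes 4, 5    -> False: stop running as a cluster
    print(partition_keeps_quorum(1, TOTAL_VOTES))  # node 3 isolated by itself -> False: it stops too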


Select Quorum Options for a Failover Cluster

Applies To: Windows Server 2008 R2

If you have special requirements or make changes to your cluster, you might want to change the quorum options for your cluster.

Important
In most situations, use the quorum configuration that the cluster software identifies as appropriate for your cluster. Change the quorum configuration only if you have determined that the change is appropriate for your cluster.

For important conceptual information about quorum configuration options, see Understanding Quorum Configurations in a Failover Cluster.

Membership in the local Administrators group on each clustered server, or equivalent, is the minimum required to complete this procedure. Also, the account you use must be a domain account. Review details about using the appropriate accounts and group memberships at Local and Domain Default Groups.

To select quorum options for a cluster

  1. In the Failover Cluster Manager snap-in, if the cluster that you want to configure is not displayed, in the console tree, right-click Failover Cluster Manager, click Manage a Cluster, and then select or specify the cluster that you want.

  2. With the cluster selected, in the Actions pane, click More Actions, and then click Configure Cluster Quorum Settings.

  3. Follow the instructions in the wizard to select the quorum configuration for your cluster. If you choose a configuration that includes a disk witness or file share witness, follow the instructions for specifying the witness.

  4. After the wizard runs and the Summary page appears, if you want to view a report of the tasks that the wizard performed, click View Report.

Additional considerations

  • To open the failover cluster snap-in, click Start, click Administrative Tools, and then click Failover Cluster Manager. If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Yes.

Understanding Quorum in a Failover Cluster

Hi Cluster Fans,

This blog post will clarify planning considerations around quorum in a Failover Cluster and answer some of the most common questions we hear.

The quorum configuration in a failover cluster determines the number of failures that the cluster can sustain while still remaining online.  If an additional failure occurs beyond this threshold, the cluster will stop running.  A common perception is that the cluster stops running when too many failures occur in order to prevent the remaining nodes from taking on too many workloads and becoming overcommitted.  In fact, the cluster does not know your capacity limitations or whether you would be willing to take a performance hit in order to keep it online.  Rather, quorum is designed to handle the scenario in which there is a problem with communication between sets of cluster nodes, so that two servers do not try to simultaneously host a resource group and write to the same disk at the same time.  This is known as a “split brain,” and we want to prevent it to avoid any potential corruption of a disk by having two simultaneous group owners.  By having this concept of quorum, the cluster will force the cluster service to stop in one of the subsets of nodes to ensure that there is only one true owner of a particular resource group.  Once the nodes that have been stopped can once again communicate with the main group of nodes, they will automatically rejoin the cluster and start their cluster service.

For more information about quorum in a cluster, visit: http://technet.microsoft.com/en-us/library/cc731739.aspx.

Voting Towards Quorum

Having ‘quorum’, or a majority of voters, is based on a voting algorithm where more than half of the voters must be online and able to communicate with each other.  Because a given cluster has a specific set of nodes and a specific quorum configuration, the cluster will know how many “votes” constitute a majority of votes, or quorum.  If the number of voters drops below the majority, the cluster service will stop on the nodes in that group.  These nodes will still listen for the presence of other nodes, in case another node appears again on the network, but the nodes will not begin to function as a cluster until quorum exists again.

It is important to realize that the cluster requires more than half of the total votes to achieve quorum.  This is to avoid having a ‘tie’ in the number of votes in a partition, since a majority will always mean that the other partition has fewer than half the votes.  In a 5-node cluster, 3 voters must be online; yet in a 4-node cluster, 3 voters must also be online to have a majority.  Because of this logic, it is recommended to always have an odd number of total voters in the cluster.  This does not necessarily mean an odd number of nodes is needed, since either a disk or a file share can contribute a vote, depending on the quorum model.
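To make the threshold explicit, here is a small illustrative calculation (not a cluster API, just the “more than half” rule applied to a few cluster sizes; the function name is invented for this post):

    # Illustrative sketch only; not part of any clustering API.
    def votes_needed_for_quorum(total_voters):
        # Strictly more than half of all votes are required.
        return total_voters // 2 + 1

    for voters in (3, 4, 5, 6):
        print(voters, "voters ->", votes_needed_for_quorum(voters), "votes needed")
    # 4 voters and 5 voters both require 3 votes, so the even fourth voter adds
    # no extra failure tolerance -- hence the recommendation for an odd total.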

A voter can be:

  • A node
    • 1 Vote
    • Every node in the cluster has 1 vote
  • A “Disk Witness” or “File Share Witness”
    • 1 Vote
    • Either 1 Disk Witness or 1 File Share Witness may have a vote in the cluster, but not multiple disks, multiple file shares, or any combination of the two

Quorum Types

There are four quorum types.  This information is also available here: http://technet.microsoft.com/en-us/library/cc731739.aspx#BKMK_choices.

Node Majority

This is the easiest quorum type to understand and is recommended for clusters with an odd number of nodes (3-node, 5-node, etc.).  In this configuration, every node has 1 vote, so there is an odd number of total votes in the cluster.  If there is a partition between two subsets of nodes, the subset with more than half the nodes will maintain quorum.  For example, if a 5-node cluster partitions into a 3-node subset and a 2-node subset, the 3-node subset will stay online and the 2-node subset will go offline until it can reconnect with the other 3 nodes.

Node & Disk Majority

This quorum configuration is most commonly used since it works well with 2-node and 4-node clusters, which are the most common deployments.  This configuration is used when there is an even number of nodes in the cluster.  In this configuration, every node gets 1 vote, and additionally 1 disk gets 1 vote, so there is an odd number of total votes.

This disk is called the Disk Witness (sometimes referred to as the ‘quorum disk’) and is simply a small clustered disk which is in the Cluster Available Storage group.  This disk is highly available and can fail over between nodes.  It is considered part of the Cluster Core Resources group; however, it is generally hidden from view in Failover Cluster Manager since it does not need to be interacted with.

Since there is an even number of node votes and 1 additional Disk Witness vote, in total there will be an odd number of votes.  If there is a partition between two subsets of nodes, the subset with more than half the votes will maintain quorum.  For example, if a 4-node cluster with a Disk Witness partitions into a 2-node subset and another 2-node subset, one of those subsets will also own the Disk Witness, so it will have 3 total votes and will stay online.  The other 2-node subset will go offline until it can reconnect with the other 3 voters.  This means that the cluster can lose communication with any two voters, whether they are 2 nodes, or 1 node and the Disk Witness.
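The tie-breaking role of the Disk Witness in that 2-and-2 partition can be shown with a few lines of arithmetic (an illustrative sketch only; the vote values are taken from the example above):

    # Illustrative sketch only; not part of any clustering API.
    TOTAL_VOTES = 4 + 1                 # four node votes plus one Disk Witness vote
    majority = TOTAL_VOTES // 2 + 1     # 3 votes are needed for quorum

    side_with_witness = 2 + 1           # two nodes plus ownership of the Disk Witness
    side_without_witness = 2            # the other two nodes

    print(side_with_witness >= majority)     # True  -> 3 of 5 votes, this subset stays online
    print(side_without_witness >= majority)  # False -> 2 of 5 votes, its cluster service stops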

Node & File Share Majority

This quorum configuration is usually used in multi-site clusters.  This configuration is used when there is an even number of nodes in the cluster, so it can be used interchangeably with the Node and Disk Majority quorum mode.  In this configuration every node gets 1 vote, and additionally 1 remote file share gets 1 vote.

This file share is called the File Share Witness (FSW) and is simply a file share on any server in the same AD forest which all the cluster nodes can access.  One node in the cluster will place a lock on the file share to consider itself the ‘owner’ of that file share, and another node will grab the lock if the original owning node fails.  On a standalone server, the file share by itself is not highly available; however, the file share can also be put on a clustered file share on an independent cluster, making the FSW clustered and giving it the ability to fail over between nodes.  It is important that you do not put this vote on a node in the same cluster, nor within a VM on the same cluster, because losing that node would cause you to lose the FSW vote, causing two votes to be lost on a single failure.  A single file server can host multiple FSWs for multiple clusters.

Generally, multi-site clusters have two sites with an equal number of nodes at each site, giving an even number of nodes.  By adding this additional vote at a 3rd site, there is an odd number of votes in the cluster, at very little expense compared to deploying a 3rd site with an active cluster node and a writable DC.  This means that either site or the FSW can be lost and the cluster can still maintain quorum.  For example, in a multi-site cluster with 2 nodes at Site1, 2 nodes at Site2 and a FSW at Site3, there are 5 total votes.  If there is a partition between the sites, one of the nodes at a site will own the lock to the FSW, so that site will have 3 total votes and will stay online.  The 2-node site will go offline until it can reconnect with the other 3 voters.

Legacy: Disk Only

Important: This quorum type is not recommended as it has a single point of failure.

The Disk Only quorum type was available in Windows Server 2003 and has been maintained for compatibility reasons; however, it is strongly recommended never to use this mode unless directed by a storage vendor.  In this mode, only the Disk Witness contains a vote and there are no other voters in the cluster.  This means that if the disk becomes unavailable, the entire cluster will go offline, so this is considered a single point of failure.  However, some customers choose to deploy this configuration to get a “last man standing” configuration in which the cluster remains online as long as any one node is still operational and can access the cluster disk.  However, with this deployment objective, it is important to consider whether that last remaining node can even handle the capacity of all the workloads that have moved to it from other nodes.

Default Quorum Selection

When the cluster is created using Failover Cluster Manager, Cluster.exe or PowerShell, the cluster will automatically select the best quorum type for you to simplify the deployment.  This choice is based on the number of nodes and available storage.  The logic is as follows:

  • Odd Number of Nodes – use Node Majority
  • Even Number of Nodes
    • Available Cluster Disks – use Node & Disk Majority
    • No Available Cluster Disk – use Node Majority

The cluster will never select Node and File Share Majority or Legacy: Disk Only.  The quorum type is still fully configurable by the admin if the default selections are not preferred.
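The default selection logic above can be summarized in a short illustrative function (invented for this post; the real decision is made internally by the cluster software when the cluster is created):

    # Illustrative sketch only; not part of any clustering API.
    def default_quorum_type(node_count, has_available_cluster_disk):
        # Odd number of nodes: Node Majority.
        if node_count % 2 == 1:
            return "Node Majority"
        # Even number of nodes: use a Disk Witness if a cluster disk is available.
        if has_available_cluster_disk:
            return "Node and Disk Majority"
        return "Node Majority"

    print(default_quorum_type(3, False))  # Node Majority
    print(default_quorum_type(4, True))   # Node and Disk Majority
    print(default_quorum_type(4, False))  # Node Majority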

Changing Quorum Types

Changing the quorum type is easy through Failover Cluster Manager.  Right-click the name of the cluster, select More Actions…, then select Configure Cluster Quorum Settings… to launch the Configure Cluster Quorum Wizard.  From the wizard it is possible to configure any of the 4 quorum types and to change the Disk Witness or File Share Witness.  The wizard will even tell you the number of failures that can be sustained based on your configuration.

For a step-by-step guide of configuring quorum, visit: http://technet.microsoft.com/en-us/library/cc733130.aspx.

Thanks!
Symon Perriman
Technical Evangelist
Private Cloud Technologies
Microsoft

  • #

    Hi, can you shed some light on the new /PQ switch to start the Cluster and the NodeWeight concept as described in KB 2494036?

    Thanks

  • #

    Hi, I would like to know how to migrate the Quorum disk. Also, is downtime required for the migration?

  • #

    Hi, can I configure 2 FSWs in a cluster (from a 3rd and a 4th location)? This is not to increase the votes but just to have high availability at the share level: if the connection to Site3 is lost, the Site4 FSW provides a vote, and if the connection to Site4 is lost, the Site3 FSW will provide the vote.

  • #

    You can only configure 1 FSW per Cluster.

    Please look at the ‘Node & File Share Majority’ above to understand how the Quorum Votes are calculated.

    If you have your Cluster nodes up and running and you lose connectivity to the File Share Witness (3rd site), then the cluster will continue to run provided you have a sufficient number of Cluster Nodes up and running.

    Thanks,

    Amitabh

  • #

    None of the MS documentation makes this clear to me. If I have a two-node SQL cluster and it’s set as node & disk, then if the disk goes offline, the cluster stays up because the nodes are still running? How is this supposed to work? If the disk is offline then most likely your data drive is too, in which case SQL isn’t going to run well. Can somebody clear this up for me?

  • #

    The quorum disk is a witness disk that holds an extra copy of the cluster database. This helps cluster availability. Normally this disk is not used for any other purpose.  It doesn’t mean the other disks will be offline/failed if the quorum disk fails. What/how many disks are in your cluster?

–END–