General Switch Troubleshooting Suggestions
There are many ways to troubleshoot a switch. As switches gain features, the number of things that can break also grows. If you develop an approach or test plan for troubleshooting, you are better off in the long run than if you rely on a hit-and-miss approach. Here are some general suggestions to make your troubleshooting more effective:
- Take the time to become familiar with normal switch operation. The Cisco web site has a tremendous amount of technical information that describes how Cisco switches work, as mentioned in the previous section. The configuration guides in particular are very helpful. Many support cases are resolved with information that is already in the product configuration guides.
- For the more complex situations, have an accurate physical and logical map of your network. A physical map shows how the devices and cables are connected. A logical map shows what segments (VLANs) exist in your network and which routers provide routing services to these segments. A spanning tree map is highly useful to troubleshoot complex issues. Because of the ability of a switch to create different segments with the implementation of VLANs, the physical connections alone do not tell the whole story; one has to know how the switches are configured to determine which segments (VLANs) exist and to know how they are logically connected.
- Have a plan. Some problems and solutions are obvious; some are not. The symptoms that you see in your network can be the result of problems in another area or layer. Before you jump to conclusions, try to verify in a structured way what works and what does not. Since networks can be complex, it is helpful to isolate possible problem domains. One way to do this is to use the OSI seven-layer model. For example: check the physical connections involved (layer 1), check connectivity within the VLAN (layer 2), check connectivity across different VLANs (layer 3), and so on. If the switch is configured correctly, many of the problems you encounter are related to physical layer issues (physical ports and cabling). Today, switches are also involved in layer 3 and layer 4 issues: they incorporate intelligence to switch packets based on information derived from routers, or actually have routers that live inside the switch (layer 3 or layer 4 switching).
- Do not assume a component works without checking it first. This can save you a lot of wasted time. For example, if a PC is not able to log in to a server across your network, there are many things that can be wrong. Do not skip the basic things and assume that something works; someone may have changed something and not told you. It only takes a minute to check some of the basic things (for example, that the ports involved are connected to the right place and are active), and doing so could save you many wasted hours.
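The "have a plan" suggestion can be sketched as a small routine that runs checks from the lowest OSI layer upward and reports the first one that fails. This is only an illustration: the `first_failure` helper and the sample checks are hypothetical, and in practice each check would wrap a real test such as reading a link light or sending a ping.

```python
from typing import Callable, List, Optional, Tuple

# Each check is (OSI layer, description, function that returns True when the check passes).
Check = Tuple[int, str, Callable[[], bool]]


def first_failure(checks: List[Check]) -> Optional[Tuple[int, str]]:
    """Run checks from the lowest layer upward; return the first that fails, or None."""
    for layer, description, passes in sorted(checks, key=lambda c: c[0]):
        if not passes():
            return layer, description
    return None


# Hypothetical results, listed out of order on purpose.
checks = [
    (3, "hosts in different VLANs can reach each other", lambda: False),
    (1, "link light is green on both ports", lambda: True),
    (2, "hosts in the same VLAN can reach each other", lambda: False),
]
print(first_failure(checks))
```

Here the layer 3 failure is only a symptom; the routine points at layer 2 first, which is where the investigation should start.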
Troubleshooting Port Connectivity Problems
If the port does not work, nothing works! Ports are the foundation of your switching network. Some ports have special significance because of their location in the network and the amount of traffic they carry. These ports include connections to other switches, routers, and servers. These ports can be more complicated to troubleshoot because they often take advantage of special features like trunking and EtherChannel. The rest of the ports are significant, as well, because they connect the actual users of the network.
Many things can cause a port to be non-functional: hardware issues, configuration issues, and traffic issues. The following sections explore each category in more detail.
Hardware Issues
General
Port functionality requires two working ports connected by a working cable (of the correct type). By default, a port on most Cisco switches is in the notconnect state, which means that it is not currently connected to anything but is ready to connect. If you connect a good cable between two switch ports in the notconnect state, the link light turns green on both ports and the port status reads connected, which means the port is up as far as layer one is concerned. The following paragraphs describe what to check if layer one is not up.
Check the port status for both ports involved. Make sure that neither port in the link is shut down. The administrator may have shut down one or both ports, or software inside the switch may have shut the port down because of a configuration error condition (this is covered under Configuration Issues). If one side is shut down and the other is not, the status on the enabled side is notconnect (because it does not sense a neighbor on the other side of the wire). The status on the shut-down side reads something like disable or errDisable (depending on what actually shut the port down). The link does not come up unless both ports are enabled.
When you connect a good cable (again, of the correct type) between two enabled ports, they show a green link light within a few seconds, and the port state shows connected in the command line interface (CLI). At this point, if you do not have link, the problem is limited to three things: the port on one side, the port on the other side, or the cable in the middle. In some cases other devices are involved: media converters (fiber to copper, and so on), or on Gigabit links, Gigabit Interface Converters (GBICs). Still, this is a reasonably limited area to search.
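As a rough illustration of the reasoning above, the status of the two ports on a link can be turned into a first guess about where to look. The `diagnose_link` function is a hypothetical helper, not a switch command; the status strings follow the ones named in the text, and real output may differ by platform and software version.

```python
def diagnose_link(status_a: str, status_b: str) -> str:
    """Suggest a next step from the port status seen on each side of a link."""
    shut_down = {"disable", "disabled", "errdisable"}
    a, b = status_a.lower(), status_b.lower()
    if a == "connected" and b == "connected":
        return "link is up at layer one"
    if a in shut_down or b in shut_down:
        return "a port is shut down; re-enable it before checking the cable"
    if "connected" in (a, b):
        return "link on one side only; suspect a damaged cable"
    return "both ports are enabled but there is no link; check the cable and both ports"


print(diagnose_link("notconnect", "errDisable"))
```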
Media converters can add noise to a connection or weaken the signal if they do not function correctly. They also add extra connectors that can cause problems and are another component to debug.
Check for loose connections. Sometimes a cable appears to be seated in the jack, but it actually is not; unplug the cable and re-insert it. You must also look for dirt or broken or missing pins. Do this for both ports involved in the connection.
The cable can be plugged in to the wrong port, which commonly happens. Make sure both ends of the cable are plugged in to the ports where you really want them.
You can have link on one side and not on the other. Check both sides for link. A single broken wire can cause this type of problem.
A link light does not guarantee that the cable is fully functional. The cable may have been physically stressed so that it works only marginally. You usually notice this when the port shows a large number of packet errors.
In order to determine if the cable is the problem, swap it with a known good cable. Do not just swap it with any other cable; make sure that you swap it with a cable that you know is good and is of the correct type.
If this is a very long cable run (underground or across a large campus, for example), a sophisticated cable tester is useful. If you do not have a cable tester, consider these options:
- Try different ports to see if they come up with this long cable.
- Connect the port in question to another port in the same switch just to see if the port links up locally.
- Temporarily relocate the switches near each other, so you can try out a known good cable.
Copper
Make sure that you have the correct cable for the type of connection that you make. Category 3 cable can be used for 10 Mbps UTP connections, but category 5 must be used for 10/100 connections.
A straight-through RJ-45 cable is used for end-stations, routers, or servers to connect to a switch or hub. An Ethernet crossover cable is used for switch-to-switch or hub-to-switch connections. The maximum distance for Ethernet or Fast Ethernet copper wiring is 100 meters. A good general rule of thumb is that when you cross an OSI layer, as between a switch and a router, use a straight-through cable; when you connect two devices in the same OSI layer, as between two routers or two switches, use a crossover cable. For purposes of this rule only, treat a workstation like a router.
These two graphics show the pin-outs required for a switch-to-switch crossover cable.
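The cable rule of thumb can be written down as a small lookup. This is a sketch under stated assumptions: the group names `infrastructure` and `endpoint` are illustrative labels (hubs are grouped with switches so that hub-to-switch gives a crossover, as the text says), and `CROSSOVER_PINS` lists the commonly used 10/100 crossover pin pairs rather than reproducing the original graphics.

```python
# Devices that need a crossover cable between themselves fall in the same group.
GROUP = {
    "switch": "infrastructure",
    "hub": "infrastructure",
    "router": "endpoint",
    "server": "endpoint",
    "workstation": "endpoint",  # treated like a router for this rule only
}

# 10/100 crossover wiring: pin on one end -> pin on the other end.
CROSSOVER_PINS = {1: 3, 2: 6, 3: 1, 6: 2}


def cable_for(device_a: str, device_b: str) -> str:
    """Return the cable type suggested by the rule of thumb."""
    same_group = GROUP[device_a] == GROUP[device_b]
    return "crossover" if same_group else "straight-through"


print(cable_for("switch", "switch"))
print(cable_for("router", "switch"))
```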
Fiber
For fiber, make sure that you have the correct cable for the distances involved and the type of fiber ports that are used (single mode or multimode). Make sure the ports that are connected together are both single mode or both multimode ports. Single mode fiber generally reaches 10 kilometers, and multimode fiber can usually reach 2 kilometers, but there is the special case of 100BaseFX multimode used in half duplex mode, which can only go 400 meters.
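The distance and mode rules above can be expressed as a quick sanity check. The limits are the general figures quoted in the text, not the specification of any particular optic, and the `fiber_run_ok` helper and its parameter names are hypothetical.

```python
# General reach figures from the text, in meters.
MAX_METERS = {"single-mode": 10_000, "multimode": 2_000}
FX_HALF_DUPLEX_MULTIMODE_METERS = 400  # 100BaseFX multimode in half duplex


def fiber_run_ok(mode_a: str, mode_b: str, length_m: float,
                 fx_half_duplex: bool = False) -> bool:
    """Return True when both ports use the same fiber mode and the run is short enough."""
    if mode_a != mode_b:
        return False  # single mode and multimode ports must not be mixed
    limit = MAX_METERS[mode_a]
    if mode_a == "multimode" and fx_half_duplex:
        limit = FX_HALF_DUPLEX_MULTIMODE_METERS
    return length_m <= limit


print(fiber_run_ok("multimode", "multimode", 1500))
print(fiber_run_ok("multimode", "multimode", 1500, fx_half_duplex=True))
```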
For fiber connections, make sure the transmit lead of one port is connected to the receive lead of the other port, and vice versa; transmit to transmit, receive to receive, does not work.
For gigabit connections, GBICs need to be matched on each side of the connection. There are different types of GBICs dependent upon the cable and distances involved: Short wavelength (SX), long wavelength/long haul (LX/LH), and extended distance (ZX).
An SX GBIC needs to connect with an SX GBIC; an SX GBIC does not link with an LX GBIC. Also, some gigabit connections require conditioning cables dependent upon the lengths involved. Refer to the GBIC installation notes.
If your gigabit link does not come up, check to make sure the flow control and port negotiation settings are consistent on both sides of the link. There can be incompatibilities in the implementation of these features if the switches that are connected are from different vendors. If in doubt, turn these features off on both switches.
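One way to apply the gigabit checks is to record the relevant settings for each side and list whatever differs. The dictionary keys (`gbic`, `flow_control`, `negotiation`) and the `gigabit_mismatches` helper are hypothetical names chosen for this sketch; they do not correspond to any switch command output.

```python
def gigabit_mismatches(side_a: dict, side_b: dict) -> list:
    """Return the names of the settings that differ between the two ends of a link."""
    settings = ("gbic", "flow_control", "negotiation")
    return [name for name in settings if side_a.get(name) != side_b.get(name)]


switch_a = {"gbic": "SX", "flow_control": "off", "negotiation": "enabled"}
switch_b = {"gbic": "LX/LH", "flow_control": "off", "negotiation": "disabled"}
print(gigabit_mismatches(switch_a, switch_b))
```

An empty list does not prove the link will come up, but any entry in it is worth fixing before you look further.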
Configuration Issues
Another cause of port connectivity issues is incorrect software configuration of the switch. If a port has a solid orange light, that means that software inside the switch shut down the port, either by way of the user interface or by internal processes.
Make sure that the administrator has not shut down the ports involved (as mentioned). The administrator may have manually shut down the port on one side of the link or the other. The link does not come up until you re-enable the port; check the port status.
Some switches, such as the Catalyst 4000/5000/6000, can shut down the port if software processes inside the switch detect an error. When you look at the port status, it reads errDisable. You must fix the configuration problem and then manually take the port out of errDisable state. Some newer software versions (CatOS 5.4(1) and later) have the ability to automatically re-enable a port after a configurable amount of time spent in the errDisable state. These are some of the causes for this errDisable state:
- EtherChannel Misconfiguration: If one side is configured for EtherChannel and the other is not, it can cause the spanning tree process to shut down the port on the side configured for EtherChannel. If you try to configure EtherChannel but the ports involved do not have the same settings (speed, duplex, trunking mode, etc.) as their neighbor ports across the link, it could cause the errDisable state. It is best to set each side for the EtherChannel desirable mode if you want to use EtherChannel. Sections later on talk in depth about how to configure the EtherChannel.
- Duplex Mismatch: If the switch port receives a lot of late collisions, this usually indicates a duplex mismatch problem. There are other causes for late collisions: a bad NIC, cable segments that are too long, but the most common reason today is a duplex mismatch. The full duplex side thinks it can send whenever it wants to. The half duplex side only expects packets at certain times - not at "any" time.
- BPDU Port-guard: Some newer versions of switch software can monitor if portfast is enabled on a port. A port that uses portfast must be connected to an end-station, not to devices that generate spanning tree packets called BPDUs. If the switch notices a BPDU that comes in a port that has portfast enabled, it puts the port in errDisable mode.
- UDLD: Unidirectional Link Detection is a protocol in some newer versions of software that discovers whether communication over a link is one-way only. A broken fiber cable or other cabling or port issues can cause this one-way communication. These partially functional links can cause problems when the switches involved do not know that the link is partially broken. Spanning tree loops can occur with this problem. UDLD can be configured to put a port in errDisable state when it detects a unidirectional link.
- Native VLAN mismatch: Before a port has trunking turned on, it belongs to a single VLAN. When trunking is turned on, the port can carry traffic for many VLANs. The port still remembers the VLAN it was in before trunking was turned on, which is called the native VLAN. The native VLAN is central to 802.1q trunking. If the native VLAN on each end of the link does not match, a port goes into the errDisable state.
- Other: Any process within the switch that recognizes a problem with the port can place it in the errDisable state.
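Several of the errDisable causes listed above come down to a disagreement between the two ends of a link. The sketch below compares two port descriptions and lists the conditions the text names. The `PortConfig` fields and the `errdisable_risks` helper are hypothetical; this models the descriptions in the list, not the switch's actual internal logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortConfig:
    channel: bool = False      # EtherChannel configured on this port
    speed: int = 100
    duplex: str = "full"
    trunking: bool = False
    native_vlan: int = 1
    portfast: bool = False
    sends_bpdus: bool = False  # True when the device behind this port generates BPDUs


def errdisable_risks(local: PortConfig, remote: PortConfig) -> List[str]:
    """List conditions from the text that could put the local port in errDisable."""
    risks = []
    if local.channel != remote.channel:
        risks.append("EtherChannel configured on one side only")
    elif local.channel and (local.speed, local.duplex, local.trunking) != (
        remote.speed, remote.duplex, remote.trunking
    ):
        risks.append("EtherChannel port settings differ")
    if local.duplex != remote.duplex:
        risks.append("duplex mismatch")
    if local.portfast and remote.sends_bpdus:
        risks.append("BPDU received on a portfast port")
    if local.trunking and remote.trunking and local.native_vlan != remote.native_vlan:
        risks.append("native VLAN mismatch")
    return risks


local = PortConfig(channel=True, trunking=True, native_vlan=1)
remote = PortConfig(duplex="half", trunking=True, native_vlan=10, sends_bpdus=True)
print(errdisable_risks(local, remote))
```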
Another cause of inactive ports is when the VLAN they belong to disappears. Each port in a switch belongs to a VLAN. If that VLAN is deleted, the port becomes inactive. Some switches show a steady orange light on each port where this has happened. If you come in to work one day and see hundreds of orange lights, do not panic; it could be that all the ports belonged to the same VLAN and someone accidentally deleted the VLAN that the ports belonged to. When you add the VLAN back into the VLAN table, the ports become active again. A port remembers its assigned VLAN.
If you have link and the ports show connected, but you cannot communicate with another device, this can be particularly perplexing. It usually indicates a problem above the physical layer: layer 2 or layer 3. Try these things.
- Check the trunking mode on each side of the link. Make sure both sides are in the same mode. If you turn the trunking mode to "on" (as opposed to "auto" or "desirable") for one port, and the other port has the trunking mode set to "off", they are not able to communicate. Trunking changes the formatting of the packet; the ports need to be in agreement as to what format they use on the link or they do not understand each other.
- Make sure all devices are in the same VLAN. If they are not in the same VLAN, a router must be configured to allow the devices to communicate.
- Make sure your layer three addressing is correctly configured.
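The trunking mode check can be sketched as a lookup over the four modes named above. The outcomes follow the usual negotiation behavior for these modes as I understand it ("on" and "desirable" actively try to form a trunk, "auto" only responds); treat the result as a hint and confirm against the command reference for your platform. The `trunk_result` helper is hypothetical.

```python
MODES = {"on", "off", "auto", "desirable"}
ACTIVE = {"on", "desirable"}  # modes that actively try to form a trunk


def trunk_result(mode_a: str, mode_b: str) -> str:
    """Return 'trunk', 'access', or 'mismatch' for a pair of trunking modes."""
    if mode_a not in MODES or mode_b not in MODES:
        raise ValueError("unknown trunking mode")
    pair = (mode_a, mode_b)
    if "off" in pair:
        # One side forced on and the other forced off cannot understand each other.
        return "mismatch" if "on" in pair else "access"
    if mode_a in ACTIVE or mode_b in ACTIVE:
        return "trunk"
    return "access"  # auto on both sides: neither port initiates a trunk


print(trunk_result("on", "off"))
print(trunk_result("desirable", "auto"))
print(trunk_result("auto", "auto"))
```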
Traffic Issues
In this section, we describe some of the things you can learn when you look at the traffic information of a port. Most switches have some way to track the packets going in and out of a port. Commands that generate this type of output on the Catalyst 4000/5000/6000 switches are show port and show mac. Output from these commands on the 4000/5000/6000 switches is described in the switch command references.
Some of these port traffic fields show how much data is transmitted and received on the port. Other fields show how many error frames are encountered on the port. If you have a large number of alignment errors, FCS errors, or late collisions, this can indicate a duplex mismatch on the wire. Other causes for these types of errors can be bad network interface cards or cable problems. If you have a large number of deferred frames, it is a sign that your segment has too much traffic; the switch is not able to send enough traffic on the wire to empty its buffers. Consider moving some devices to another segment.
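The counter interpretation above can be sketched as a small function. The counter names (`align_err`, `fcs_err`, `late_coll`, `deferred`) and the threshold of 100 are assumptions made for this example; they are not taken from real command output, and what counts as "large" depends on how long the counters have been accumulating.

```python
def traffic_hints(counters: dict, threshold: int = 100) -> list:
    """Turn per-port error counters into troubleshooting hints."""
    hints = []
    error_fields = ("align_err", "fcs_err", "late_coll")
    if any(counters.get(name, 0) > threshold for name in error_fields):
        hints.append("possible duplex mismatch, bad NIC, or cable problem")
    if counters.get("deferred", 0) > threshold:
        hints.append("segment may carry too much traffic")
    return hints


print(traffic_hints({"fcs_err": 4200, "late_coll": 310, "deferred": 12}))
```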
Switch Hardware Failure
If you have tried everything you can think of and the port does not work, there might be faulty hardware.
Sometimes ports are damaged by electrostatic discharge (ESD). You may or may not see any indication of this.
Look at the power-on self-test (POST) results from the switch to see if there were any failures indicated for any part of the switch.
If you see behavior that can only be considered "strange," this could indicate hardware problems, but it could also indicate software problems. It is usually easier to reload the software than it is to get new hardware. Try to work with the switch software first.
The operating system may have a bug. If you load a newer operating system, it could fix this. You can research known bugs by reading the release notes for the version of code you use or by using the Cisco Bug Toolkit.
The operating system could have somehow become corrupted. If you reload the same version of the operating system, you could fix the problem.
If the status light on the switch flashes orange, this usually means there is some kind of hardware problem with the port or the module or the switch. The same thing is true if the port or module status indicates faulty.
Before you exchange the switch hardware, you can try a few things:
- Reseat the module in the switch. If you do this with the power on, make sure the module is hot swappable. If in doubt, turn the switch off before you reseat the module or refer to the hardware installation guide. If the port is built in to the switch, ignore this step.
- Reboot the switch. Sometimes this causes the problem to disappear; this is a workaround, not a fix.
- Check the switch software. If this is a new installation, remember that some components can only work with certain releases of software. Check the release notes or the hardware installation and configuration guide for the component you install.
- If you are reasonably certain that you have a hardware problem, replace the faulty component.