
Hub status data - understanding network log messages

Andrew-G
Alessandro Volta

Warning: Long, dull, a bit technical.

Having needed to repeat myself a few times, I thought I'd share my understanding of what you might find in a Hub 3's network log when things go wrong, and what it means.  First of all, this post is based on my own reading around DOCSIS cable technology; all of it is in the public domain, and none of it is information supplied or endorsed by Virgin Media.  Whilst I have some experience working on computer systems, I am only an interested amateur here: I have no technical training in this area, I have chosen to make considerable simplifications, and there is the possibility that I am simply wrong.  If you do know better, feel free to correct me, but then you could have written all of this first.  Most of what follows applies to the Hub 4 and earlier Superhubs, although they may use slightly different terminology.

In understanding what follows, you need to know that your hub communicates by converting all digital communication up or down to/from an analogue radio frequency (RF) signal which it communicates via 24 downstream channels and 4 upstream channels through the coax cable to a Cable Modem Termination System (CMTS) that sits at the top of the local coax network.  The CMTS converts all the mixed RF analogue signals from the hundreds of customer hubs connected to it back to digital, and pipes this back to VM's internet exit points.  In simple terms, your hub is a modem, and the CMTS is a stuffing great multi-user modem, and the hub uses its upstream connection to keep all 28 channels synchronised with the CMTS, at the right frequency and power levels.  When your hub starts up, it goes through a process of registration that starts off by looking for a downstream sync signal, then establishing a primary upstream channel, and working out from there which frequencies it can use to connect all 28 required channels, in amongst the signals from hundreds of other hubs on the same cable segment, and several hundred TV channels, and it also needs to establish internet protocol communication with the CMTS and secure an IP address.

The hub’s network log monitors the broadband connection between the hub’s built-in cable modem and the CMTS; it has nothing to do with wifi, which is the wireless network in your house between the hub and your own devices.  If you're seeing serious and continuing network log errors, then fiddling with wifi settings, or even buying your own router, will not help, and complaining to VM about "wifi not working" will result in people assuming that you do mean the wireless network in your house.

On a Hub 3's network log, there will always be a range of error or info messages, often with technical and unhelpful descriptions.  The vast majority of DHCP or LAN Login messages and errors can usually be ignored.  Some others are self-explanatory, such as messages loosely along the lines "rebooted at user software command", or "downloaded firmware update".   

Due to the analogue technology that is used by cable, all cable modem connections tend to have some radio frequency noise and will accumulate errors every so often, so infrequent errors can usually be ignored.  The first thing to look for in judging whether there’s a problem is the same errors repeating with some degree of frequency.  In addition to how often these messages turn up, the severity of a problem can sometimes be indicated by different error messages appearing together, plus evidence from other indicators including power levels, modulation levels, SNR data and upstream timeout counts, as well as external fault checkers like a Thinkbroadband BQM.  It is also important to observe the time stamps on error messages - a cluster of perhaps 5-8 serious errors within a ten minute period is probably down to a single network event, whereas the same errors spread across the day indicate a continuing or repeated problem.  Note as well that these can be "cascade errors", where the hub initially has a timeout on ranging (see below), can't recover from that, then sees one or more channels lost, and eventually ends up needing a full re-registration, creating a string of logged errors that in practice refer to a single event.  A few errors when starting up the hub are usually no cause for concern; that's hopefully just the hub rubbing the sleep out of its eyes.  Likewise, electrical disturbances such as mains surges or brownouts will often create errors that are not a cable connection fault.
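As a rough illustration of the "cluster versus spread across the day" point, here is a minimal Python sketch that groups log time stamps into bursts.  The ten minute window comes from the paragraph above, but the grouping rule (start a new burst whenever the gap to the previous message exceeds the window), the function name and the sample times are all my own assumptions, not anything the hub does.

```python
from datetime import datetime, timedelta


def group_into_events(timestamps, window=timedelta(minutes=10)):
    """Group log time stamps into bursts.

    Assumption: a new burst starts whenever the gap to the previous
    message is longer than `window`.
    """
    events = []
    for ts in sorted(timestamps):
        if events and ts - events[-1][-1] <= window:
            events[-1].append(ts)   # close enough: same burst
        else:
            events.append([ts])     # too far apart: new burst
    return events


# Hypothetical time stamps copied by hand from a network log.
log_times = [
    datetime(2022, 3, 16, 9, 1),
    datetime(2022, 3, 16, 9, 3),
    datetime(2022, 3, 16, 9, 8),
    datetime(2022, 3, 16, 14, 30),
    datetime(2022, 3, 16, 21, 15),
]

events = group_into_events(log_times)
for event in events:
    print(event[0].strftime("%H:%M"), "-", len(event), "message(s)")
```

One burst of several messages points towards a single network event; many separate bursts across the day point towards a continuing problem.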

In terms of the error messages you might see, I'll pick out the ones that are moderately common and serious, there are others which are not material, or not featuring here. 

SYNC Timing Synchronization failure: Broadly speaking the most serious connection failure, where the hub cannot detect the synchronisation data that is transmitted by the CMTS every 20 milliseconds.  As a result the hub loses synchronisation with the CMTS, usually all internet connectivity is lost, and the hub has to renegotiate all channels from scratch, taking about 7 minutes to do this.  That is about as long as a restart takes, so to a user it looks like the modem has spontaneously rebooted.

RCS Partial Service: The next most serious common error.  This means that the hub is still getting sync signals from the CMTS but has lost communication on one or more of its 28 channels, although the remainder are reporting OK.  The hub needs to renegotiate those channels that have dropped.  The hub will look as though all is OK, casual internet browsing may be largely unaffected, but any continuous connection internet activity will be dropped out, causing speed or latency problems.  Usually the hub can renegotiate the dropped channels, but if it cannot it may revert to a full re-registration with the CMTS, in which case it will again look like a reboot and take 7 minutes.

No Ranging Response received - T3 time-out: (and related but less common T2 and T4 timeouts).  T3 timeouts are more common than RCS Partial Service, often similar in effect, but be aware most cable connections will see the odd T3 timeout once every few days, and if infrequent they are no cause for concern.  A T3 error means the hub has not had a response from the CMTS to its upstream ranging requests that keep everything working at the right frequency and power, often due to upstream RF noise.  Of itself a dropped ranging response doesn’t always mean a loss of connection, but if power or frequencies drift then there may be a momentary drop out. Normal response of the hub to “no ranging response” is often an attempt to regain communication by increasing power levels, but if that doesn’t work, then the hub may drop all channels and try to re-register with the CMTS, creating the symptoms seen with RCS Partial Service.  As a generalisation T4 timeouts alone are often indicative of downstream problems (although if it is a one off it wouldn't be material), but it’s fairly rare to see multiple T4s.  T3 timeouts usually indicate upstream issues, and may be associated with channel loss, dropped modulation or power problems on the 4 upstream channels.

Lost MDD Timeout: Refers to loss of system messaging between hub and CMTS.  Despite how serious that sounds, MDD timeouts are usually of no significance to the user, but they sometimes crop up alongside the other more serious errors.

Incorrect time stamp: It is not unusual to see 1970 dates (or ToD error reports) in the log of a hub that’s struggling.  Of itself an incorrect date doesn’t cause the user problems, but what it shows is that during registration with the CMTS, the hub isn’t getting the correct network time and date, which is usually another indicator of noise problems.

And finally, don’t waste your life monitoring your network log.  If you have problems with speed, reliability or latency, that’s when you start looking at the network log. 

17 REPLIES

risc19
Well-informed

This needs to be pinned.

Very nice mate.

katspike
On our wavelength

Thank you for posting this. Why isn't this info more prominent? So many people are posting related problems on the forums - often with no apparent remedy.

It's taken me ages to stumble upon this invaluable info. This should be pinned and published in the help section of the website!

 

Adduxi
Very Insightful Person

Every section of the Help forums has "red stickies" at the top.   Despite this I guess 99% of people never read them before posting.  It's the same constant churn of requests for help, many of which could be sorted by a bit of reading of these stickies.

However, on the upside, it gives some of us something to do by way of helping out  🙂

I'm a Very Insightful Person, I'm here to share knowledge, I don't work for Virgin Media. Learn more

Have I helped? Click Mark as Helpful Answer or use Kudos to say thanks

@Andrew G - Thanks for that breakdown, very helpful. Got a better understanding, 

Question - When "you guys" are helping people you ask for the stats on power/db  etc (as below) - what are you looking for here?  What's good and bad look like?

 

.0 Downstream channels

Channel  Frequency (Hz)  Power (dBmV)  SNR (dB)  Modulation  Channel ID
25       330750000       1.5           39        QAM256      25
6        178750000       2.3           39        QAM256      6
7        186750000       2.5           40.4      QAM256      7
8        194750000       2.4           40.4      QAM256      8
9        202750000       2.3           40.4      QAM256      9
10       210750000       2.1           40.4      QAM256      10
11       218750000       1.9           40.9      QAM256      11
12       226750000       1.8           39        QAM256      12
13       234750000       2             39        QAM256      13
14       242750000       2.1           40.4      QAM256      14
15       250750000       1.9           39        QAM256      15
16       258750000       1.3           39        QAM256      16
17       266750000       1.3           39        QAM256      17
18       274750000       0.9           38.6      QAM256      18
19       282750000       0.5           38.6      QAM256      19
20       290750000       1.2           39        QAM256      20
21       298750000       1.8           39        QAM256      21
22       306750000       2.3           39        QAM256      22
23       314750000       2.1           39        QAM256      23
24       322750000       1.6           38.6      QAM256      24
26       338750000       1.7           39        QAM256      26
27       346750000       1.8           39        QAM256      27
28       354750000       1.8           39        QAM256      28
29       362750000       1.8           38.6      QAM256      29
30       370750000       2             39        QAM256      30
31       378750000       2             39        QAM256      31
32       386750000       2.1           39        QAM256      32
33       394750000       1.8           39        QAM256      33
34       402750000       1.6           39        QAM256      34
35       410750000       1.7           39        QAM256      35
36       418750000       1.7           39        QAM256      36

Adduxi
Very Insightful Person

@bryansmithsnr wrote:

@Andrew G - Thanks for that breakdown, very helpful. Got a better understanding, 

Question - When "you guys" are helping people you ask for the stats on power/db  etc (as below) - what are you looking for here?  What's good and bad look like?

<snip>

Very generally, Downstream power should be within the range of -6 to +10 dBmV, Upstream power between 34 and 51 dBmV.  QAM for Downstream should be 256, and on a 3.1 channel, QAM 4096.   There should be zero PostRS errors.   The number of T3's should be minimal.    

A BQM is also a must for any VM circuit, as it monitors and records the status for diagnostics and proof of poor performance, if needed.

That's a very quick simplistic overview.


We are looking mainly at the power levels on each channel. Too low a power means that the modem can't properly 'hear' what data is being sent; counter-intuitively, too high a power can have a similar effect - think of it as someone whispering very quietly or someone yelling at full volume directly into your ear. In both cases it is hard to understand the message they are trying to convey.

The official DOCSIS specification (ie the underlying cable internet technology) gives a power range between +15 dBmV and -15 dBmV per channel. In case you are wondering how a power level can be negative, in this case it can be, just trust me on this one! In practice, though, you don't want to be anywhere near those extremes and each cable operator knows the reasonable operational limits of their own equipment. For VM a general 'rule of thumb' is a value between +10 and -6 dBmV, but because of the variable tolerances of electronic components, it doesn't necessarily follow that if the levels fall slightly out of this range then the connection will immediately fail, neither does it mean that a value of 0 is 'better' than a value of -3. But the closer it is to 0, the more leeway there is to cope with changes.

And things do change. As the weather gets warmer, the cables expand in the heat, their signal loss (attenuation) goes up and so the power reaching the hub goes down. It is quite possible for there to be a 3dB swing between a cold winter's day and a very hot summer one. So a value of -4, say, on a bitterly cold day in January is fine but might give cause for concern come August!

Signal to noise ratio (S/N) should be as high as possible; it is a measure of how 'clear' the signal is, or how much extraneous 'noise' has gotten in. Anything above the low to mid 30s of dB should be OK, but again there really isn't any hard and fast rule. I generally think that seeing values above 38dB is a good indicator.
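To make those rules of thumb concrete, here is a minimal Python sketch of the sort of sanity check being described for a single downstream channel. The -6 to +10 dBmV range and the 3dB seasonal swing are the figures quoted above; the 33dB S/N floor is only my reading of "low to mid 30s", and the function names are made up for illustration. None of this is an official VM test.

```python
POWER_MIN_DBMV = -6.0   # rule-of-thumb lower limit quoted above
POWER_MAX_DBMV = 10.0   # rule-of-thumb upper limit quoted above
SNR_FLOOR_DB = 33.0     # assumption: one reading of "low to mid 30s"


def check_downstream(power_dbmv, snr_db):
    """Return a list of concerns for one downstream channel (empty = looks fine)."""
    concerns = []
    if not POWER_MIN_DBMV <= power_dbmv <= POWER_MAX_DBMV:
        concerns.append("power outside the -6 to +10 dBmV rule of thumb")
    if snr_db < SNR_FLOOR_DB:
        concerns.append("S/N below the low to mid 30s")
    return concerns


def estimate_hot_day_power(cold_day_power_dbmv, swing_db=3.0):
    """Estimate downstream power on a hot day from a cold-day reading."""
    return cold_day_power_dbmv - swing_db


# A channel like those in the table earlier in the thread: 1.5 dBmV, 39 dB.
print(check_downstream(1.5, 39.0))

# The January example: -4 dBmV is fine now, but what about August?
august = estimate_hot_day_power(-4.0)
print(august, check_downstream(august, 39.0))
```

The second call shows why a reading that is inside the range in winter can still be worth keeping an eye on.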

Now the upstream is a bit more complex. The hub knows the power of the downstream channels it is receiving; what it doesn't know is how well the messages it is sending out are being received. All it can say is that it is transmitting at a given power level and it is getting a response back, so this is probably OK. Similarly, it has no idea what S/N ratio the far end of the cable is getting, so there is less useful information from the hub end.

Now there is a maximum value of power that the hub is capable of pushing out. If the values for output power look too high then it indicates that the hub is HAVING TO SHOUT to get its message through, which is often an indication of a cabling fault, bad connection, cracked co-ax cable etc. Acceptable values would be around 35 to 50 dBmV. Don't forget the change that happens as the cable expands in the summer: as the cable expands and its signal loss rises, the hub has to increase power to compensate, so the upstream figure moves the opposite way to the downstream one. Again, you really don't want to be looking at an upstream power of 50dBmV in the middle of winter, because it could well struggle in the summer.

Something else we often look at is the upstream channel modulation. Without going into the maths and drawing pretty constellation diagrams (actually Wikipedia has some nice animated diagrams showing how this works), QAM is a method of encoding binary data onto two analogue signals. The higher the QAM level, the more data can be 'packed' in, but the better the connection has to be and the lower the noise level to be able to extract the data. VM's upstream runs at 64 QAM but it self-regulates: if it detects that a particular channel is a bit 'noisy' and can't reliably sustain this modulation, then it will drop to 32 or even 16 QAM, because that is better than losing the channel entirely or corrupting data which would need to be resent.

This is not necessarily something to worry about, yes it is an indication that something isn't quite right but it might not be as important as it looks;

If you look at the upstream values you might see something similar to this

1  25800273   38.1    5120   64 qam

The first number is simply the channel ID, the second is the channel frequency (now the 25.8 MHz frequency is known to be a bit noisy; there are many sources of external noise which ideally shouldn't get into the cable, but if there are any imperfections in the connection then this is likely to be the channel most affected).

The third number is the outgoing power level, and that's fine. The fourth mysterious number is the symbol rate in thousands of symbols per second. As I mentioned earlier, QAM works by packing bits of digital data into 'symbols' and it is these that are actually transmitted by the hub. And lastly we have the modulation, which tells us how many bits of useful information can be packed into each symbol.

So you might think that if the modulation falls from 64 to 32 then you have halved the data rate? Not so. Because of the way QAM works, the number is a power of two: 64 QAM is six bits per symbol (2 raised to the power of 6 is 64), and similarly 32 QAM is five bits per symbol, so you haven't halved the rate, just reduced it by about 17%. If you want to know what that means in real terms, well, we can do (roughly) some maths.

A single channel at 64 QAM and a symbol rate of 5120 ks/s is - and not making allowance for overheads because I'm only interested in a comparison;

5120 x 6 = 30720 kbits per second, or 30,720,000 bits per second, or about 30 Mb/s (slightly more, I rounded the numbers, and if we do allow for the DOCSIS overheads it's closer to 27 Mb/s anyway) - and each modem bonds four channels, so a maximum theoretical upstream speed of c. 120 Mb/s

Now in reality the cable operator doesn't offer speeds that high because congestion and the way DOCSIS upstream works wouldn't let you get anywhere near the theoretical maximum reliably. VM's fastest upstream speed is 52 Mb/s, comfortably within the limit.

Now suppose one channel drops to 32 QAM, same symbol rate but it is now 5120 x 5 (rather than 6) or 25600 kb/s or 25 Mb/s - if the other three channels are still on 64 QAM or even if they have all dropped, the theoretical maximum upstream speed has dropped from 120 to 115 or 100 Mb/s (roughly), so it still shouldn't affect your actual connection rate in practice.
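For anyone who would rather see that arithmetic run, here is a minimal Python sketch of the same comparison. It uses the 5120 ksym/s symbol rate and the four bonded channels from the example above and, like the figures above, ignores DOCSIS overheads; the function names are my own.

```python
def bits_per_symbol(qam_order):
    """Bits carried per symbol: 64 QAM -> 6, 32 QAM -> 5, 16 QAM -> 4."""
    if qam_order < 2 or qam_order & (qam_order - 1):
        raise ValueError("QAM order must be a power of two")
    return qam_order.bit_length() - 1


def raw_rate_kbps(symbol_rate_ksym, qam_order):
    """Raw channel rate in kbit/s, ignoring all protocol overheads."""
    return symbol_rate_ksym * bits_per_symbol(qam_order)


SYMBOL_RATE_KSYM = 5120  # from the example upstream line above


def bonded_rate_kbps(qam_orders):
    """Sum the raw rates of a set of bonded upstream channels."""
    return sum(raw_rate_kbps(SYMBOL_RATE_KSYM, q) for q in qam_orders)


all_64 = bonded_rate_kbps([64, 64, 64, 64])
one_32 = bonded_rate_kbps([64, 64, 64, 32])
all_32 = bonded_rate_kbps([32, 32, 32, 32])

for label, rate in [("all 64 QAM", all_64),
                    ("one at 32 QAM", one_32),
                    ("all 32 QAM", all_32)]:
    print(f"{label}: {rate / 1000:.1f} Mb/s raw")
```

The unrounded totals come out slightly higher than the rounded 120, 115 and 100 Mb/s figures, but the conclusion is the same.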

Should they all be at 64 QAM? Well yes. Is it an indication that the sky is falling in? Absolutely not!

Now lastly, when looking at the logs themselves, you can get a sort of measure of how important each message is by looking at the priority which on the Hub 3 and earlier mentions Critical, Warning, Notice and Error and on the later hubs these are replaced by numbers where 6 is the lowest importance and 1 is the highest. The categories however don’t always mean what you might think. Notice just means something has happened which you might be interested in, ie someone has logged in successfully or failed to log in because the password was wrong. ‘Error’ is far less worrying than it sounds - the hub has received some information which it doesn’t understand or has no idea what to do with and so has decided to just ignore it and carry on. The DHCP messages are like this and can be safely disregarded.

Now ‘Warning’, ie the RCS Partial Service messages, imply that something has happened but the hub can work around it, basically indefinitely, without noticeably impacting your service. Yes it’s not ideal and the hub will act to fix it at some point but don’t lose too much sleep over it.

Lastly ‘Critical’: these are things that the hub can’t work around and which will, if left, impact your service, although whether or not you notice this is another matter. The infamous No Ranging Response is typical of this: one of the upstream channels is not getting the response back that it expects, and if it doesn’t recover in a fairly short time, the hub will need to drop and renegotiate that connection.

Here endeth the lesson........

John


@jem101 wrote:

in case you are wondering how a power level can be negative, in this case it can be, just trust me on this one!


Just to name the reason: dB is a logarithmic scale. If you feed it with a number less than 1 (but higher than 0), the result is negative. If you feed it with a 1, the result is 0.
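A small Python sketch of that logarithm, for anyone who wants to try some numbers. dBmV is the signal voltage expressed relative to 1 millivolt, using 20 times the base-10 logarithm of the ratio; the function name is just for illustration.

```python
import math


def millivolts_to_dbmv(millivolts):
    """Convert a signal level in millivolts to dBmV (reference: 1 mV)."""
    if millivolts <= 0:
        raise ValueError("level must be greater than zero")
    return 20.0 * math.log10(millivolts / 1.0)


for mv in (0.5, 1.0, 2.0):
    print(f"{mv} mV -> {millivolts_to_dbmv(mv):+.2f} dBmV")
```

So exactly 1 mV gives 0 dBmV, anything weaker than 1 mV gives a negative figure (half a millivolt is roughly -6 dBmV), and anything stronger gives a positive one.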

@jem101 Thanks for that explanation, very useful. 

The QAM on the log above and my logs show QAM 256? That seems good, if I'm following?

While you're at it what about the BQM - might as well complete the lesson!

Assuming the spikes are really bad, as latency is really high.  If that's correct the lower the average (blue) the better and the smoother the better?

Screenshot 2022-03-16 at 10.53.04.png

 


@bryansmithsnr wrote:

@jem101 Thanks for that explanation, very useful. 

The QAM on the log above and my logs show QAM 256? That seems good, if I'm following?

While you're at it what about the BQM - might as well complete the lesson!

Assuming the spikes are really bad, as latency is really high.  If that's correct the lower the average (blue) the better and the smoother the better?

Screenshot 2022-03-16 at 10.53.04.png

 


The downstream channels run at a much higher frequency than the upstream ones, as a general rule, the higher the frequency the more data you can 'pack' into it which translates to a faster speed if you like.

The downstream runs at 256 QAM which, as a test for the reader, is how many bits per symbol?

I don't want to be greedy so I'll let someone else do a write up of how the BQM works, how to interpret it and what it does and doesn't say about your connection 😃

But yes, the intermittent spikes are not uncommon, it's a sign of very short term congestion - in general, the lower the green, yellow and blue sections are and the flatter they are, the better.

John