Forum Discussion

Andrew-G
Alessandro Volta
4 years ago

Hub status data - understanding network log messages

Warning: Long, dull, a bit technical.

Having needed to repeat myself a few times, I thought I'd share my understanding of what you might find in a Hub 3's network log when things go wrong, and what it means.  First of all, this post is based on my own reading around DOCSIS cable technology, all of it in the public domain, and is not information supplied or endorsed by Virgin Media.  Whilst I have some experience working on computer systems, I am only an interested amateur here: I have no technical training in this area, there are considerable simplifications I have chosen to make, and there is the possibility that I am simply wrong.  If you do know better, feel free to correct me, but then you could have written all of this first.  Most of what follows applies to the Hub 4 and earlier Superhubs, although the terminology may differ slightly.

In understanding what follows, you need to know that your hub communicates by converting all digital communication up or down to/from an analogue radio frequency (RF) signal which it communicates via 24 downstream channels and 4 upstream channels through the coax cable to a Cable Modem Termination System (CMTS) that sits at the top of the local coax network.  The CMTS converts all the mixed RF analogue signals from the hundreds of customer hubs connected to it back to digital, and pipes this back to VM's internet exit points.  In simple terms, your hub is a modem, and the CMTS is a stuffing great multi-user modem, and the hub uses its upstream connection to keep all 28 channels synchronised with the CMTS, at the right frequency and power levels.  When your hub starts up, it goes through a process of registration that starts off by looking for a downstream sync signal, then establishing a primary upstream channel, and working out from there which frequencies it can use to connect all 28 required channels, in amongst the signals from hundreds of other hubs on the same cable segment, and several hundred TV channels, and it also needs to establish internet protocol communication with the CMTS and secure an IP address.

The hub’s network log monitors the broadband connection between the hub’s built-in cable modem and the CMTS.  It has nothing to do with wifi, which is the wireless network in your house between the hub and your own devices.  If you're seeing serious and continuing network log errors, then fiddling with wifi settings, or even buying your own router, will not help, and complaining to VM about "wifi not working" will result in people assuming that you do mean the wireless network in your house.

On a Hub 3's network log, there will always be a range of error or info messages, often with technical and unhelpful descriptions.  The vast majority of DHCP or LAN Login messages and errors can usually be ignored.  Some others are self-explanatory, such as messages loosely along the lines "rebooted at user software command", or "downloaded firmware update".   

Due to the analogue technology that is used by cable, all cable modem connections tend to have some radio frequency noise, and will accumulate errors every so often, and infrequent errors can usually be ignored.  So the first thing in judging if there’s a problem is repeated errors with some degree of frequency.  In addition to how often these messages turn up, the severity of a problem can sometimes be indicated by conjunction between different error messages, plus evidence from other indicators including power levels, modulation levels, SNR data and upstream timeout counts, as well as external fault checkers like a Thinkbroadband BQM.  It is also important to observe the time stamps on error messages - a cluster of perhaps 5-8 serious errors within a ten minute period is probably down to a single network event, the same errors occurring spread across the day indicates a continuing or repeated problem.  Note as well these can be "cascade errors", where the hub initially has a timeout on ranging (see below), can't recover that, then sees one or more channels lost, and eventually ends up needing a full re-registration, creating a string of logged errors that in practice refer to a single event.  A few errors when starting up the hub are usually no cause for concern, that's hopefully just the hub rubbing the sleep out of its eyes, and likewise errors caused by electrical disturbance such as mains surges or brownouts will often create errors that are not a cable connection fault.  
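The point about time stamps can be sketched in a few lines of Python.  This is only an illustration: the ten minute window comes from the paragraph above, but the function name `group_into_events` and the sample times are made up, and a real hub log would need its date strings parsed first.

```python
from datetime import datetime, timedelta

def group_into_events(timestamps, window=timedelta(minutes=10)):
    """Group error timestamps into bursts.

    An error joins the current burst if it falls within `window` of the
    first error in that burst; otherwise it starts a new burst.
    """
    events = []
    for ts in sorted(timestamps):
        if events and ts - events[-1][0] <= window:
            events[-1].append(ts)
        else:
            events.append([ts])
    return events

# Made-up sample: six errors close together, then two isolated ones.
sample = [
    datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 9, 1),
    datetime(2021, 3, 1, 9, 2), datetime(2021, 3, 1, 9, 4),
    datetime(2021, 3, 1, 9, 7), datetime(2021, 3, 1, 9, 9),
    datetime(2021, 3, 1, 14, 30), datetime(2021, 3, 1, 21, 15),
]
events = group_into_events(sample)
for burst in events:
    print(burst[0].strftime("%H:%M"), "-", len(burst), "error(s)")
```

One large burst followed by silence suggests a single network event; many small bursts spread across the day suggest a continuing problem.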

In terms of the error messages you might see, I'll pick out the ones that are moderately common and serious, there are others which are not material, or not featuring here. 

SYNC Timing Synchronization failure: Broadly speaking the most serious connection failure, where the hub cannot detect the synchronisation data that is transmitted by the CMTS every 20 milliseconds.  As a result the hub loses synchronisation with the CMTS, usually all internet connectivity is lost, and the hub has to renegotiate all channels from scratch, taking about 7 minutes to do this.  That is about as long as a restart takes, so to a user it looks as though the modem has spontaneously rebooted.

RCS Partial Service: The next most serious common error.  This means that the hub is still getting sync signals from the CMTS but has lost communication on one or more of its 28 channels, although the remainder are reporting OK.  The hub needs to renegotiate those channels that have dropped.  The hub will look as though all is OK, and casual internet browsing may be largely unaffected, but anything that relies on a continuous connection may drop out, and you may see speed or latency problems.  Usually the hub can renegotiate the dropped channels, but if it cannot it may revert to a full re-registration with the CMTS, in which case it will again look like a reboot and take 7 minutes.
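To make the difference between these first two errors concrete, here is a toy Python sketch.  The channel counts are the Hub 3 figures used in this post; the function `connection_state` and its inputs are hypothetical, as the hub does not expose anything in this form.

```python
DOWNSTREAM = 24  # downstream channels on a Hub 3, as described in this post
UPSTREAM = 4     # upstream channels

def connection_state(sync_ok, locked_downstream, locked_upstream):
    """Very simplified view of the two failure types described above."""
    if not sync_ok:
        # No sync signal from the CMTS: everything has to be renegotiated.
        return "sync failure"
    if locked_downstream < DOWNSTREAM or locked_upstream < UPSTREAM:
        # Sync is fine but one or more of the 28 channels has dropped.
        return "partial service"
    return "ok"

print(connection_state(True, 24, 4))   # all 28 channels up
print(connection_state(True, 22, 4))   # two downstream channels lost
print(connection_state(False, 24, 4))  # sync lost
```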

No Ranging Response received - T3 time-out: (and related but less common T2 and T4 timeouts).  T3 timeouts are more common than RCS Partial Service, often similar in effect, but be aware most cable connections will see the odd T3 timeout once every few days, and if infrequent they are no cause for concern.  A T3 error means the hub has not had a response from the CMTS to its upstream ranging requests that keep everything working at the right frequency and power, often due to upstream RF noise.  Of itself a dropped ranging response doesn’t always mean a loss of connection, but if power or frequencies drift then there may be a momentary drop out. Normal response of the hub to “no ranging response” is often an attempt to regain communication by increasing power levels, but if that doesn’t work, then the hub may drop all channels and try to re-register with the CMTS, creating the symptoms seen with RCS Partial Service.  As a generalisation T4 timeouts alone are often indicative of downstream problems (although if it is a one off it wouldn't be material), but it’s fairly rare to see multiple T4s.  T3 timeouts usually indicate upstream issues, and may be associated with channel loss, dropped modulation or power problems on the 4 upstream channels.

Lost MDD Timeout: Refers to loss of system messaging between hub and CMTS.  Despite how serious that sounds, MDD timeouts are usually of no significance to the user, but they sometimes crop up alongside the other more serious errors.

Incorrect time stamp: It is not unusual to see 1970 dates (or ToD error reports) in the log of a hub that’s struggling.  Of itself an incorrect date doesn’t cause the user problems, but what it shows is that during registration with the CMTS the hub isn’t getting the correct network time and date, which is usually another indicator of noise problems.
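If you have copied a log out of the hub and want a quick tally, something like the Python sketch below would do.  The severity numbers are just my rough ranking from the descriptions above (3 = most serious), the function name `triage` is made up, and the sample lines are invented for illustration rather than real hub output.

```python
# Rough ranking taken from the descriptions above (3 = most serious).
SEVERITY = {
    "SYNC Timing Synchronization failure": 3,
    "RCS Partial Service": 2,
    "No Ranging Response received - T3 time-out": 2,
    "Lost MDD Timeout": 1,
}

def triage(log_lines):
    """Count known messages, listing the most serious and most frequent first."""
    counts = {}
    for line in log_lines:
        for message in SEVERITY:
            if message.lower() in line.lower():
                counts[message] = counts.get(message, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-SEVERITY[kv[0]], -kv[1]))

# Invented sample lines, not real hub output.
sample_log = [
    "09:00 No Ranging Response received - T3 time-out",
    "09:01 SYNC Timing Synchronization failure",
    "09:02 Lost MDD Timeout",
    "14:30 No Ranging Response received - T3 time-out",
]
result = triage(sample_log)
for message, count in result:
    print(count, "x", message)
```

A count on its own says little; it needs reading alongside the time stamps and the power, SNR and modulation figures mentioned earlier.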

And finally, don’t waste your life monitoring your network log.  If you have problems with speed, reliability or latency, that’s when you start looking at the network log. 

17 Replies

  • katspike
    On our wavelength

    Thank you for posting this. Why isn't this info more prominent? So many people are posting related problems on the forums, often with no apparent remedy.

    It's taken me ages to stumble upon this invaluable info. This should be pinned and published in the help section of the website!

     

    • Adduxi
      Very Insightful Person

      Every section of the Help forums has "red stickies" at the top.   Despite this I guess 99% of people never read them before posting.  It's the same constant churn of requests for help, many of which could be sorted by a bit of reading of these stickies.

      However, on the upside, it gives some of us something to do by way of helping out  🙂

      • bryansmithsnr
        On our wavelength

        @Andrew G - Thanks for that breakdown, very helpful. Got a better understanding now.

        Question - When "you guys" are helping people you ask for the stats on power/dB etc (as below) - what are you looking for here?  What do good and bad look like?

         

        .0 Downstream channels

        Channel  Frequency (Hz)  Power (dBmV)  SNR (dB)  Modulation  Channel ID
        25       330750000       1.5           39        QAM256      25
        6        178750000       2.3           39        QAM256      6
        7        186750000       2.5           40.4      QAM256      7
        8        194750000       2.4           40.4      QAM256      8
        9        202750000       2.3           40.4      QAM256      9
        10       210750000       2.1           40.4      QAM256      10
        11       218750000       1.9           40.9      QAM256      11
        12       226750000       1.8           39        QAM256      12
        13       234750000       2             39        QAM256      13
        14       242750000       2.1           40.4      QAM256      14
        15       250750000       1.9           39        QAM256      15
        16       258750000       1.3           39        QAM256      16
        17       266750000       1.3           39        QAM256      17
        18       274750000       0.9           38.6      QAM256      18
        19       282750000       0.5           38.6      QAM256      19
        20       290750000       1.2           39        QAM256      20
        21       298750000       1.8           39        QAM256      21
        22       306750000       2.3           39        QAM256      22
        23       314750000       2.1           39        QAM256      23
        24       322750000       1.6           38.6      QAM256      24
        26       338750000       1.7           39        QAM256      26
        27       346750000       1.8           39        QAM256      27
        28       354750000       1.8           39        QAM256      28
        29       362750000       1.8           38.6      QAM256      29
        30       370750000       2             39        QAM256      30
        31       378750000       2             39        QAM256      31
        32       386750000       2.1           39        QAM256      32
        33       394750000       1.8           39        QAM256      33
        34       402750000       1.6           39        QAM256      34
        35       410750000       1.7           39        QAM256      35
        36       418750000       1.7           39        QAM256      36
  • This is brilliant - I've often searched for this information, but have struggled to find an explanation which I can follow - thanks so much.  

    Is anyone able to confirm what the Network Log message 'NOTICE ATOM is restarted due to Kernel/oops panic as part of Self Healing Mechanism' means, please?

    I used to have this message a lot and multiple (~25 a day!) drop outs; this stopped when Virgin gave me a new hub (October '21).  I've started getting the message again (about once a day), and again it drops out at the time of this message, so worried I'm going to start having significant issues again! 

    • Adduxi
      Very Insightful Person

      Wilhelminajmd wrote:

      <snip>  Is anyone able to confirm what the Network Log message of 'NOTICE ATOM is restarted due to Kernel/oops panic as part of Self Healing Mechanism' please? 

       


      The CPU, in this case an Intel ATOM, has restarted its kernel, the kernel being the core of the operating system that runs the CPU's processes.  This, in effect, restarts the CPU processes without having to power cycle, i.e. reboot, the whole Hub.  It is faster and less noticeable to the end user.

      Generally happens when the CPU is overloaded and the processes overwhelm the OS. 

  • Found this on my first network search. Very useful context to help understand my issue.


  • Andrew-G wrote:

     

    Due to the analogue technology that is used by cable, all cable modem connections tend to have some radio frequency noise, and will accumulate errors every so often, and infrequent errors can usually be ignored.  

     

     Andrew thank you for posting this great knowledge article, it's really helped me to drill into an issue my household has been experiencing.

    I have a small quibble with the term "analogue" technology. It's not an inherently analogue issue: whether the data is encoded digitally or in analogue style, it's an issue with the nature of the physical media that is used for transmission and reception of data.

    Also, a point to point passive fibre connection IIRC has 1x10^14 fewer errors, but it too will have occasional errors when you add in repeaters, multiplexers, and concentrators.

  • When I see

    No Ranging Response received - T3 time-out
    that's when my internet drops out