
Hub 4 modem mode: regular issues with SSL traffic

heliosfa
On our wavelength

Hello,

For the past decade or so I have been running a Virgin connection with the hub in modem mode connected to a pfsense firewall. For the most part, my setup was working fine on the 200 Mb/s service with a Hub 2ac for a fair few years. I upgraded on the 18th of January to the 500 Mb/s service with a Hub 3 and this was also working fine. The problems started on the 26th of January when I was upgraded to Gig1 with a Hub 4 as a Volt benefit.

Since then, I have been having regular issues with SSL traffic to *some* destinations breaking when the Hub 4 is in modem mode. In Firefox, this manifests as the browser hanging on "Performing TLS Handshake" until it times out. In Chrome it hangs on "Establishing Secure Connection".

Some of the problem sites include:

When the problem occurs, I also have issues tunneling traffic (e.g. remote desktop) over SSH. Other websites (e.g. https://old.reddit.com and https://www.speedtest.net ) and anything routed over my Hurricane Electric IPv6 tunnel continue to work absolutely fine. Online game traffic (CS:GO, Tarkov, etc.) also seems to be unaffected. Speed tests continue to report the full expected speed.

Pings to the problem sites continue to work. Even the redirect from news.bbc.co.uk to www.bbc.co.uk/news works, the browser just hangs after the redirect.

Sometimes the problem is transient and resolves itself after a few minutes. Other times it takes a reboot of the Hub 4 to resolve. Rebooting the firewall does not resolve the issue and the issue occurs with other devices (including an alternative router and a straight Windows desktop) in place of the firewall.

Packet captures show that the traffic is correctly leaving the firewall's WAN interface.

Also, the admin interface for the Hub 4 at 192.168.100.1 stops responding within an hour or two of rebooting the hub. Pings to the hub also become sporadic, with only one in every 10 or 20 eliciting a response. It takes a reboot of the Hub 4 to regain access to the admin interface.

 

The issues (including the admin interface one) do not occur if I introduce a double-NAT scenario with the Hub 4 in "normal" mode with the firewall in the DMZ. Obviously this is not an ideal long-term solution.

 

Has anyone else come across this? Is it a "feature" of the Hub 4 or have I just got a bad one?

I am dreading trying to explain this to support over the phone as it is an intermittent issue and involves modem mode.

 

 

 

87 REPLIES

As is typical with these intermittent issues, when you want to run some more diagnostics, the issue refuses to manifest... that is, until I was actually in bed.

I was only able to run some limited diagnostics (I was on an iPad...) but I did find something interesting.

When everything is working, I can do full-size 1472-byte (1500 with headers) pings with the DF bit set to everything. When the issue was occurring, full-size pings to sites that are unaffected carried on with no issue but to the sites that break, I could now only get a response for a 1454-byte (1482 with headers) ping. No ICMP Fragmentation Needed responses were seen as far as I could see.
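For anyone following the numbers: the on-the-wire size of a ping is the payload plus the 20-byte IPv4 header and the 8-byte ICMP header. A minimal Python sketch of that arithmetic follows. The function names are made up for illustration, and the figures assume IPv4 and TCP headers without options.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header
TCP_HEADER = 20   # bytes, assuming no TCP options


def packet_size_for_payload(payload: int) -> int:
    """Total IPv4 packet size produced by a ping with this payload size."""
    return payload + IPV4_HEADER + ICMP_HEADER


def tcp_mss_for_mtu(mtu: int) -> int:
    """Largest TCP segment payload that fits in one packet of this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(packet_size_for_payload(1472))  # 1500
print(packet_size_for_payload(1454))  # 1482
print(tcp_mss_for_mtu(1500))          # 1460
print(tcp_mss_for_mtu(1482))          # 1442
```

If the usable packet size really does drop to 1482 bytes, full-size 1460-byte TCP segments (such as a server's certificate during a TLS handshake) would no longer fit. That would be consistent with the handshake hanging while small packets like pings still get through.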

 

> You don't see any resets? Or re-transmissions?

Honestly, I can't remember. I have been trying to debug this since the service went live and have run so many packet captures at different times. I was hoping to catch it again during an outage, but typically it doesn't happen when you want an outage...

> Are you able to run iperf3? It will show the number of packets re-transmitted in the output.

When everything is working, there are obviously no retransmissions that it is telling me about.

 

> Do you have a BQM setup?

I have just set one up: Helios' BQM

 

I have not given legacy1's suggestion a shot (though this issue manifests with a different router and a laptop connected instead of the pfsense box) as it hasn't failed at a time I can look into it since I posted.


@BaldrickBravo wrote:

You don't see any resets? Or re-transmissions?

So, I have had to wait until a little after 2am for the issue to manifest again and I grabbed some packet captures on the WAN interface of the firewall. Digging into them, there are re-transmissions and Wireshark is flagging several instances of "TCP Previous segment not captured" on inbound packets from problem sites. This is then followed by a couple of Dup Acks. This repeats until the web browser gives up.

 

Checking the path MTU (with mturoute and ping) when the issue is ongoing, I get:

When the issue is NOT ongoing, I get 1500 to all of the above.

This is slightly different to last night where I could still get 1500 to Reddit during the issue.

 

Setting the MTU on pfsense's WAN interface did not fix the problem.

 

I was about to clone the pfsense box's MAC address onto a laptop to swap the device without having to reboot anything to see if I could glean anything further. Unfortunately the issue resolved itself just as I was sorting out the laptop.

 

Any ideas?

legacy1
Alessandro Volta

Likely a deep inspection router somewhere trying to see where you're going (yup, SSL/TLS shows in the clear where you're going) causing problems.

---------------------------------------------------------------

heliosfa
On our wavelength

@legacy1 wrote:

Likely a deep inspection router somewhere trying to see where you're going (yup, SSL/TLS shows in the clear where you're going) causing problems.


That would not explain the MTU oddities that seem to be occurring between my firewall's WAN interface and the CMTS' port...

Maybe capture some tracerts to each of those destinations to see if there are differences between the working/not working routes? I'm wondering if there is something en route that might be causing the issue.

Deano

I do have to say that the simplest and most likely explanation is that there is an intermittent bug in the hub's firmware which only manifests in modem mode. Now I can't recall hearing of any other posts reporting similar issues so it might well be that this is something which is unique to your hub, or at least limited to a small number.

The only way to test is to get VM to replace it, but that'll probably be far easier said than done.

legacy1
Alessandro Volta
Is jumbo packet disabled?
---------------------------------------------------------------

My money is on this not being a hub problem at all, in any way, shape or form.

Peering is dynamic, it changes from time to time. I bet it is coming and going as peering arrangements change. In one of the peering arrangements used to reach some of the sites, there is a router with a lower MTU.

There is a thing called Path MTU Discovery: https://en.wikipedia.org/wiki/Path_MTU_Discovery. It is reliant on a router receiving certain ICMP packets (type 3, code 4, "Fragmentation Needed") to work. I suspect the hub in router mode is accepting these packets and dynamically reducing the MTU. I suspect with pfSense, these packets are getting dropped.
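To make it concrete what to look for in a packet capture, here is a rough Python sketch that checks whether a raw ICMP message is a "Fragmentation Needed" (type 3, code 4) and, if so, pulls out the next-hop MTU field defined by RFC 1191. The function name is hypothetical, and the input is assumed to be the ICMP message bytes with the IP header already stripped.

```python
import struct
from typing import Optional

ICMP_DEST_UNREACHABLE = 3  # ICMP type: Destination Unreachable
ICMP_FRAG_NEEDED = 4       # ICMP code: Fragmentation Needed and DF set


def next_hop_mtu(icmp_message: bytes) -> Optional[int]:
    """Return the advertised next-hop MTU for a Fragmentation Needed
    message, or None for any other (or truncated) ICMP message.

    Routers that predate RFC 1191 may leave the MTU field as 0.
    """
    if len(icmp_message) < 8:
        return None
    # Header layout: type (1), code (1), checksum (2), unused (2), MTU (2)
    icmp_type, code, _checksum, _unused, mtu = struct.unpack(
        "!BBHHH", icmp_message[:8]
    )
    if icmp_type == ICMP_DEST_UNREACHABLE and code == ICMP_FRAG_NEEDED:
        return mtu
    return None
```

If messages like this never show up on the WAN side while large packets are vanishing, the sender has no way to learn that it should shrink its packets.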

This is all conjecture, of course, but the theory fits. If it's not that, then it is going to be something along those lines, related to the routers packets are traversing rather than something your hub is or is not doing.

Are you perchance blocking some inbound ICMP packets in your pfSense WAN firewall rules?


@BaldrickBravo wrote:

My money is on this not being a hub problem at all in any way shape or form.

Peering is dynamic, it changes from time to time. I bet it is coming and going as peering arrangements change. In one of the peering arrangements used to reach some of the sites, there is a router with a lower MTU.

There is a thing called Path MTU Discovery: https://en.wikipedia.org/wiki/Path_MTU_Discovery

Are you perchance blocking some inbound ICMP packets in your pfSense WAN firewall rules?


I'm not blocking anything specifically and related-established should pick up anything relevant. Also, the packet capture on the WAN interface captures everything pre-firewall and I am not seeing anything relevant.

Your peering suggestion would make sense if the MTU was not being constrained to the CMTS and city core - there is no peering at this level. It is quite clear that the problem is occurring between the WAN interface of my pfsense box and the first routing hop.

heliosfa
On our wavelength

@legacy1 wrote:
Is jumbo packet disabled?

I do not have jumbo packets enabled and the symptoms are showing that the MTU appears to be changing without warning.

 

@DinoParry wrote:

Maybe capture some tracerts to each of those destinations to see if there are differences between the working/not working routes? I'm wondering if there is something en route that might be causing the issue.

Deano


I did a few traceroutes and then checked what size packets I could get to each of the common hops. This is when I found out that the MTU seemed to be dropping between my WAN interface and the first routing hop (CMTS)
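The per-hop check described above can be automated as a binary search over ping payload sizes. The following is a sketch only, not the exact procedure used in this thread: `largest_passing_payload` and `linux_df_ping` are hypothetical names, the ping flags are those of Linux iputils `ping` (Windows uses `ping -f -l SIZE` instead), and the search assumes that if one size gets through, every smaller size does too.

```python
import subprocess
from typing import Callable, Optional


def largest_passing_payload(
    probe: Callable[[int], bool], low: int = 0, high: int = 1472
) -> Optional[int]:
    """Binary-search the largest payload size for which probe(size) is True.

    Returns None if even the smallest size fails (host unreachable).
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always advances
        if probe(mid):
            low = mid       # mid got through, try larger
        else:
            high = mid - 1  # mid was dropped, try smaller
    return low


def linux_df_ping(host: str) -> Callable[[int], bool]:
    """Build a probe that sends one DF-bit ping (assumes Linux iputils ping)."""
    def probe(size: int) -> bool:
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(size), host],
            capture_output=True,
        )
        return result.returncode == 0
    return probe


# Example (not run here): largest_passing_payload(linux_df_ping("192.0.2.1"))
# Add 28 to the result to get the packet size including IP and ICMP headers.
```

Running this against each hop from a traceroute in turn shows the first hop at which the passing size drops. A single lost reply can mislead the search, so repeating each probe a few times is a sensible refinement.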