PSOD due to abrupt interruption in the net-queue commit process

After an ESXi host is upgraded to 6.5 U1, it may fail with a PSOD (purple screen of death).

The PSOD backtrace looks like this:
2017-09-16T15:34:30.908Z cpu6:65645)@BlueScreen: #PF Exception 14 in world 65645:HELPER_UPLIN IP 0x41802c496258 addr 0x0
PTEs:0x292379a027;0x2efe54c027;0xbfffffffff001;
2017-09-16T15:34:30.908Z cpu6:65645)Code start: 0x41802c200000 VMK uptime: 4:02:26:10.151
2017-09-16T15:34:30.908Z cpu6:65645)0x4390c369bd00:[0x41802c496258]UplinkTreePackQueueFilters@vmkernel#nover+0x188 stack: 0xe15427000
2017-09-16T15:34:30.909Z cpu6:65645)0x4390c369bd90:[0x41802c49e142]UplinkLB_LoadBalanceCB@vmkernel#nover+0x1e42 stack: 0x1
2017-09-16T15:34:30.909Z cpu6:65645)0x4390c369bf20:[0x41802c4916f2]UplinkAsyncProcessCallsHelperCB@vmkernel#nover+0x116 stack: 0x43048761eac0
2017-09-16T15:34:30.910Z cpu6:65645)0x4390c369bf50:[0x41802c2c9e0d]helpFunc@vmkernel#nover+0x3c5 stack: 0x4300b9b2a050
2017-09-16T15:34:30.910Z cpu6:65645)0x4390c369bfe0:[0x41802c4c91b5]CpuSched_StartWorld@vmkernel#nover+0x99 stack: 0x0
2017-09-16T15:34:30.913Z cpu6:65645)base fs=0x0 gs=0x418041800000 Kgs=0x0
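
To compare a captured PSOD against this signature, the backtrace can be scanned for the same function names. The following is only a minimal Python sketch: the helper names (backtrace_symbols, matches_netqueue_psod) are hypothetical, and the choice of UplinkTreePackQueueFilters plus UplinkLB_LoadBalanceCB as the identifying pair is an assumption drawn from the backtrace above, not an official matching rule.

```python
import re

# Function names taken from the backtrace shown above (assumed signature pair).
SIGNATURE_SYMBOLS = ("UplinkTreePackQueueFilters", "UplinkLB_LoadBalanceCB")

# Matches frames such as
# "[0x41802c496258]UplinkTreePackQueueFilters@vmkernel#nover+0x188".
FRAME_RE = re.compile(r"\[0x[0-9a-fA-F]+\](\w+)@vmkernel#nover\+0x[0-9a-fA-F]+")


def backtrace_symbols(log_text):
    """Return the function names of all backtrace frames, in order of appearance."""
    return FRAME_RE.findall(log_text)


def matches_netqueue_psod(log_text):
    """Return True if the text is a #PF Exception 14 whose backtrace
    contains every symbol in SIGNATURE_SYMBOLS."""
    if "#PF Exception 14" not in log_text:
        return False
    symbols = backtrace_symbols(log_text)
    return all(name in symbols for name in SIGNATURE_SYMBOLS)
```

The regular expression does not rely on line breaks, so it also works on PSOD text that was copied as a single run-on line. A match only indicates that the backtrace resembles the one in this post; it does not confirm the root cause.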

Contributing factors: 

  • The host has been upgraded to 6.5 U1.
  • The host has NICs with 10 Gb or higher capacity, such as elxnet FlexFabric 20Gb or FlexFabric 10Gb adapters. The issue is not restricted to Emulex adapters.

Cause: 

The NetQueue commit phase stops abruptly when the hardware activation of an Rx queue fails.

Workaround: 

To work around this issue, downgrade the ESXi host to 6.0 U2, which has the fix for this issue.

Resolution: 

This issue is resolved in VMware ESXi 6.5 P02 (image profile ESXi-6.5.0-20171204001-standard).
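
When checking several hosts, the image profile name of each host can be compared with the fixed profile named above. The sketch below is a heuristic under stated assumptions: it assumes that profile names follow the ESXi-<version>-<11-digit number>-standard pattern seen in this post and that a higher number within the same version means a later release. The function name profile_has_fix and the sample profile names in the usage comments are hypothetical.

```python
import re

# Fixed image profile named in the Resolution section.
FIXED_PROFILE = "ESXi-6.5.0-20171204001-standard"

# Assumed naming pattern: ESXi-<version>-<11-digit release number>-standard
PROFILE_RE = re.compile(r"^ESXi-(\d+\.\d+\.\d+)-(\d{11})-standard$")


def profile_has_fix(profile_name):
    """Return True or False for a profile of the same version as FIXED_PROFILE,
    or None when the name cannot be compared (other version or other format)."""
    current = PROFILE_RE.match(profile_name)
    fixed = PROFILE_RE.match(FIXED_PROFILE)
    if current is None or current.group(1) != fixed.group(1):
        return None
    # Assumption: within one version, a higher number is a later release.
    return int(current.group(2)) >= int(fixed.group(2))


# Usage with hypothetical profile names:
# profile_has_fix("ESXi-6.5.0-20171204001-standard")  returns True
# profile_has_fix("ESXi-6.5.0-20170101001-standard")  returns False
# profile_has_fix("ESXi-6.0.0-20160101001-standard")  returns None
```

Names that do not match the assumed pattern return None instead of a guess, so those hosts should be checked against the KB article manually.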

Reference:  

https://kb.vmware.com/kb/2151749
