update

Another user has confirmed the CFUN workaround and also suspects a boot timing issue. He also brings new-to-me information about boot/connectivity timing on different versions of the router’s firmware.


Cf. this thread on the GL.iNet forum.

TL;DR

Resetting the modem appears to restore SMS when IMS falls over. I am testing this watchdog script[1], called at boot time[2] from /etc/rc.local, thusly:

### check and fix SMS
# first check 1 hour after boot
# then regularly as configured in script
                                                     
/usr/bin/nohup sh /root/bin/visibleSMSwatchdog.sh 3600 &

It is intended to toggle the modem only when appropriate. If there is a less ham-handed way to get IMS working that would, of course, be preferable.
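The control flow described here (wait once after boot, then check on an interval and toggle the modem only when IMS has failed) might look roughly like the sketch below. This is not the actual script: check_ims, reset_modem, watchdog and the MAX_CHECKS argument are all hypothetical names, and the two placeholder functions would need real modem calls behind them.

```shell
# Sketch only: every name below is hypothetical, not taken from the real script.

# Placeholder: return 0 if IMS looks healthy, non-zero if it has fallen over.
check_ims() { return 0; }

# Placeholder: the real thing would toggle the modem (CFUN off, then on).
reset_modem() { echo "RESETTING MODEM"; }

# watchdog FIRST_DELAY INTERVAL [MAX_CHECKS]
# MAX_CHECKS=0 (the default) loops forever; a positive value is handy for dry runs.
watchdog() {
    first_delay=$1
    interval=$2
    max_checks=${3:-0}
    checks=0

    sleep "$first_delay"        # let the connection settle after boot
    while :; do
        if ! check_ims; then
            echo "IMS failed"
            reset_modem         # only touch the modem when a failure is seen
        fi
        checks=$((checks + 1))
        if [ "$max_checks" -gt 0 ] && [ "$checks" -ge "$max_checks" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

Calling `watchdog 3600 600` would mirror the 3600-second first delay used above; the 600-second interval is only a guess based on the ten-minute spacing in the sample log.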

sample run

18:19:19 up 0 min, load average: 2.01, 0.61, 0.21 waiting 3600 seconds for first run
20:09:20 up  1:50,  load average: 0.15, 0.08, 0.01 IMS failed
20:09:20 up 1:50, load average: 0.15, 0.08, 0.01 RESETTING MODEM

OK 

OK 

Sun Nov 30 20:19:31 MST 2025 checking IMS...
Sun Nov 30 20:29:31 MST 2025 checking IMS...
...
Mon Dec 1 07:39:37 MST 2025 checking IMS...
Mon Dec 1 07:49:37 MST 2025 checking IMS...

Boots at ~18:19, IMS (and therefore SMS) failure detected at 20:09[3], CFUN off/on sent (which generates the two OK lines), and IMS (and therefore SMS) is stable thereafter. The log sample ends more than 11 hours after IMS was repaired, with no further failure. If there were another failure the watchdog would catch it and correct it again.

invocation from rc.local

I am running the script from rc.local as seen above. Running it under nohup (from the coreutils-nohup package) lets it detach from rc.local and live on after rc.local exits.

We could run it without nohup: sh /root/bin/visibleSMSwatchdog.sh 3600 & but the script would die when rc.local exits. We could run it in the foreground: sh /root/bin/visibleSMSwatchdog.sh 3600 but that would hang rc.local, which might or might not be problematic; I don’t know if anything else in the OpenWrt or GL.iNet boot process is waiting for it to complete.

We could also call the script without sh, but I don’t want to walk first-timers through Linux permissions, shebang lines, etc. Explicitness is typically a virtue in the impoverished environments[4] we see in boot scripts, crontabs, etc.
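Because boot scripts and crontabs start with such a sparse environment, one defensive habit is to pin PATH and check for the needed commands up front. A minimal sketch follows; require_cmd is a hypothetical helper, and the directory list is an assumption about where the router keeps its binaries.

```shell
# Do not rely on whatever PATH rc.local or cron happens to provide.
# The directories listed here are an assumption; adjust for your firmware.
PATH="/usr/sbin:/usr/bin:/sbin:/bin${PATH:+:$PATH}"
export PATH

# require_cmd NAME...: return non-zero (and name the culprit) if a command is missing.
require_cmd() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd" >&2
            return 1
        fi
    done
    return 0
}
```

Something like `require_cmd gl_modem awk || exit 1` near the top of a watchdog would then fail fast instead of misbehaving quietly.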

TO DO

  • The first check is delayed because running too early seems to cause very long reconnect times. [update: it can also cause the fix to fail again within the same 2-hour time frame.] With testing I hope to eventually figure out what’s causing the problem so we can run the reset ASAP after booting without hitting side effects.
  • It is possible that a less intrusive command could be used to get IMS back up. My understanding is that IMS is related to sending SMS over VoLTE, so that may be a path of exploration.

sausage making and notes - ignore

DRAFT

the sections below are me thinking out loud; feel free to skip.

symptoms

Folks using Visible on the GL.iNet X3000 Spitz[5] can send and receive SMS messages from the router’s web UI after a fresh reboot. But after ~2 hours SMS fails until the next reboot. The 2-hour delay made troubleshooting slow because you couldn’t tell whether something had worked until the 2 hours were up.

testing

GL goes above and beyond IMO to help customers fix problems, including logging into the router to test and troubleshoot. One attempt to identify the problem, a script called auto_check_SMS_Visible.sh, was intended to generate logs and store modem-level data for analysis.

connectivity crashes

Running the script would knock my connection offline and it wouldn’t come back.[6] I’d reboot to get a working connection again. I reported the problem to GL, who logged in and ran it successfully themselves. WTF? Originally I thought this had to do with how the script was invoked (./auto_check_SMS_Visible.sh vs sh auto_check_SMS_Visible.sh). Both were calling sh (busybox’s ash) so I was confused.

When GL ran the log-collection script SMS stayed stable for 7 days; this was a remarkable change.

But when I rebooted for a fresh start and ran the script I got connection problems again. Eventually I ran the script at a later time and suddenly it worked; the connection dropped for ~60 seconds then came back up with stable SMS.

WTH is going on here?

GL’s logging script was watching for IMS to fall over. But before it starts logging the test script toggles modem functionality:

# off
gl_modem -B $BUS AT AT+CFUN=0 
sleep 1
# back on
gl_modem -B $BUS AT AT+CFUN=1

This is a Heisenbug – running the logging script changes the test results.
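For running the same off/on toggle by hand or from a watchdog, the two commands could be wrapped in a small function. This is a sketch under assumptions: cfun_toggle and the MODEM variable are hypothetical names, and the default bus address is simply the one that appears elsewhere in this post. Making the modem command overridable allows a dry run away from the router.

```shell
# MODEM and BUS can be overridden, e.g. to point at a stub for a dry run.
# The default bus address is the one used elsewhere in this post; yours may differ.
MODEM=${MODEM:-gl_modem}
BUS=${BUS:-0001:01:00.0}

# cfun_toggle [PAUSE]: radio off, wait PAUSE seconds (default 1), radio back on.
cfun_toggle() {
    pause=${1:-1}
    "$MODEM" -B "$BUS" AT AT+CFUN=0
    sleep "$pause"
    "$MODEM" -B "$BUS" AT AT+CFUN=1
}
```

The second command is sent even if the first one reports an error, so the radio is not left switched off by a half-finished toggle.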

Additional weirdness: running the commands “prematurely”[7] causes very long reconnects. On my router I timed a premature reset and it took 18 minutes to reconnect. Theory: GL could only log in well after the reboot and therefore could not have run the script while the connection was still “stabilizing”.

possible workarounds

  • run the AT commands manually as needed (possibly only once at reboot + 2 hours)
  • run a watchdog that issues the AT commands programmatically as needed (possibly only once at reboot + 2 hours)

If doing it programmatically we can watch for IMS failure before changing anything:

# check IMS status
# WORKING:      +QIMSCFG: "ims_status",0,0,2,2,0,0
# NOT working:  +QIMSCFG: "ims_status",0,0,0,0,0,0

echo $(date) checking IMS... | tee -a "$LOG"
# the fifth comma-separated field is 2 when working, 0 when not
IMSSTATUS=$(gl_modem -B 0001:01:00.0 SAT sp AT+QIMSCFG=\"ims_status\" \
	| awk -F , '{print $5}')

Note: the LOGging uses uptime instead of date for timestamps because time-since-reboot can be important info here.
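The bare awk one-liner prints field 5 of every line the modem returns, including any blank or OK lines. A slightly more defensive parse might look like the sketch below. ims_state is a hypothetical helper, and treating a zero fifth field as "down" and anything else as "up" is an assumption drawn only from the two sample lines above.

```shell
# ims_state: read modem output on stdin and print "up", "down" or "unknown".
# Assumption: field 5 of the ims_status line is 0 when IMS is down, non-zero when up.
ims_state() {
    awk -F, '
        /\+QIMSCFG: "ims_status"/ {
            gsub(/[^0-9]/, "", $5)      # drop stray spaces or carriage returns
            if ($5 == "") next          # malformed line, keep looking
            print ($5 + 0 == 0 ? "down" : "up")
            found = 1
            exit
        }
        END { if (!found) print "unknown" }
    '
}
```

On the router this would be fed from the same query shown above, e.g. `gl_modem -B 0001:01:00.0 SAT sp AT+QIMSCFG=\"ims_status\" | ims_state`, with a reset attempted only when the answer is "down".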

old updates

I tried waiting for the network connection to come up after reboot, then resetting the modem. This fixed the slow reconnection but IMS/SMS still failed at 2 hours. Resetting the modem with CFUN when IMS/SMS failure is detected is clunky but has worked 100% of the time so far.

In an attempt to find a more surgical approach I tried twiddling PDP context 1 (IMS on Verizon-based carriers) with AT+CGACT. It errored when turning the context off, so that’s a no-go. Some docs say the syntax is state,cid and some say cid,state. On the GL router it appears to be the former.

  1. renamed with a .txt extension to keep from confusing the webserver 

  2. it can also be run on demand. 

  3. 1 hour, 50 mins after boot. This is in line with the widely-reported ~2 hour failure sequence. 

  4. the ENV in those scenarios has fewer vars defined since it’s not calling .profile, .login, etc. 

  5. and possibly others; dunno. 

  6. I later timed it and it would reestablish the connection after ~18 minutes. 

  7. I say premature because it seems to happen most often after reboots. Will have to experiment more. 
