BoSL Board Debug
Stop logging issue
The BoSL board stops logging from time to time, which is a big pain. The team decided to find the root cause and solve the problem.
Troubleshooting history
31st May 2023
20 BoSL boards were set up for troubleshooting. 10 run the original BoSL_logging.ino code, each with a tag (OLD_1, OLD_2, ...); we call them the OLD group. The other 10 run Dave's new version of the code, BoSL_VEL_Dave_290523.ino, each with a tag (NEW_1, NEW_2, ...); we call them the NEW group. The major difference between the two groups is that the NEW group's code holds the SIM7000 POWERKEY low for longer when powering the SIM7000 off. We suspected that some MCUs run slowly and that delay(1200) lasts less than an accurate 1200 ms, so that the SIM7000 sometimes does not really power off and ends up in a bad state.
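One way to take the accuracy of delay() out of the picture is to confirm the power-off on the SIM7000 STATUS pin instead of trusting the hold time alone. The following is a minimal sketch of that idea, not the code run by either group. The `ModemPins` struct, its member names and `powerOffAndVerify` are hypothetical; on the board the hooks would wrap `digitalWrite()`, `digitalRead()`, `delay()` and `millis()`, and the 50 ms polling step is an arbitrary choice.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical hardware hooks so the logic can be exercised off-board.
struct ModemPins {
    std::function<void(bool)> setPowerKeyLow;  // true = pull POWERKEY low
    std::function<bool()> statusIsHigh;        // true while STATUS reports "on"
    std::function<void(uint32_t)> sleepMs;     // blocking delay
    std::function<uint32_t()> nowMs;           // monotonic milliseconds
};

// Hold POWERKEY low for holdMs, release it, then poll STATUS until it
// falls or timeoutMs elapses. Returns true only if STATUS is seen low.
bool powerOffAndVerify(ModemPins& pins, uint32_t holdMs, uint32_t timeoutMs) {
    pins.setPowerKeyLow(true);
    pins.sleepMs(holdMs);
    pins.setPowerKeyLow(false);

    const uint32_t start = pins.nowMs();
    while (pins.nowMs() - start < timeoutMs) {
        if (!pins.statusIsHigh()) {
            return true;  // modem confirmed off
        }
        pins.sleepMs(50);  // assumed polling step
    }
    return !pins.statusIsHigh();  // one last check after the timeout
}
```

A caller could log a `false` return as "power-off not confirmed", which would separate a too-short POWERKEY pulse from other failure modes.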
8th June 2023
We observed the 2 groups for one week. Boards in both groups stopped logging: 2 in the NEW group and 2 in the OLD group.
The conclusion is that delay(1200) is not the cause of the stop-logging issue.
A logbook for the troubleshooting was created on Google Drive; find it here.
15th June 2023
We were lucky to catch 2 boards that had stopped logging, NEW_1 and OLD_2. Luke and Felix did the tests together. The VPP voltage, the SIM7000 STATUS pin voltage and the SIM7000 POWERKEY voltage were measured. We also used an Arduino UNO passthrough board to check the BoSL board TX output and the SIM7000 TX and RX output.
NEW_1 was a SIM7000 issue: the ATmega pulls POWERKEY down and up as expected and sends valid AT commands to the SIM7000, but the SIM7000 STATUS pin voltage stays near 0 V (5.7-5.8 mV) and the SIM7000 gives no reply to the AT commands.
OLD_2 turned out to be an ATmega issue. The POWERKEY voltage was always near 0 V during the test and we read nothing on the SIM7000 RX pin, which means the ATmega sent no AT commands to the SIM7000.
During the test with OLD_2 we noticed another problem with the board: its VPP voltage had dropped to 1.8 V.
17th June 2023
To increase the chances of catching boards that stop logging, OLD_X, OLD_11, OLD_12, OLD_13, OLD_14, NEW_0, NEW_11, NEW_12, NEW_13 and NEW_14 were set up and added to the two groups.
20th June 2023
3 boards stopped logging: OLD_X on the 19th, and NEW_1 and NEW_13 on the 20th. OLD_X was an ATmega issue, similar to the earlier OLD_2 case, and the low VPP voltage appeared again on OLD_X. NEW_1 and NEW_13 were SIM7000 issues: they gave no response to AT commands and the STATUS pin voltage stayed at 0 V.
Re-plugging the battery got OLD_X logging again, and resetting the SIM7000 (pulling down the reset pin) got NEW_1 and NEW_13 working again.
21st June 2023
NEW_baudrate was set up to test the SIM7000's stability. It runs the NEW group code with the serial baud rate between the ATmega and the SIM7000 set to 38400.
The same baud rate change was also made to NEW_14, NEW_13, OLD_14 and OLD_13.
28th July 2023
Over the past several weeks, many boards were found to have stopped logging. They were tested and the results were recorded in the logbook. Here is a summary of the test results:
Boards | ATMEGA Issue | SIM7000 Issue | 1.8V issue | Self-recovery | Normal Power Running LOW | Unknown | Grand Total |
---|---|---|---|---|---|---|---|
NEW | 1 | 8 | 1 | 5 | 1 | 16 | |
OLD | 4 | 15 | 2 | 5 | 1 | 27 | |
Others | 3 | 3 |
10 August 2023
2 boards, NEW_9 and BoSL_logging_delay_480s, showed the low voltage issue (1.8 V or lower) during the test.
The Vpp and 3.3V pin voltages were normal at the beginning of the test. After the BoSL TX was connected to a computer via a passthrough Uno, the voltages dropped to 1.3 V ~ 1.4 V.
For BoSL_logging_delay_480s, after its Vpp voltage dropped to 1.4 V, we removed its connection to the Uno and measured the Vpp and 3.3V pin voltages again; they were still 1.4 V. So we can confirm that the low voltage issue is caused by a short circuit on the board.
Note:
OLD_11 to OLD_14 and NEW_11 to NEW_14 are no longer running. These boards are now used to test resetting the SIM7000.
Currently we still have these BoSL boards running for test:
OLD group:
OLD_1 to OLD_10, OLD_X
NEW group:
NEW_0 to NEW_10
Baudrate and delay group:
BoSL_logging_delay_480s, BoSL_logging_delay_600s and BoSLog_delay_aft_openbearer
SIM7000 RESET group:
RESET_1 to RESET_5
23 August 2023
This is the latest summary of the test results. The "Others" row is gone: NEW_baudrate is now counted in the NEW group, and delay_480s, delay_600s and openbearer are counted in the OLD group, according to their base code. This is because the test results show that a longer loop delay or a different baud rate does not significantly affect board stability. Also, 2 of the ATMEGA issues in the last summary were actually 1.8V issues, so I moved them to the 1.8V issue category in this summary (we cannot expect the ATmega to work at such a low voltage).
Boards | ATMEGA Issue | SIM7000 Issue | 1.8V issue | Self-recovery | Unknown | Grand Total |
---|---|---|---|---|---|---|
NEW | 2 | 12 | 2 | 2 | 1 | 19 |
OLD | 2 | 24 | 3 | 3 | 1 | 33 |
The test results continue to support the conclusion that the SIM7000 is the main issue and that ATMEGA issues are rare compared to SIM7000 issues. The 1.8V issue has only been detected during testing; so far there is no evidence that it happens in the production environment.
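To put numbers on "main issue" and "rare", the shares can be worked out directly from the 23 August table. The snippet below is only a small worked calculation; the struct and function names are made up for illustration, and the counts are copied from the table rows.

```cpp
// Counts copied from the 23 August 2023 summary table.
struct GroupCounts {
    int atmega;
    int sim7000;
    int lowVoltage;    // the "1.8V issue" column
    int selfRecovery;
    int unknown;
};

constexpr GroupCounts kNewGroup{2, 12, 2, 2, 1};
constexpr GroupCounts kOldGroup{2, 24, 3, 3, 1};

// Sum of all categories for one group.
constexpr int total(const GroupCounts& g) {
    return g.atmega + g.sim7000 + g.lowVoltage + g.selfRecovery + g.unknown;
}

// Percentage of `part` in `whole`; returns 0 for an empty group.
constexpr double sharePercent(int part, int whole) {
    return whole == 0 ? 0.0 : 100.0 * part / whole;
}

// The row sums must match the table's Grand Total column.
static_assert(total(kNewGroup) == 19, "NEW row does not add up");
static_assert(total(kOldGroup) == 33, "OLD row does not add up");
```

With these figures, SIM7000 issues make up 12 of 19 cases (about 63%) in the NEW group, 24 of 33 (about 73%) in the OLD group, and 36 of 52 (about 69%) overall, while ATMEGA issues account for 4 of 52 (about 8%).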
28 August 2023
We observed interesting behavior on OLD_10 (which had stopped logging). While the ATmega is running (in the loop function), the 3.3V pin reads a normal 3.3 V, but when the ATmega goes into deep sleep the 3.3V pin reads 4.3 V and stays there until the next loop starts. POWERKEY also shows an abnormal voltage: while the ATmega is running, it rises to 1.9 V (normally 1.3 V).
06 September 2023
BoSL_logging_delay_480s stopped logging. The test shows that the SIM7000 is responsive, but after an HTTP request is sent it returns a 603 error. This 603 error had occurred several times before, and we had categorized it as a SIM7000 issue. Error code 603 means DNS error; it could be a software issue or a server issue and needs further investigation. A new category, "DNS error", was added to the Google sheet.
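If the firmware were to log this error itself, the status code could be pulled out of the modem's reply. The sketch below assumes the request is made with the SIM7000 `AT+HTTPACTION` command, whose result arrives as `+HTTPACTION: <method>,<status>,<length>`; whether the BoSL code uses that command is an assumption, and the enum and function names are hypothetical. Code 604 is included because this log lists it elsewhere as "Stack busy".

```cpp
#include <cstdio>
#include <string>

// Hypothetical outcome labels, loosely following the logbook categories.
enum class HttpOutcome { Ok, DnsError, StackBusy, OtherError, Unparsed };

// Classify one "+HTTPACTION: <method>,<status>,<length>" line.
HttpOutcome classifyHttpAction(const std::string& line) {
    int method = 0;
    int status = 0;
    int length = 0;
    if (std::sscanf(line.c_str(), "+HTTPACTION: %d,%d,%d",
                    &method, &status, &length) != 3) {
        return HttpOutcome::Unparsed;  // not the line we were looking for
    }
    if (status >= 200 && status < 300) return HttpOutcome::Ok;
    if (status == 603) return HttpOutcome::DnsError;   // per this log
    if (status == 604) return HttpOutcome::StackBusy;  // per this log
    return HttpOutcome::OtherError;
}
```

Recording the outcome alongside each failed upload would show whether 603 errors cluster on particular boards or times of day.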
07 September 2023
Test summary from 14 June to 07 Sep 2023:
Type | ATmega issue | SIM7000 issue | DNS error | Stack busy | register issue | 1.8V issue | Unknown | Total number of issues experienced |
---|---|---|---|---|---|---|---|---|
NEW | 2 | 15 | | | | 2 | 3 | 22 |
OLD | 2 | 23 | 4 | 3 | 1 | 3 | 5 | 41 |
It is interesting that all 4 DNS errors happened on the BoSL_logging_delay_480s board.
* NEW: NEW_1 to NEW_10 and NEW_baudrate (11 boards in total).
* OLD: OLD_1 to OLD_10, OLD_X, BoSL_logging_delay_480s, BoSL_logging_delay_600s, BoSLog_delay_aft_openbearer and openbearer (15 boards in total).
* ATmega issue: No output from BoSL TX, which indicates the MCU is no longer functioning.
* SIM7000 issue: SIM7000 gives no response to AT commands, but we can receive output from BoSL TX, which suggests the SIM7000 is not functioning correctly.
* DNS error: SIM7000 gives 603 error code.
* Stack busy: SIM7000 gives 604 error code.
* register issue: SIM7000 replies +SAPBR: 1,3,"0.0.0.0", which means no IP has been assigned.
* 1.8V issue: Vpp voltage <= 1.8V, battery voltage is above 3V when unplugged.
* Unknown: BoSL starts to log during the test, unable to complete the test.
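The category definitions above can be written down as a small decision function, which makes the order of the checks explicit. This is a sketch under assumptions rather than the procedure used for the logbook: the `Observation` fields, the `NoFault` label and the check order are all hypothetical. The 1.8V check is placed before the ATmega check because the ATmega cannot be expected to work at such a low voltage.

```cpp
#include <string>

enum class Category {
    AtmegaIssue, Sim7000Issue, DnsError, StackBusy,
    RegisterIssue, LowVoltage, Unknown, NoFault  // NoFault is hypothetical
};

// Hypothetical record of one bench test on a board that stopped logging.
struct Observation {
    bool resumedLoggingDuringTest;  // board started logging mid-test
    double vppVolts;                // measured Vpp
    double batteryUnpluggedVolts;   // battery voltage with the board unplugged
    bool boslTxOutput;              // any output seen on BoSL TX
    bool simRepliesToAt;            // SIM7000 answered AT commands
    std::string bearerIp;           // IP from the +SAPBR reply, "" if not seen
    int httpStatus;                 // HTTP status code, 0 if no request made
};

Category classify(const Observation& o) {
    if (o.resumedLoggingDuringTest) return Category::Unknown;
    if (o.vppVolts <= 1.8 && o.batteryUnpluggedVolts > 3.0) {
        return Category::LowVoltage;  // the "1.8V issue"
    }
    if (!o.boslTxOutput) return Category::AtmegaIssue;
    if (!o.simRepliesToAt) return Category::Sim7000Issue;
    if (o.bearerIp == "0.0.0.0") return Category::RegisterIssue;
    if (o.httpStatus == 603) return Category::DnsError;
    if (o.httpStatus == 604) return Category::StackBusy;
    return Category::NoFault;
}
```

Fixing the order in code also prevents the kind of re-categorization seen earlier, where 2 cases first counted as ATMEGA issues were later moved to the 1.8V issue.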
10 BoSL boards logged without any issue from a fully charged battery to a flat battery:
-- From 31 May 2023 to 07 September 2023
SIM RESET
From the NEW and OLD group test results, we learned that the stop-logging problem is mainly caused by the SIM7000 issue. The main form of the SIM7000 issue is that, for some reason, the module cannot be powered on by pulling down the POWERKEY pin. The SIM7000 also has an NRESET pin, and our tests show that pulling down NRESET can wake up the SIM7000 when pulling down POWERKEY fails.
However, to save ATmega I/Os, the SIM7000's NRESET pin is left floating on our BoSL board, so the ATmega has no control over it.
We decided to solder a wire to the NRESET pin and connect it to the ATmega. Which ATmega pin is the best choice to control NRESET? We have 2 options: A3 and SD_CS.
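Once NRESET is wired to an ATmega pin, the wake-up sequence can try POWERKEY first and fall back to NRESET only when the modem stays off. The following is a minimal sketch of that fallback, not the code running on the RESET boards. `ModemControl`, `WakeResult` and `wakeModem` are hypothetical names, the pulse and wait durations are left as parameters because this log does not state them, and "modem is on" is assumed to mean the STATUS pin reads high.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical hardware hooks; on the board these would drive the
// POWERKEY and NRESET lines and read the STATUS pin.
struct ModemControl {
    std::function<void(uint32_t)> pulsePowerKeyLow;   // hold POWERKEY low for N ms
    std::function<void(uint32_t)> pulseNresetLow;     // hold NRESET low for N ms
    std::function<bool(uint32_t)> waitForStatusHigh;  // wait up to N ms for STATUS
};

enum class WakeResult { ByPowerKey, ByReset, Failed };

// Try the normal POWERKEY wake-up; use NRESET only if that fails.
WakeResult wakeModem(ModemControl& m, uint32_t powerKeyMs,
                     uint32_t resetMs, uint32_t waitMs) {
    m.pulsePowerKeyLow(powerKeyMs);
    if (m.waitForStatusHigh(waitMs)) {
        return WakeResult::ByPowerKey;
    }
    m.pulseNresetLow(resetMs);  // fallback path
    if (m.waitForStatusHigh(waitMs)) {
        return WakeResult::ByReset;
    }
    return WakeResult::Failed;
}
```

Returning which path succeeded lets the firmware count how often the NRESET fallback is actually needed, which is the figure this test is after.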
23 August 2023
6 BoSL boards were set up and have been under test for 4 weeks so far. They are:
RESET_1 - connect to reset pin via diode.
RESET_2 - connect to reset pin via transistor.
RESET_3 - connect to reset pin via one wire.
RESET_4 - connect to reset pin via transistor.
RESET_5 - connect to reset pin via transistor.
RESET_7 - connect to reset pin via one wire.
Their logbook is here.
In the test results we have not seen a single case of the SIM7000 failing to power on or giving no response to AT commands. Resetting the SIM7000 cannot solve an ATmega issue, so there is still one ATmega issue case in the test.
ATMEGA Issue | SIM7000 no response | http request issue | SIM card issue | Unknown | Grand Total |
---|---|---|---|---|---|
1 | | 2 | 2 | 2 | 7 |
* ATMEGA Issue: No output from BoSL TX and no voltage change on the POWERKEY pin.
* SIM7000 no response: SIM7000 gives no reply to AT commands.
* http request issue: Usually starts with the SIM7000 replying with a "+CPIN: NOT READY" error, after which the upload fails.
* SIM card issue: Upload succeeds after the SIM card is replaced.
I suspect the HTTP request failures could be a soldering issue; we need to try re-soldering with a finer-tipped soldering iron.
25 August 2023
Did the following tests on RESET_5:
- Tested RESET_5 with a big antenna (30 cm): HTTP request failed.
- Tested RESET_5 with a big antenna and the OLD code: HTTP request failed.
- Tested the SIM card used by RESET_5 on another board: logging well.
- Removed the wire soldered to NRESET and tested with the OLD code: HTTP request failed.
Conclusion: the RESET_5 BoSL board is broken.
14 September 2023
2 RESET boards stopped logging: RESET_1 on 12 September and RESET_3 on 13 September. Both behaved well: RESET_1 was still logging when its voltage was 2.7 V, and RESET_3 at 2.8 V.
RESET_1's battery voltage was still 2.7 V on the 14th, while RESET_3's battery voltage had dropped to 0 V. The difference is that RESET_1 connects the RESET pin via a diode and RESET_3 via a wire.