EIR-OPS-005: Failsafe at Initial AOS

Objective

To confirm 2-way communication with EIRSAT-1 while in failsafe, to assess the reason for failsafe and, subsequently, to leave failsafe.

Introduction

The Operator should have been directed to this procedure from EIR-OPS-004: Initial AOS as the current boot image of the spacecraft at initial AOS was determined to be failsafe.

Using this procedure, the Operator will verify that the antennas are fully deployed prior to making the decision to finish the Separation Sequence. This part of the procedure is very similar to EIR-OPS-004: Initial AOS but has been tailored to suit the current boot image. The Operator will also be advised on how to assess what chain of events potentially led to failsafe since launch, and whether a primary image can/should be safely booted.

Note

Failsafe is not equipped with a Mode Manager. Therefore, as part of this procedure, rather than transitioning from Separation Sequence Mode to Commissioning Mode, the Operator will just ‘finish the Separation Sequence’, putting the Separation Sequence state machine into its finished state directly rather than via a Mode Manager.

Procedure

This procedure follows on from Section B of EIR-OPS-004: Initial AOS and contains the following sub-procedures:

A. Data Downlink
B. Data Analysis (After the Communication Pass)
C. Finishing the Separation Sequence
D. Booting Primary

Important

Communication with the spacecraft is required for Sections A, C and D of this procedure.

A. Data Downlink

A.1.

For the remainder of the pass, downlink data from on-board storage according to the EIR-OPS-011: Downlink Data From Storage procedure.

Warning

Section B of EIR-OPS-011: Downlink Data From Storage (i.e. Tx Convolutional Encoding Management) SHOULD NOT BE FOLLOWED during the initial passes of the mission, until the GS team confirm that 2-way communications are sufficiently stable.

For the first pass of initial AOS and given the current/off-nominal status of the mission, it is recommended that the Operator downlink data according to the priorities listed in the table below.
For all subsequent passes, some NEW rows of Event and HK data should always take priority to assess the current state of the spacecraft. Only then should the priorities in the table be re-assumed.

Note

In this table the Operator is advised to downlink ‘some’ rows of a particular data type and whether the OLDest or NEWest rows of data in storage should be given preference. This is done with the assumption that the time constraint of the communication pass will not allow the Operator to get all the desired data downlinked in a single pass.

Important

Only MRAM channels can be accessed while in failsafe. MRAM storage has been configured to have 1) channels for logging data generated while operating in failsafe and 2) channels containing a buffer of the most recently logged data by a primary image. Refer to ROW to determine what channels exist in MRAM.

Warning

absRowsLogged is maintained by the logger components, so no absRowsLogged data will be available for the primary image MRAM channels while operating in failsafe. To request data from these primary image MRAM channels, the Operator should instead use each channel's numRows when calculating First row and Last row for the downlink.

Warning

numRows should also be used when downlinking ADM logger data.

Priority	What Data?	Why?
Highest	Some NEW rows of FAILSAFE `HK`	to determine the current state of the spacecraft
	Some OLD rows of FAILSAFE `Event`	may provide useful insight into the nature of the reboot(s) that led to failsafe
	Some OLD rows (at least ~hours worth) of `ADM`	to assess if antenna deployment occurred during the first burn attempts
	Some NEW rows of PRIMARY `HK`	to determine the state of the spacecraft when last in primary1
Lowest	Some NEW rows of PRIMARY `Event`	may provide useful insight into why reboot(s) occurred while in primary1
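
As an illustration of the numRows warnings above, the First row and Last row values for a 'some OLD rows' or 'some NEW rows' request could be worked out as in the sketch below. This is not part of the MCS or flight software: the function name is hypothetical, and it assumes rows are indexed from 0 (oldest) to numRows - 1 (newest) with an inclusive Last row. The Operator should confirm the actual indexing of each channel before relying on it.

```python
from typing import Tuple


def row_range(num_rows: int, count: int, newest: bool) -> Tuple[int, int]:
    """Return (first_row, last_row), inclusive, for a downlink request.

    ASSUMPTION: rows are indexed 0 (oldest) .. num_rows - 1 (newest).
    """
    if num_rows <= 0:
        raise ValueError("channel holds no rows")
    # Never ask for more rows than the channel holds, and ask for at least one.
    count = max(1, min(count, num_rows))
    if newest:
        return num_rows - count, num_rows - 1
    return 0, count - 1


# Example: a channel reporting numRows = 100, requesting 20 rows.
print(row_range(100, 20, newest=True))   # newest 20 rows
print(row_range(100, 20, newest=False))  # oldest 20 rows
```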

A.2.

When a communication pass is over, proceed to Section B. However, for later passes and while…
- Antenna deployment is still being confirmed, or
- The reason for failsafe is still being assessed
…the Operator should return to this section and continue to downlink data from the above table as well as any additional data desired as a result of the analysis carried out in Section B.
See Steps B.4 and B.9 to assess if/when Sections C and D, respectively, should be followed.

B. Data Analysis (After the Communication Pass)

Note

The analysis to be carried out by the team is very dependent on the findings as well as what data was successfully downlinked in Section A. Therefore, rather than a strict set of instructions, this section instead provides information to help guide the Operator in their analyses. Also note that in addition to any data downlinked by the UCD GS, data obtained via the amateur radio community may also be used to support the analysis/findings.

SPACECRAFT HEALTH CHECK

B.1.

Any ‘NEW rows of FAILSAFE HK data’ downlinked should now be checked to assess the current state of the spacecraft and its subsystems. Other than the fact that failsafe is the current boot image, do any of the other HK parameters give cause for concern? e.g.:
- Are the battery bus voltage levels nominal?
- Are the various EPS and/or battery reset counters as expected given their pre-launch values?
- Has the temperature of the CMC Power Amplifier stayed within expected/acceptable limits since RF transmissions were enabled?

Tip

This information should be used to assist with the ‘FAILSAFE BOOTED ANALYSIS’ below.

Tip

In addition to the most recent value of each parameter, check how the values changed with time. Use Grafana to help with this.

B.2.

The Operator should also assess whether the failsafe image has been stable since booted. To do this:
- If the full failsafe Event log has been downlinked, search it for occurrences of the Separation Sequence ‘StateFunctionComplete’ event with event data = 0x00 (i.e. the Separation Sequence Init State). If failsafe has been stable since booted, only one event with data = 0x00 should be observed.
- If the full Event log has NOT YET been downlinked but some ‘NEW rows of FAILSAFE HK data’ and some ‘NEW rows of PRIMARY HK data’ were retrieved:
  - Use the most recent On-Board Time (OBT) and uptime parameter values in the ‘NEW rows of FAILSAFE HK data’ to determine the OBT of the last reboot.
  - If this OBT is roughly consistent with the last OBT parameter value in the ‘NEW rows of PRIMARY HK data’, then failsafe has likely been stable since booted.
If multiple reboots have occurred since failsafe has been booted, the Operator should investigate this in parallel to the below analysis, which is more focused on the nature of the reboots that led to failsafe as opposed to reboots while operating in failsafe. However, the same analysis largely applies and should be considered prior to proceeding to Section D.
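
The two stability checks in this step can be sketched as follows. This is a minimal illustration only: it assumes OBT and uptime are both expressed in seconds, that events are available as (name, data) pairs, and that a tolerance of a few minutes is an acceptable reading of 'roughly consistent'. The function names, the event representation and the tolerance value are all hypothetical.

```python
from typing import Iterable, Tuple


def last_reboot_obt(failsafe_obt_s: int, failsafe_uptime_s: int) -> int:
    """OBT at which failsafe last booted (most recent OBT minus uptime)."""
    return failsafe_obt_s - failsafe_uptime_s


def failsafe_likely_stable(failsafe_obt_s: int, failsafe_uptime_s: int,
                           last_primary_obt_s: int,
                           tolerance_s: int = 300) -> bool:
    """True if the last reboot roughly coincides with the last primary HK row.

    ASSUMPTION: tolerance_s is an arbitrary illustrative value.
    """
    reboot = last_reboot_obt(failsafe_obt_s, failsafe_uptime_s)
    return abs(reboot - last_primary_obt_s) <= tolerance_s


def count_init_events(events: Iterable[Tuple[str, int]]) -> int:
    """Count 'StateFunctionComplete' events with event data = 0x00."""
    return sum(1 for name, data in events
               if name == "StateFunctionComplete" and data == 0x00)


# A count of exactly 1 is consistent with failsafe being stable since boot.
log = [("StateFunctionComplete", 0x00), ("StateFunctionComplete", 0x01)]
print(count_init_events(log))
```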

2-WAY COMMUNICATION CONFIRMATION

B.3.

The downlinked data should now be assessed to confirm with confidence, that:
1. full antenna deployment has occurred, and
2. nominal 2-way communication has been achieved.
To do this, the following should be considered:
- Does the downlinked Event log (i.e. the ‘OLD rows of FAILSAFE Event data’) suggest that the Separation Sequence successfully progressed to and through the different burn and between-burn-wait states (i.e. are Separation Sequence ‘StateFunctionComplete’ events observed with Event Data = the IDs of the burn and between-burn-wait states)?
  
  Tip
  
  To aid this assessment, the Operator can review Event log data downlinked during the MMTs here for comparison with their data.
- In the ‘OLD rows of ADM data’:
  - Do the ADM switch states, read by both the OBC and the EMOD MSP (i.e. mission.SeparationSequence.AntSwitchesStatuses and platform.ADM.SwitchesStatuses), indicate that the antenna elements have been deployed?
  - Do the deployment times of the different elements coincide with the resistor burns?
  - Do the PDM currents show that the correct current went through the resistors for the correct amount of time during the resistor burns?
- In the downlinked HK data:
  - Check that the temperature of the CMC Power Amplifier increased only after RF transmissions were enabled, to confirm that RF transmissions were enabled when expected.
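
As a rough aid for the 'do the deployment times coincide with the resistor burns?' question above, the comparison could be sketched as below. The representation of burns as (start, end) OBT pairs, the function name and the optional slack after the end of a burn are all assumptions made for illustration. The real burn windows must be taken from the downlinked Event log and ADM data.

```python
from typing import List, Optional, Tuple


def burn_matching_deployment(deploy_obt_s: int,
                             burns: List[Tuple[int, int]],
                             slack_s: int = 0) -> Optional[Tuple[int, int]]:
    """Return the (start, end) burn window containing the deployment time.

    ASSUMPTION: slack_s allows an element to register as deployed shortly
    after a burn ends; its value is illustrative, not a mission figure.
    """
    for start, end in burns:
        if start <= deploy_obt_s <= end + slack_s:
            return (start, end)
    return None  # deployment did not coincide with any listed burn


burns = [(100, 110), (200, 210)]
print(burn_matching_deployment(105, burns))  # falls within the first burn
print(burn_matching_deployment(150, burns))  # no matching burn
```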

B.4.

When the team are satisfied that all antenna elements are fully deployed and that 2-way communications are stable, Section C should be carried out during the next communication pass.

FAILSAFE BOOTED ANALYSIS

B.6.

The Operator should first assess the timeline of the reboot(s) that led to failsafe. To do this, take note of the most recent core.OBT.uptime in the ‘NEW rows of PRIMARY HK data’, and consider the following possibilities:
- If there are no rows of ‘PRIMARY HK data’ available for downlink, a failed attempt to boot the primary image at start-up likely led to failsafe being booted. This theory is supported if there are also no rows of ‘PRIMARY Event data’.
  
  If this is the case, the Operator should now consider what might have prevented a successful boot. The remaining steps in this section are largely not applicable to this analysis, so the Software Engineer should be contacted for support.
- If this core.OBT.uptime is >2 hours, failsafe was booted as a result of:
  - A reboot + a failed attempt to boot back into the previously operating primary image, or
  - A reboot where the primary image was not marked as stable even though >2 hours of operating in the image had passed.
  Both scenarios require an assessment of the initial reboot. However, both scenarios also imply some anomalous or unexpected (possibly software) behaviour. Therefore, if either scenario has occurred, the Software Engineer should be contacted for support.
- If this core.OBT.uptime is <2 hours AND >2 hours had elapsed since on-orbit deployment, failsafe was booted as a result of more than one reboot sometime after launch, where the first rebooted the primary image.
  
  In this case, ‘NEW rows of PRIMARY HK data’ and ‘NEW rows of PRIMARY Event data’ should be searched for further evidence of the first reboot into the primary image (e.g. did uptime reset?, are there multiple occurrences of the Separation Sequence StateFunctionComplete event with event data = 0x00?).
- If this core.OBT.uptime is <2 hours AND <2 hours had elapsed since on-orbit deployment, failsafe was booted as a result of a single reboot sometime after launch.
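
The possibilities listed in this step amount to a small decision tree, sketched below. The function name and the returned labels are hypothetical, times are assumed to be in seconds, and None is used to represent 'no PRIMARY HK rows available'. The procedure does not say how a value of exactly 2 hours should be treated, so the boundary handling here is arbitrary.

```python
from typing import Optional

TWO_HOURS_S = 2 * 60 * 60


def classify_route_to_failsafe(primary_uptime_s: Optional[int],
                               elapsed_since_deploy_s: int) -> str:
    """Map the B.6 observations onto one of the listed possibilities."""
    if primary_uptime_s is None:
        # No PRIMARY HK rows: primary likely never booted successfully.
        return "boot-failure"
    if primary_uptime_s > TWO_HOURS_S:
        # Failed re-boot, or image not marked stable: contact Software Engineer.
        return "anomalous"
    if elapsed_since_deploy_s > TWO_HOURS_S:
        # More than one reboot, the first of which rebooted the primary image.
        return "multiple-reboots"
    return "single-reboot"


print(classify_route_to_failsafe(1800, 3 * 3600))
```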

B.7.

To determine the nature of any reboots identified in the previous step, the Operator should now search the Event logs (i.e. the ‘OLD rows of FAILSAFE Event data’ and ‘NEW rows of PRIMARY Event data’) for ‘EPSInitialised’ events around the times of the reboots.
If this event is observed, a full spacecraft power-cycle led to the reboot.
Otherwise, an OBC reset occurred.
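
A minimal sketch of this check is given below, assuming events are available as (OBT, name) pairs with OBT in seconds. The function name and the size of the search window 'around the time of the reboot' are hypothetical and should be chosen by the Operator.

```python
from typing import Iterable, Tuple


def reboot_cause(events: Iterable[Tuple[int, str]],
                 reboot_obt_s: int,
                 window_s: int = 60) -> str:
    """Classify a reboot using the presence of an 'EPSInitialised' event.

    ASSUMPTION: window_s is an illustrative search window, not a mission value.
    """
    for obt_s, name in events:
        if name == "EPSInitialised" and abs(obt_s - reboot_obt_s) <= window_s:
            return "power-cycle"
    return "OBC reset"


log = [(5000, "EPSInitialised"), (5002, "StateFunctionComplete")]
print(reboot_cause(log, reboot_obt_s=5001))
```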

B.8.

If a full spacecraft power-cycle occurred, the Operator should now assess the ‘NEW rows of PRIMARY HK data’ and ‘NEW rows of PRIMARY Event data’ to determine if there is evidence that low battery conditions caused the reboot(s). In particular, the Operator should:
- Search the HK data for a decrease in the battery bus voltage to ~6.144V, and
- Search the Event log for the ‘LowVoltageExceptionBATSafe’ event.
If evidence that low battery conditions caused the reboot(s) is found, the Operator should now consider using the EIR-OPS-026: Low Battery Fault Analysis procedure to assist further analysis.
If a full spacecraft power-cycle did not occur OR if a power-cycle did occur but there is no evidence of low battery issues, the Operator should now consider using the EIR-OPS-027: Reboot Fault Analysis procedure to assist further analysis.
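
The search and the choice of follow-on procedure in this step can be sketched as below. The function names and the voltage margin used to interpret '~6.144V' are assumptions for illustration. The two pieces of evidence are reported separately so that the team can weigh them.

```python
from typing import Iterable, Tuple

LOW_BATTERY_V = 6.144


def low_battery_evidence(bus_voltages: Iterable[float],
                         event_names: Iterable[str],
                         margin_v: float = 0.1) -> Tuple[bool, bool]:
    """Return (voltage_dipped, exception_logged).

    ASSUMPTION: margin_v is an illustrative allowance around ~6.144 V.
    """
    dipped = any(v <= LOW_BATTERY_V + margin_v for v in bus_voltages)
    logged = "LowVoltageExceptionBATSafe" in set(event_names)
    return dipped, logged


def next_procedure(power_cycle: bool, evidence_found: bool) -> str:
    """Pick the follow-on analysis procedure named in B.8."""
    if power_cycle and evidence_found:
        return "EIR-OPS-026: Low Battery Fault Analysis"
    return "EIR-OPS-027: Reboot Fault Analysis"


dipped, logged = low_battery_evidence([7.9, 7.0, 6.2], ["EPSInitialised"])
print(next_procedure(power_cycle=True, evidence_found=dipped or logged))
```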

B.9.

When the team have completed their analysis and wish to leave the failsafe image, Section D should be carried out.

C. Finishing the Separation Sequence

C.1.

Invoke the mission.SeparationSequence.SeparationSequenceFinish action.

TC Details
MCS Operation	`Invoke`
Action/Param Name	`mission.SeparationSequence.SeparationSequenceFinish`
Data Expected with TC	No
TM Details
Data Expected from TC	No ( + ACK )

C.2.

Get the mission.SeparationSequence.state parameter.
Ensure that the returned state is 0x42 (hex) / 66 (dec).

TC Details
MCS Operation	`Get`
Action/Param Name	`mission.SeparationSequence.state`
Data Expected with TC	No
TM Details
Data Expected from TC	`state` ( + ACK )
Data Size	1 byte
Data Info	the current state of the Separation Sequence
Allowed Value(s)	00 - 09 or 42 (hex)
Expected Value(s)	42 (hex) / 66 (dec)
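
For reference, the check on the returned state can be expressed as in the sketch below. How the byte is obtained from the MCS is not modelled: the function name and the raw-bytes input are assumptions, while the allowed and expected values are those in the table above.

```python
FINISHED_STATE = 0x42  # 66 (dec)
ALLOWED_STATES = set(range(0x00, 0x0A)) | {FINISHED_STATE}  # 00 - 09 or 42 (hex)


def separation_sequence_finished(raw: bytes) -> bool:
    """Validate the 1-byte state and report whether it is the finished state."""
    if len(raw) != 1:
        raise ValueError("expected exactly 1 byte of state data")
    state = raw[0]
    if state not in ALLOWED_STATES:
        raise ValueError(f"unexpected state value: 0x{state:02X}")
    return state == FINISHED_STATE


print(separation_sequence_finished(bytes([0x42])))
```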

C.3.

On exit from the Separation Sequence, all PDMs should be powered OFF. To confirm this, Get the platform.EPS.actualSwitchStates parameter with First row = 0 and Last row = 9.
Ensure that 0 is returned for every row except row 7/PDM 8.

Caution

The FSS draws parasitic power on row 7/PDM 8, so this row of EPS.actualSwitchStates will always be returned as 1 (ON), even if row 7/PDM 8 of EPS.expectedSwitchStates is set to 0 (OFF).

TC Details
MCS Operation	`Get`
Action/Param Name	`platform.EPS.actualSwitchStates`
Data Expected with TC	Yes
Data Size	2 bytes, 2 bytes
Data Info	`First row`, `Last row`
Allowed Value(s)	0-9, 0-9
Expected Value(s)	0, 9
TM Details
Data Expected from TC	List of switch states ( + ACK )
Data Size	List[0:10] of Booleans
Data Info	If all 0, all PDMs are off
Allowed Value(s)	0000000000 (all PDMs OFF) - 1111111111 (all PDMs ON)
Expected Value(s)	0000000100 (all PDMs OFF, except for the FSS PDM/PDM 8)
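
The comparison against the expected value, ignoring the FSS row, can be sketched as follows. The function name and the list-of-integers input are assumptions. The row index of the FSS PDM (row 7/PDM 8) and the list length of 10 are taken from this step.

```python
from typing import Sequence

FSS_ROW = 7  # row 7 / PDM 8 reads 1 (ON) due to parasitic power draw


def all_pdms_off(states: Sequence[int]) -> bool:
    """True if every switch state is 0, ignoring the FSS row."""
    if len(states) != 10:
        raise ValueError("expected 10 switch states (rows 0-9)")
    return all(s == 0 for row, s in enumerate(states) if row != FSS_ROW)


# Expected value from the table: 0000000100
print(all_pdms_off([0, 0, 0, 0, 0, 0, 0, 1, 0, 0]))
```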

D. Booting Primary

Warning

This section of the procedure should ONLY be carried out following the close-out of Sections B and C, and ONLY IF the decision has been made to proceed with booting into a primary image.

D.1.

The Operator should now follow the EIR-OPS-024: Boot Into OBC Image procedure to boot the primary image of choice (i.e. primary1 or primary2).
If the primary image is successfully booted and is stable (i.e. no reboots to failsafe), the Operator can begin the EIR-OPS-006: Commissioning Procedure.

END OF PROCEDURE