Automatic commit: D2.2 Update of the Operations Management Policy (OMP) of RESILIENCE
@ -0,0 +1,57 @@
---
config:
  layout: elk
---
flowchart TD
    U[["Users and requestors"]] --> A@{ label: "OMSP Service Desk<br><span style=\"font-size:12px\">Single point of contact · triage · ticketing</span>" }
    A -- Log and acknowledge --> B@{ label: "IT Support Office<br><span style=\"font-size:12px\">Daily operations · monitoring · fix</span>" }
    B1@{ label: "Operations Administrator<br><span style=\"font-size:12px\">Member of ITSO</span>" } --> B
    B -- Ops triage --> DX{{"Infrastructure platform issue"}}
    DX -- Yes --> D@{ label: "D4Science Support Team<br><span style=\"font-size:12px\">VRE · StorageHub · gCat · SocialService · CCP · IAM</span>" }
    DX -- No --> DY{{"External integrated service issue"}}
    DY -- Yes --> E@{ label: "External Service Provider<br><span style=\"font-size:12px\">Local admin for integrated apps</span>" }
    DY -- No --> DZ{{"Product or content or feature request"}}
    DZ -- Yes --> C@{ label: "Product Owner or Project Leader<br><span style=\"font-size:12px\">Backlog · UAT · documentation</span>" }
    DZ -- No --> B
    D -- Restore or workaround or RCA --> B
    E -- Fix or configuration --> B
    C -- Backlog and acceptance --> B
    B -- Assess impact and SLOs --> DP{{"Priority one or SLO breach or security incident"}}
    DP -- Yes --> F@{ label: "CTO – Chief Technical Officer<br><span style=\"font-size:12px\">Tactical decisions · risk and SLO ownership</span>" }
    DP -- No --> B
    F -- Coordination and approvals --> DM{{"Strategic impact or major change"}}
    DM -- Yes --> G@{ label: "Board<br><span style=\"font-size:12px\">Strategic oversight · major approvals</span>" }
    DM -- No --> B
    G -- Policy and direction --> F
    F -- Directives and standards --> B
    A -- Acknowledgements and updates --> N1[("User and stakeholder updates")]
    B -- Status and closure --> N1
    F -- Major incident communications --> N1
    G -- Executive communications --> N1
    A@{ shape: subroutine}
    B@{ shape: subroutine}
    B1@{ shape: rect}
    F@{ shape: subroutine}
    G@{ shape: subroutine}
    D@{ shape: subroutine}
    E@{ shape: subroutine}
    C@{ shape: subroutine}
    U:::notify
    A:::ext
    B:::int
    B1:::int
    DX:::decision
    DY:::decision
    DZ:::decision
    DP:::decision
    DM:::decision
    F:::int
    G:::int
    D:::ext
    E:::ext
    C:::int
    N1:::notify
    classDef int fill:#e7f5ff,stroke:#1c7ed6,stroke-width:1px,color:#1c7ed6
    classDef ext fill:#fff4e6,stroke:#d9480f,stroke-width:1px,color:#d9480f
    classDef decision fill:#ffffff,stroke:#495057,stroke-dasharray:3 3,color:#495057
    classDef notify fill:#f8f9fa,stroke:#adb5bd,color:#495057,stroke-dasharray:2 2
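The decision chain in this flowchart (DX, DY, DZ for routing, then DP and DM for escalation) can be read as a small routing function. The following is a minimal Python sketch of that reading, not part of the committed diagram: the `Ticket` fields and both function names are hypothetical, and each boolean simply mirrors one decision node.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ticket:
    # Hypothetical flags; each mirrors one decision node of the flowchart.
    infrastructure_issue: bool = False        # DX
    external_service_issue: bool = False      # DY
    product_or_feature_request: bool = False  # DZ
    priority_one: bool = False                # DP
    slo_breach: bool = False                  # DP
    security_incident: bool = False           # DP
    strategic_impact: bool = False            # DM


def route(ticket: Ticket) -> str:
    """Return the team that ops triage in the IT Support Office hands the ticket to."""
    if ticket.infrastructure_issue:
        return "D4Science Support Team"
    if ticket.external_service_issue:
        return "External Service Provider"
    if ticket.product_or_feature_request:
        return "Product Owner or Project Leader"
    # All three decisions answered "No": the ticket stays with the IT Support Office.
    return "IT Support Office"


def escalation_chain(ticket: Ticket) -> List[str]:
    """Return the governance roles involved, in escalation order."""
    chain: List[str] = []
    if ticket.priority_one or ticket.slo_breach or ticket.security_incident:
        chain.append("CTO")
        # The Board is only reachable through the CTO in the diagram.
        if ticket.strategic_impact:
            chain.append("Board")
    return chain
```

The checks are evaluated in the same order as the diagram, so a ticket flagged as both an infrastructure and an external service issue goes to the D4Science Support Team first.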
@ -0,0 +1,53 @@
---
config:
  layout: elk
---
flowchart TD
    U["User community"] --> SD[["Service Desk and Support"]]
    MON[["Monitoring and Control Service<br>Event Management"]] -- alerts and events --> SD & INC[["Incident Management"]]
    SD -- log and classify --> D1{"Is this an incident"}
    D1 -- Yes --> INC
    D1 -- No --> D2{"Is this an access request"}
    D2 -- Yes --> ACC[["Access Management"]]
    D2 -- No --> REQ[["Request Fulfilment"]]
    INC -- diagnose and route --> D3{"Infrastructure or application"}
    D3 -- Infrastructure --> TECH[["Technical Management"]]
    D3 -- Application --> APP[["Application Management"]]
    TECH -- work orders and fixes --> OPS[["IT Operations Management"]]
    APP -- fixes and releases --> OPS
    OPS -- logs and status --> INC
    TECH -- diagnostics and fixes --> INC
    APP -- diagnostics and fixes --> INC
    INC -- restore service --> SD
    SD -- user communication and closure --> U
    APP -- engage external teams --> EXT[["External Service Integration"]]
    EXT -- third party actions --> APP
    INC -- recurring or unknown root cause --> PM[["Problem Management"]]
    PM -- root cause and permanent fix --> TECH & APP
    PM -- publish workarounds --> KEDB[["Known error database"]]
    KEDB --> SD & INC
    PM -- update thresholds and correlations --> MON
    ACC -- grant or revoke roles --> REQ
    ACC -- security breach notifications --> INC
    OPS -- telemetry and logs --> MON
    TECH -- infrastructure metrics --> MON
    APP -- application metrics --> MON
    U:::aux
    SD:::svc
    MON:::svc
    INC:::svc
    D1:::decide
    D2:::decide
    ACC:::svc
    REQ:::svc
    D3:::decide
    TECH:::svc
    APP:::svc
    OPS:::svc
    EXT:::ext
    PM:::svc
    KEDB:::aux
    classDef svc fill:#eef6ff,stroke:#1d4ed8,stroke-width:1px,color:#0f172a
    classDef decide fill:#ffffff,stroke:#64748b,stroke-dasharray:3 3,color:#0f172a
    classDef aux fill:#f8fafc,stroke:#94a3b8,color:#334155
    classDef ext fill:#fff7ed,stroke:#f97316,color:#7c2d12
@ -0,0 +1,31 @@
sequenceDiagram
    autonumber
    participant User as End User
    participant L1 as OMSP Service Desk (L1)
    participant AP as IT Service/Application Provider (L2/Service)
    participant D4S as D4Science Support Team (L2/Infra)
    participant PL as Project Leader (L2/Project)
    participant CTO as Chief Technical Officer

    User->>L1: Submit request (incident/defect/info/project)
    L1->>L1: Log ticket, ACK to user (ID assigned)
    L1->>L1: Classify (infra vs app vs info vs project)

    alt Infrastructure issue
        L1->>D4S: Escalate with priority, evidence, context
        D4S-->>L1: Diagnostic update / Fix / Workaround
    else Application defect
        L1->>AP: Create bug record + assign (warranty/maintenance)
        AP-->>L1: Fix/Workaround + resolution notes
    else Information request
        L1->>L1: Resolve via KB/docs or SME
    else Project-related request
        L1->>PL: Redirect with full context
        PL-->>L1: Guidance / Action / Next steps
    end

    L1-->>User: Status updates / Resolution note
    opt Major/High Impact or Policy/SLA issue
        L1->>CTO: Notify & summarize (for governance)
        CTO-->>L1: Direction / Escalation policy
    end
@ -0,0 +1,64 @@
sequenceDiagram
    autonumber
    %% === Actors ===
    participant PL as PO/PL (Requestor)
    participant OMSP as OMSP-OA (Ops Admin)
    participant CTO as Chief Technical Officer (CAB)
    participant ITSO as IT Support Office
    participant D4S as D4S-ST (VRE/Core Infra)
    participant ESP as ESP (External App Provider)
    participant UAT as UAT Testers (PO/PL Team)
    participant MON as OMSP (Monitoring/Observability)

    %% 1) Intake & Classification (Service Request Mgmt)
    PL->>OMSP: Service Request / RFC (scope, NFRs, AAI/IAM, SLOs)
    OMSP-->>PL: ACK + Ticket ID
    OMSP->>CTO: Register RFC & visibility
    OMSP->>ITSO: Share intake context (for coordination)

    %% 2) Feasibility & Risk (Change Enablement / Design Coord)
    OMSP->>D4S: Feasibility (new vs existing VRE, core services, capacity)
    OMSP->>ESP: Compatibility & integration constraints (if applicable)
    OMSP->>OMSP: Feasibility Note + Risk Log + IAM/RBAC outline
    OMSP-->>CTO: Submit feasibility package (Go/Adjust/No-Go)

    %% 3) Plan & Approval (Change Enablement / CAB)
    CTO-->>OMSP: CAB decision (approve/adjust)
    OMSP->>OMSP: Deployment Plan (envs, UAT, cutover/rollback, monitoring, backup, comms)
    OMSP->>ITSO: Align comms templates, inventory placeholders, runbook skeleton

    %% 4) Build & Integrate (Release & Deployment)
    alt New VRE
        OMSP->>D4S: Provision VRE + core (StorageHub, gCat, CCP, SocialService)
    else Existing VRE
        OMSP->>D4S: Extend VRE (resources, policies, quotas)
    end
    OMSP->>OMSP: Configure IAM (AAI/OIDC), RBAC, ELK logs, Prom/Grafana alerts
    opt External integration required
        OMSP->>ESP: API/connector setup, secrets, mapping, routing
    end
    OMSP->>ITSO: Draft/Update runbook & KB
    note over OMSP,D4S: Build complete, integration smoke tests pass

    %% 5) Functional & Non-functional Testing (Service Validation & Testing)
    OMSP->>OMSP: Functional, security, basic performance checks
    OMSP->>D4S: Validate backup/restore & gCat metadata mapping
    OMSP-->>CTO: Test summary – UAT readiness

    %% 6) User Acceptance Testing (Service Validation & Testing)
    OMSP->>UAT: UAT kickoff (test scripts, data)
    UAT->>OMSP: Defects/feedback
    OMSP->>OMSP: Fix & retest cycles
    UAT-->>OMSP: UAT Sign-off

    %% 7) Production Cutover (Release & Deployment)
    OMSP->>PL: Maintenance window notice (≥24h) + user comms
    OMSP->>D4S: Execute cutover + smoke tests
    OMSP->>ITSO: Publish Service Catalogue entry & SLOs
    OMSP-->>PL: Go-live confirmation & support path

    %% 8) Early Life Support & Handover (Service Operation)
    MON->>OMSP: Heightened monitoring & alert verification
    OMSP->>ITSO: Finalize runbook/KB, capture early-life metrics
    OMSP-->>CTO: Early-life summary, schedule first service review
    OMSP-->>PL: Ticket closure (links to docs, SLAs, escalation paths)
@ -0,0 +1,54 @@
sequenceDiagram
    autonumber
    participant Req as Requestor (End User / Project Leader)
    participant L1 as Service Desk (ITSO/COMSP - L1)
    participant ITSO as IT Support Office (IAM Ops)
    participant CTO as Chief Technical Officer (Governance)
    participant D4S as D4Science IAM/Infra
    participant EXT as External Service Provider (Integrated App)
    participant AUD as Monitoring/Audit (ELK)

    %% 1) Intake
    Req->>L1: Access request / new role / role change
    L1->>ITSO: Create IAM ticket + full context

    %% 2) Role design & modelling
    ITSO->>ITSO: Map business role → RBAC / claims (SoD check)
    %% (5) Returned to Service Desk before approval
    ITSO-->>L1: Role model package (for approval routing)

    %% 3) Approval (initiated by Service Desk)
    %% (6) L1 initiates the approval step to CTO
    L1->>CTO: Submit role model for approval (new roles/sensitive auth)
    CTO-->>L1: Approve/Adjust

    %% 4) Provisioning (initiated by Service Desk)
    %% (7,8) L1 triggers provisioning calls
    L1->>D4S: Create/Update IAM groups (OIDC/RBAC)
    L1->>EXT: Sync roles/entitlements (if integrated)
    D4S-->>L1: Provisioning confirmation
    EXT-->>L1: Sync confirmation
    L1-->>Req: Access granted notice

    %% 5) De-provisioning (initiated by Service Desk)
    Req->>L1: Exit / role removal / transfer
    %% (12) L1 opens de-provisioning work
    L1->>ITSO: De-provision ticket (record/log)
    %% (13,14) L1 triggers actual revocation
    L1->>D4S: Revoke membership / tokens
    L1->>EXT: Revoke app tokens (≤24h)
    D4S-->>L1: Revocation confirmation
    EXT-->>L1: Token revocation confirmation
    %% (16) L1 informs requester
    L1-->>Req: Access revoked confirmation

    %% 6) Periodic reviews & compliance (initiated by Service Desk)
    %% (17) L1 requests data for review
    L1->>D4S: Extract audit logs & membership lists
    %% (18) L1 instructs cleanup & recert workflow
    L1->>ITSO: Quarterly recert + dormant cleanup
    ITSO-->>L1: Recert/cleanup completion report
    %% (20) L1 submits compliance report to CTO
    L1->>CTO: Compliance report + exceptions/remediations
    CTO-->>L1: Approved corrective actions
    L1->>AUD: Persist logs & evidence (ELK)
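The de-provisioning branch of this diagram puts a ≤24h bound on revoking application tokens. The following is a minimal Python sketch of how that bound could be checked from ticket timestamps. It assumes the clock starts when the exit or role-removal request reaches the Service Desk, which the diagram does not state, and both function names are hypothetical.

```python
from datetime import datetime, timedelta

# Taken from the "Revoke app tokens (≤24h)" message in the diagram.
REVOCATION_WINDOW = timedelta(hours=24)


def revocation_deadline(requested_at: datetime) -> datetime:
    """Latest moment at which app tokens may still be revoked.

    Assumption: the window is measured from the time the request is received.
    """
    return requested_at + REVOCATION_WINDOW


def revoked_in_time(requested_at: datetime, revoked_at: datetime) -> bool:
    """True if the revocation confirmation arrived within the 24h window."""
    return revoked_at <= revocation_deadline(requested_at)
```

A check like this could feed the quarterly compliance report as one of the exceptions to flag, if the organisation chooses to track it that way.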
@ -0,0 +1,62 @@
sequenceDiagram
    autonumber
    %% Actors
    participant MON as Automated Monitoring (ELK + Prom/Grafana)
    participant OMSP as Monitoring (OMSP)
    participant U as End User
    participant L1 as Service Desk (OMSP-L1)
    participant ITSO as IT Support Office
    participant D4S as D4Science Support (Infra-L2)
    participant ESP as External Service Provider (SW-L2)

    %% 1) Health monitoring & observability (ITIL: Event Management)
    MON-->>L1: Alert: health check/deep probe failure or SLO non-compliance vs SLR
    U->>L1: Incident ticket (symptoms, impact)
    L1->>L1: Triage & classify (infra vs software, severity, business impact)
    L1-->>ITSO: Notification (for coordination and possible escalation)

    %% 2) Incident response (ITIL: Incident Management)
    alt Infra suspected/confirmed
        L1->>D4S: Attach logs/metrics, open infrastructure incident (if infra)
        L1->>D4S: Escalate if needed with priority, evidence, timeframe
        D4S-->>L1: Diagnostic update / Fix / Workaround
    else Software malfunction suspected/confirmed
        L1->>ESP: Attach logs/metrics, open application incident (if app)
        L1->>ESP: Escalate if needed with priority, replication steps, logs, versions, timeframe
        ESP-->>L1: Patch/workaround/ETA & notes
        L1->>L1: Implement workaround if needed, monitor impact
    end
    ITSO-->>L1: Resolution summary (restore confirmation/next steps)
    L1-->>U: Status updates and final resolution note
    L1->>ITSO: Detailed Incident Information
    L1-->>ITSO: Post-Incident Review (PIR) with root cause & actions
    L1->>L1: Update runbook and KB with lessons learned
    note over OMSP,D4S: Patch/Release Management
    %% 3) Patch & release management (ITIL: Change Enablement / Release & Deployment)
    ESP-->>L1: Vendor advisory / patch announcement
    L1->>L1: Evaluate advisory risk, select candidate patches
    L1->>L1: Test patches in test/pre-prod environment
    L1->>ITSO: Draft Release Plan (scope, risk, smoke tests, rollback)
    L1-->>ITSO: Maintenance notice (≥72h) with scope/impact
    L1->>U: Maintenance notice (≥24h) with scope/impact
    L1->>L1: Pre-patch backup & restore test (evidence)
    L1->>L1: Deploy in approved change window, execute smoke tests
    L1-->>ITSO: Change closure (success/rollback) + evidence
    L1-->>U: Completion communication (outcome/next steps)
    note over OMSP,D4S: Configuration Items/CMDB Documentation
    %% 4) Configuration & documentation (ITIL: Knowledge Management)
    L1->>L1: Update CMDB (versions, CIs, relationships)
    L1->>L1: Update config inventory & user guidance (versioned)
    L1->>OMSP: Adjust dashboard/alert thresholds as needed
    OMSP->>MON: Adjust dashboard/alert thresholds as needed

    note over OMSP,D4S: Capacity/Availability/Performance Monitoring
    %% 5) Capacity, performance & cost (ITIL: Capacity Management)
    OMSP->>MON: Monthly trigger: pull utilization (capacity/availability) trends
    OMSP->>ITSO: Review hot spots / anomalies (CPU, RAM, I/O, latency, cost)
    ITSO->>ITSO: Analysis
    OMSP->>D4S: Request infra tuning / scaling options where needed
    D4S->>D4S: Implementation of the optimisation plan
    D4S->>OMSP: Optimisation plan executed
    OMSP->>OMSP: Testing
    OMSP->>ITSO: Record scaling plan / optimizations and publish summary
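The patch and release section of this diagram sets two minimum lead times for maintenance notices: ≥72h for the IT Support Office and ≥24h for end users. A minimal Python sketch of deriving the latest permissible send times from a planned window start follows. The dictionary keys and function names are hypothetical, and the sketch assumes the lead time is measured back from the start of the change window.

```python
from datetime import datetime, timedelta
from typing import Dict

# Minimum lead times as stated on the two "Maintenance notice" messages.
MIN_NOTICE: Dict[str, timedelta] = {
    "IT Support Office": timedelta(hours=72),
    "End User": timedelta(hours=24),
}


def latest_notice_times(window_start: datetime) -> Dict[str, datetime]:
    """Latest time each audience can be notified and still meet its lead time."""
    return {audience: window_start - lead for audience, lead in MIN_NOTICE.items()}


def notice_is_timely(audience: str, sent_at: datetime, window_start: datetime) -> bool:
    """True if the notice was sent at least the required lead time before the window."""
    return window_start - sent_at >= MIN_NOTICE[audience]
```

Keeping the lead times in one mapping means a policy change, such as a longer user notice, only touches a single line.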
@ -0,0 +1,34 @@
sequenceDiagram
    autonumber
    participant MON as Monitoring/Observability (SLAs/ELK/Prom-Grafana)
    participant L1 as OMSP - Service Desk (tickets)
    participant ITSO as IT Support Office
    participant D4S as D4Science Support (Infra/VRE)
    participant ESP as External Providers (integrations)
    participant SM as OMSP - Service Manager
    participant CTO as CTO

    %% (One-off or yearly) Template agreement
    SM->>CTO: Propose/agree reporting template (KPIs, risks, escalations, actions)

    %% Monthly data collection
    MON-->>SM: SLA/availability/MTTR/exported metrics
    L1-->>SM: Ticket stats (incidents/requests), trends
    ITSO-->>SM: Problems, changes, PIR highlights, patch outcomes
    D4S-->>SM: Infra SLAs, capacity notes, major events
    ESP-->>SM: External dependency incidents/escalations
    SM->>SM: Normalize & consolidate dataset

    %% Draft report
    SM->>SM: Draft Monthly Service Report (KPIs, major issues, risks, CSI items)
    SM->>ITSO: Internal review & factual check
    ITSO-->>SM: Edits/confirmations

    %% Submission & review
    SM->>CTO: Submit report (≥5 days before month-end)
    CTO-->>SM: Acknowledge & share agenda points
    SM->>CTO: Review meeting (clarify escalations, agree corrective actions)

    %% CSI follow-up
    SM->>SM: Create/Update CSI register (owners, due dates)
    SM->>CTO: Circulate meeting minutes & action log
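The submission step in this diagram requires the monthly report to reach the CTO at least 5 days before month-end. A minimal Python sketch of computing the latest submission date follows. It assumes calendar days counted back from the last day of the month, which the diagram does not specify, and the function name is hypothetical.

```python
import calendar
from datetime import date, timedelta


def latest_submission_date(year: int, month: int, lead_days: int = 5) -> date:
    """Last date on which the report can be submitted and still meet the lead time.

    Assumption: `lead_days` are calendar days before the last day of the month.
    """
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return date(year, month, last_day) - timedelta(days=lead_days)
```

If the policy turns out to mean working days, the subtraction would need to skip weekends and holidays instead.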