In 2023, EGI CSIRT significantly increased its efforts to strengthen global collaboration, recognising the value of shared knowledge and unified defence strategies. Our engagement expanded beyond the Worldwide LHC Computing Grid (WLCG) organisations, including OSG, US-CMS and others, to a wider range of research and education institutions and National Research and Education Networks (NRENs). These strategic partnerships have been instrumental in improving our ability to respond rapidly to emerging threats, thereby enhancing the overall security landscape.


Last year’s incidents tested the readiness and response mechanisms of the EGI Incident Response Taskforce (IRTF) and provided invaluable insights into improving cybersecurity measures at EGI and WLCG. By analysing these anonymised incidents, this report demonstrates the dynamic nature of cyber threats and the IRTF team’s ongoing commitment to improving the security posture of the grid computing infrastructure. The analysis provides an understanding of the challenges faced, which will guide our goals and strategic planning for 2024. This report is not just about managing security incidents effectively; it is about defining a more resilient and secure EGI ecosystem that benefits all partners.

Categorisation of Security Incidents

  1. Credential Leakage:
    • Incidents: Several instances in which sensitive credentials and API details were exposed. One notable incident involved the publication of IAM credentials in a public GitLab repository, which could have enabled unauthorised job submissions.
    • Description: An operator discovered a commit in a public GitLab repository containing sensitive IAM credentials and API URLs. These credentials could have been used to obtain a high-privilege IAM token, potentially allowing an attacker to submit jobs directly to any site. Although the attack surface was limited to computing resources, a worst-case scenario would have involved lateral movement and payload delivery (cryptomining, ransomware, …). However, the attacker would have needed extensive knowledge of grid computing to carry out these actions, making a successful attack unlikely. The credentials were immediately revoked and secured, and the infrastructure was monitored for signs of misuse.
    • Lessons Learned:
      • Use private repositories to minimise the impact of similar incidents.
      • Implement effective secret scanning with appropriate regular-expression (regex) rules.
      • Enhance traceability within systems to facilitate incident response.
      • Incorporate mechanisms for revoking tokens and eliminating potentially compromised jobs.
      • Distribute privileges across multiple credentials rather than relying on a single one.
  2. Unauthorised Access:
    • Incidents: User credentials compromised, primarily through SSH brute-force attacks, led to cryptomining and preparations for DDoS attacks. This incident showed how a compromise at one organisation can spread to others.
    • Description: Unauthorised access to a university’s infrastructure was gained through compromised user credentials, obtained primarily through SSH brute-force attacks. The attacker used a combination of automated and hands-on techniques, including self-written bash scripts and well-known open-source malware, to establish connections, create persistence mechanisms, escalate privileges and move laterally across the network. The ultimate objective was to deploy cryptominers and prepare for distributed denial-of-service (DDoS) attacks. Notably, similar Indicators of Compromise (IOCs) were found across multiple institutions, suggesting a wider impact. The attack spread rapidly because credentials were reused to access further systems, chaining one compromise to the next. Two distinct groups with similar Tactics, Techniques and Procedures (TTPs) were involved.
    • Impact: Approximately 20 organisations were affected, with over 200 Linux servers compromised.
    • Response: Immediate revocation of compromised credentials, reinstallation of the affected computing resources, and network filtering.
    • Lessons Learned:
      • Communication:
        • Use straightforward, simple language in all communications, especially when there are language barriers.
        • Break information down into short, digestible sentences.
        • Be explicit and avoid ambiguity.
      • Infrastructure:
        • Before taking any action on a compromised system, contact a security team that can provide guidance, so that evidence is not lost.
        • Provide unique accounts to admins (don’t reuse or share accounts).
        • Disable SSH password login on servers and use SSH keys instead. Where possible, use MFA for SSH login.
        • Block IP addresses after multiple failed logins (e.g. using fail2ban).
        • Lock accounts after multiple failed attempts (e.g. via PAM, configured in system-auth).
        • Ensure that private keys (SSH, X.509,…) are not stored unencrypted (i.e. they should be password protected). Ideally, no private keys should be stored on publicly accessible servers (such as user interfaces or login nodes).
        • Monitor patch status so that vulnerable systems are identified and updated promptly.
        • Use network segmentation to isolate critical services from other systems.
        • Enable central remote logging.
  3. Impersonation and Identity Fraud:
    • Incidents: A significant event was the impersonation of a university alumnus by an actor with in-depth knowledge of EGI and EOSC services, who submitted suspicious resource requests.
    • Description: Two requests submitted for EGI resources raised suspicions because VO administrators found no valid backing from the projects or individuals mentioned. Further investigation confirmed that the identity used to create these requests had been compromised. The actor behind this compromise was remarkably active, frequently engaging with various resource providers and demonstrating a good understanding of the EGI, EOSC and related services landscape. They communicated fluently in English and Czech, used Proton VPN for access, and knew the relationships between the organisations involved in project activities.
    • Response: Strengthened identity verification processes and continuous monitoring of resource access.
    • Lessons Learned: The importance of robust identity verification and the need for ongoing surveillance of resource utilisation to prevent impersonation.
  4. Exploitation of Software Vulnerabilities:
    • Incidents: An unprotected and unpatched Apache NiFi instance was exploited, allowing unauthorised access and cryptomining.
    • Description: An attacker exploited an unprotected and unpatched instance of Apache NiFi, formerly known as “Niagara Files”, to compromise a research and education (R&E) organisation. Apache NiFi is an open-source data integration tool designed to automate data flows between systems. By uploading a malicious processor, the attacker gained unauthorised access to the server and executed arbitrary code, which allowed malware to be deployed. The attacker used various tools downloaded from remote hosts to escalate privileges and establish persistence, combining scripts and binaries drawn from both custom and open-source tools. The primary objective of this attack was cryptomining.
    • Response: Implementation of secure service configurations, enforcement of authentication, and prompt software updates.
    • Lessons Learned:
      • Configure services with security in mind, exposing only what is essential.
      • Enforce mandatory authentication for all services.
      • Keep software updated with the latest security patches.
      • Use an intrusion detection system (IDS) to monitor and detect threats.
      • Implement strict firewall rules that restrict access to trusted sources.
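
The secret-scanning lesson from the credential leakage incident can be illustrated with a minimal Python sketch. It checks each line of a text (for example, the output of `git diff --cached`) against a small set of regular expressions and reports matches with the secret value masked. The rule names and patterns (`SECRET_RULES`, `scan_text`, `Finding`) are hypothetical and purely illustrative, not the rules used by EGI CSIRT; in practice a dedicated scanner such as gitleaks or GitLab's built-in secret detection, with a tuned rule set, is the better choice.

```python
import re
from typing import Iterator, NamedTuple

# Hypothetical, illustrative rule set; NOT the rules used by EGI CSIRT.
# Where a pattern has a capture group, group 1 is the secret value to mask.
SECRET_RULES = {
    # "name = value" assignments for variable names that commonly hold secrets.
    "generic_secret_assignment": re.compile(
        r"""(?i)\b(?:client_secret|api_key|access_token|password)\b\s*[:=]\s*['"]?([A-Za-z0-9_\-./+=]{8,})"""
    ),
    # PEM/OpenSSH private key headers (the header itself is not secret, so no group).
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH |ENCRYPTED )?PRIVATE KEY-----"
    ),
    # JSON Web Tokens: three base64url segments, the header starting with "eyJ".
    "jwt": re.compile(r"\b(eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+)"),
}


class Finding(NamedTuple):
    rule: str
    line_no: int
    excerpt: str


def _mask(match: re.Match) -> str:
    """Return the matched text with any captured secret cut down to 4 characters."""
    text = match.group(0)
    if match.re.groups == 0:
        return text
    secret = match.group(1)
    return text.replace(secret, secret[:4] + "****")


def scan_text(text: str) -> Iterator[Finding]:
    """Yield one finding for every rule that matches a line of `text`."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_RULES.items():
            match = pattern.search(line)
            if match:
                yield Finding(rule, line_no, _mask(match))
```

Used from a pre-commit hook, the staged diff would be passed to `scan_text` and the commit rejected if any finding is returned. Masking the value keeps the scanner's own output from becoming a second leak.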

Recommendations

The incidents of 2023 have led to the development of guidelines and training programmes aimed at enhancing the cybersecurity knowledge and capabilities of EGI sites’ security experts and administrators. These initiatives include:

  • Documentation and Training: The creation of detailed documentation, such as the “Server Management Guidelines” available on the EGI Confluence page (https://confluence.egi.eu/display/EGIBG/Server+management+guidelines), to assist in securely configuring infrastructures and performing forensic investigations.
  • Proactive Security Measures: We promote the adoption of advanced security tools, provide presentations on best practices, and offer consultations for security-related questions.
  • Community Engagement: Strengthening the EGI community’s collaboration and information sharing to collectively improve the cybersecurity posture across all member organisations.
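
As an illustration of the kind of tooling behind the brute-force lessons from the unauthorised access incident (blocking IPs after repeated failed logins, as fail2ban does), the Python sketch below shows only the detection logic: it counts OpenSSH "Failed password" messages per source IP within a sliding time window. The function name `ips_to_ban` is hypothetical, its `max_retry` and `find_time` parameters loosely mirror fail2ban's `maxretry` and `findtime` options, and the default values are illustrative. Other authentication failure messages are not covered, and actually blocking the addresses (firewall rules) is out of scope; in production, fail2ban itself should be used.

```python
import re
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Set, Tuple

# Matches OpenSSH messages such as
# "Failed password for invalid user admin from 203.0.113.7 port 52144 ssh2"
# and captures the source address.
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+")


def ips_to_ban(
    events: Iterable[Tuple[float, str]],
    max_retry: int = 5,
    find_time: float = 600.0,
) -> Set[str]:
    """Return source IPs with at least `max_retry` failures within `find_time` seconds.

    `events` are (unix_timestamp, log_message) pairs in chronological order.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    banned: Set[str] = set()
    for ts, message in events:
        match = FAILED_LOGIN.search(message)
        if not match:
            continue  # ignore successful logins and unrelated messages
        ip = match.group(1)
        window = recent[ip]
        window.append(ts)
        # Drop failures that fall outside the sliding time window.
        while window and ts - window[0] > find_time:
            window.popleft()
        if len(window) >= max_retry:
            banned.add(ip)
    return banned
```

The sliding window means a slow attacker who stays below `max_retry` failures per `find_time` is not flagged, which is why account lockout and disabling password login altogether remain important complementary measures.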


To conclude, the cybersecurity environment of 2023 reinforced the importance of detection, traceability and proactive engagement in addressing cyber threats. By incorporating the lessons learned from the past year’s incidents into our strategies and operational practices, EGI is committed to significantly enhancing its resilience against future cybersecurity challenges, ensuring the security and integrity of its critical infrastructures and services.

Some of the incidents handled suggest that the maturity of secure service operation may have declined somewhat in recent years, particularly at smaller resource centres. In response, EGI CSIRT is pleased to provide supporting material, such as technical guidance, to address this issue.