Federated Outlier Detection for Astronomical Data: Performance Analysis on Commercial Clouds

  • Camila Lopes (UFF)
  • Wesley Ferreira (UFF)
  • Julia Gschwend (LIneA)
  • Luiz Nicolaci da Costa (LIneA)
  • Rafael Ferreira da Silva (Oak Ridge National Laboratory)
  • Marta Mattoso (UFRJ)
  • Aline Paes (UFF)
  • Daniel de Oliveira (UFF)

Abstract


Astronomical surveys such as the Dark Energy Survey (DES) and the upcoming Legacy Survey of Space and Time (LSST) produce massive volumes of observational data that demand scalable and efficient analysis techniques. Machine Learning (ML) has become a key tool for extracting patterns and detecting anomalies in large astronomical catalogs. Traditional centralized ML approaches, however, are impractical in this scenario because of data transfer bottlenecks, storage constraints, and privacy concerns. Federated Learning (FL) offers a decentralized alternative: models are trained across distributed data sources, which reduces transfer costs and preserves data locality. Configuring and deploying FL workflows nevertheless remains challenging because of client heterogeneity and the way data is distributed across clients. This paper explores the use of FL for outlier detection in large astronomical catalogs, using DES as a proxy for LSST. We emulate FL deployments on the Amazon Web Services (AWS) cloud and evaluate various configurations of compute resources. Our results show the trade-offs between training time and financial cost and provide insights into configuring FL workflows for large-scale LSST data.

Published
28 October 2025
LOPES, Camila; FERREIRA, Wesley; GSCHWEND, Julia; COSTA, Luiz Nicolaci da; SILVA, Rafael Ferreira da; MATTOSO, Marta; PAES, Aline; OLIVEIRA, Daniel de. Federated Outlier Detection for Astronomical Data: Performance Analysis on Commercial Clouds. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 26., 2025, Bonito/MS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 362-373. DOI: https://doi.org/10.5753/sscad.2025.16731.