FL-PRCR: A Privacy-Preserving Federated Learning Framework for Cross-Institutional Financial Risk Control with Interpretability and Compliance
DOI: https://doi.org/10.62517/jbdc.202601214
Author(s)
Dinglong Li
Affiliation(s)
Newcastle University, Newcastle, UK
Abstract
The financial industry faces a fundamental tension between the need for data-driven collaborative risk management and stringent obligations to protect sensitive customer information under regulations such as GDPR and China's Personal Information Protection Law (PIPL). While Federated Learning (FL) offers a promising "data does not move" paradigm for breaking down data silos, its application to cross-institutional financial risk control presents significant challenges, including extreme data heterogeneity, the privacy-efficiency trade-off, the lack of model interpretability under privacy constraints, and the absence of trusted governance frameworks for multi-party collaboration. To address these challenges, this paper proposes FL-PRCR, a comprehensive and novel FL framework specifically designed for Privacy-preserving, Regulatory-compliant Collaborative Risk control in finance. Methodologically, our research employs a hybrid approach combining theoretical modeling, algorithmic innovation, and empirical validation. We first conduct a systematic requirement analysis through interviews with financial institutions and a review of regulatory frameworks, identifying key technical gaps. The core technical contributions and methods are fourfold: (1) We design a dynamic, sensitivity-aware privacy preservation mechanism that classifies financial data into high/medium/low sensitivity tiers and applies corresponding cryptographic protections: a hybrid Paillier Homomorphic Encryption (HE) + Differential Privacy (DP) scheme (ε dynamically tuned between 0.1 and 1.0) for high-sensitivity data, lightweight order-preserving encryption for medium-sensitivity data, and standard AES-GCM for low-sensitivity features, balancing security against computational overhead.
(2) To tackle data heterogeneity, we develop a heterogeneity-aware FL optimization algorithm that integrates a federated transfer learning module with a shared feature embedding space E(\theta_E) for cross-institutional feature alignment, coupled with a novel Gradient Contribution Screening (GCS) strategy that filters model updates based on their cosine similarity to a global update direction, reducing communication volume by selectively transmitting only high-value parameters. (3) We construct an interpretable multi-task federated risk model in which a primary risk prediction task (implemented via federated XGBoost for credit risk and GNNs for AML) is jointly trained with an auxiliary feature validation task, and we integrate a Federated SHAP module that computes approximate Shapley values for the global model using securely aggregated local expectations, enabling privacy-preserving explainability. (4) We architect a regulatory-embedded collaboration framework featuring an additional read-only Regulatory Node for auditable oversight and propose a formal Federated Collaboration Compliance Protocol (FCCP) that defines roles, responsibilities, and data usage boundaries. Empirically, we implement a prototype system based on the FATE framework and validate it using both simulated and real-world data. Our experimental methodology employs: (i) Simulation on public datasets: We partition the LendingClub (500K samples) and Elliptic (200K transactions) datasets to create realistic heterogeneous scenarios mimicking different financial institutions. (ii) Real-world pilot testing: We collaborate with three financial institutions (a city commercial bank, a consumer finance company, and a payment processor) using desensitized real business data (~300K credit records).
(iii) Comprehensive evaluation metrics: We assess model performance (Accuracy, Precision, Recall, F1-Score), efficiency (communication cost, training time), privacy strength (model inversion attack success rate), and explainability fidelity (Spearman correlation between federated and centralized SHAP values). Key quantitative results demonstrate FL-PRCR's effectiveness: (1) Risk Prediction Performance: On the LendingClub dataset, FL-PRCR achieves an average F1-score of 0.756, outperforming FedAvg (0.724), FedProx (0.732), and SCAFFOLD (0.738). In the real-world pilot, it improves overdue loan identification accuracy by 18.7% compared to isolated single-institution models and reduces AML false negative rates by 12.3%. (2) Communication Efficiency: FL-PRCR reduces cumulative communication volume by 35.2% (1,850 MB vs. 2,950-3,400 MB) over 100 training rounds compared to baselines, while maintaining comparable convergence rates. (3) Privacy Protection: Under simulated model inversion attacks, FL-PRCR's hybrid HE+DP defense reduces the attack success rate to 0.8%, significantly lower than FedAvg with DP (4.7%) and without DP (32.5%). (4) Explainability: The Federated SHAP module produces explanations with high fidelity (Spearman ρ = 0.89) compared to centralized SHAP, generating actionable feature contribution reports that satisfy basic regulatory disclosure requirements. In conclusion, FL-PRCR provides a technically rigorous, empirically validated, and regulatorily adaptable framework that balances the competing demands of collaborative risk control, data privacy, operational efficiency, and regulatory compliance. It represents a significant step toward practical, large-scale deployment of privacy-preserving collaborative AI in the financial sector, with potential applicability to other regulated industries facing similar data silo challenges.
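As a rough illustration of the Gradient Contribution Screening (GCS) idea described in the abstract, the following minimal Python sketch keeps only those client updates whose cosine similarity to a global update direction meets a threshold, then averages the survivors. The abstract does not give the exact screening rule, so the function names, the threshold value of 0.5, the zero-vector handling, and the toy update vectors are all assumptions made for illustration only; this is not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def screen_updates(client_updates, global_direction, threshold=0.5):
    """Keep updates aligned with the global direction (hypothetical GCS rule).

    The threshold is an assumed hyperparameter, not a value from the paper.
    """
    return {
        client_id: update
        for client_id, update in client_updates.items()
        if cosine_similarity(update, global_direction) >= threshold
    }


def average_updates(updates):
    """Element-wise mean of the retained updates; None if nothing was kept."""
    if not updates:
        return None
    vectors = list(updates.values())
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# Toy updates from three hypothetical institutions (illustrative values only).
client_updates = {
    "bank": [1.0, 0.9, 0.0],
    "consumer_finance": [0.8, 1.1, 0.1],
    "payment_processor": [-1.0, 0.2, 0.9],
}
global_direction = [1.0, 1.0, 0.0]

kept = screen_updates(client_updates, global_direction, threshold=0.5)
aggregated = average_updates(kept)
print(sorted(kept), aggregated)
```

In this toy run the third update points away from the global direction and is dropped, so only two of the three updates would need to be transmitted. A real deployment would apply the screening before encryption and secure aggregation, which this sketch omits.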
Keywords
Federated Learning; Financial Risk Control; Data Privacy; Privacy-Preserving Computation; Cross-Institutional Collaboration.