Secure Semantic Retrieval over Encrypted Cloud Data: A Dual-Index System with Lightweight Fuzzy Encryption and BERT-Based Semantic Indexing

Authors

  • Haider Kareem Mohammed, Department of Computer Science, Madurai Kamaraj University, Madurai, Tamilnadu, India.
  • Dr. B. Indrani, Department of Computer Science, Director of Distance Education, Madurai Kamaraj University, Madurai, Tamilnadu, India.

DOI:

https://doi.org/10.24237/djes.2026.19112

Keywords:

Secure Document Retrieval, Dual-Index Architecture, BERT-Based Semantic Indexing, LFEHI, Encrypted Search, Query Expansion, Scalable Search Systems

Abstract

Cloud storage is widely used to store personal and sensitive information. Cloud services usually store files in encrypted form, which makes it difficult to find a specific file among them. Traditional keyword search may fail when users phrase a query differently from the stored text, and typos and vocabulary differences further reduce accuracy. Existing encrypted search solutions typically trade security for search quality. To address this gap, this work introduces a dual-index encrypted search system that combines exact keyword matching with semantic understanding. A keyword index, protected with the LFEHI model, supports fuzzy matching for small typing errors. A semantic index built with BERT captures meaning and organizes embeddings in a kd-tree for fast retrieval. During a search, both indexes run in parallel: the encrypted keywords are matched through the hash index, while a semantic embedding is checked for related concepts. Their results are integrated using a 70/30 keyword-to-semantic weight, ensuring that exact matches continue to be given priority. Tests on 1,000 documents and 300 queries show clear improvements. The system reaches 90.2% recall, 22% higher than keyword-only search, and it handles single-character typos with 94.6% accuracy. Average query time stays below 64 ms, supporting real-time use, and the storage cost is only 18.5%. A security evaluation found no evidence of content leakage or query leakage. Owing to the dual-index design, encrypted search can deliver meaningful understanding without compromising speed or security.
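
The 70/30 keyword-to-semantic weighting described in the abstract can be illustrated with a minimal sketch. It assumes that each index returns per-document scores already normalized to the range [0, 1] and that a document missing from one index contributes a score of 0 for that index; the abstract states neither detail. The function name `fuse_scores`, its parameters, and the document identifiers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def fuse_scores(
    keyword_scores: Dict[str, float],
    semantic_scores: Dict[str, float],
    keyword_weight: float = 0.7,   # 70% weight on the keyword index
    semantic_weight: float = 0.3,  # 30% weight on the semantic index
) -> List[Tuple[str, float]]:
    """Combine two per-document score maps into one ranked list.

    Assumption: both inputs hold scores normalized to [0, 1]; a document
    absent from one map is treated as scoring 0.0 in that index.
    """
    doc_ids = set(keyword_scores) | set(semantic_scores)
    fused = {
        doc_id: keyword_weight * keyword_scores.get(doc_id, 0.0)
        + semantic_weight * semantic_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores: doc1 is an exact keyword hit, doc3 is only semantically related.
ranked = fuse_scores(
    keyword_scores={"doc1": 1.0, "doc2": 0.5},
    semantic_scores={"doc2": 0.9, "doc3": 0.8},
)
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.2f}")
```

With these illustrative inputs the exact keyword match ranks first even though it has no semantic score, which is the priority behaviour the abstract attributes to the 70/30 split.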

Published

2026-03-15

How to Cite

“Secure Semantic Retrieval over Encrypted Cloud Data: A Dual-Index System with Lightweight Fuzzy Encryption and BERT-Based Semantic Indexing”, DJES, vol. 19, no. 1, pp. 160–180, Mar. 2026, doi: 10.24237/djes.2026.19112.
