Learned-Indexes for Improving Query Performance A Comprehensive Survey with Taxonomy
DOI:
https://doi.org/10.25098/8.1.30Keywords:
Index Terms, Learned-Indexes, Database Indexing, Query Performance, Complexity AnalysisAbstract
Database performance optimization involves intertwining developmental efforts with challenges. Core to this field are index structures, notably the B+-Tree technique, which enhances database performance by mapping keys to their locations regardless of data distribution. Although the B+-Tree improves query performance, it has inherent limitations affecting overall efficiency. The rise in data volume intensifies indexing complexities. Machine Learning (ML) emerges as a potent approach to rejuvenate legacy Database Management System (DBMS) components. A notable innovation is the "Learning Indexes" paradigm, viewing indexes as predictive models anticipating key locations in datasets, akin to Cumulative Distribution Functions (CDF). This study serves as a survey, exploring technologies underpinning learned-index paradigms and comparing them with traditional database indexing techniques. Through meticulous analysis, it unravels intricacies of both traditional and learned indexing paradigms, equipping aspiring analysts with a panoramic understanding. This underscores the imperative of charting a path for future advancements within this transformative domain.
References
A. Silberschatz et al., Database System Concepts, SEVENTH ed. McGraw-Hill Education, 2 Penn Plaza, New York, NY 10121, 2020.
G. K. Gupta, DATABASE MANAGEMENT SYSTEMS. McGraw Hill Education (India) Private Limited, 2018.
R. Elmasri and S. B. Navathe, Fundamentals of Database Systems, Seventh ed. Pearson, 2016.
C. Wijesiriwardana and M. F. Firdhous, "An Innovative Query Tuning Scheme for Large Databases," in 2019 International Conference on Data Science and Engineering (ICDSE), 2019, pp. 154-159: IEEE.
A. aminuddin et al., "ANALYZING THE EFFECT OF DATA SIZE VARIATION ON THE PERFORMANCE OF B-TREE AND HASH MAP INDEXING IN MYSQL AND POSTGRESQL PLATFORMS," Journal of Critical Reviews JCR, vol. 7, no. 12, 2020.
T. Kraska et al., "The Case for Learned Index Structures," presented at the Proceedings of the 2018 International Conference on Management of Data, Houston, TX, USA, 2018. Available: https://doi.org/10.1145/3183713.3196909
M. Valavala and W. Alhamdani, "A Survey on Database Index Tuning and Defragmentation," International Journal of Engineering Research & Technology, vol. 9, no. 12, pp. 317--321, 2020.
P. Neuhaus et al., "GADIS: A genetic algorithm for database index selection," in The 31st International Conference on Software Engineering & Knowledge Engineering, Portugal, 2019.
J. Zhang and Y. Gao, "CARMI: A Cache-Aware Learned Index with a Cost-based Construction Algorithm," Proceedings of the VLDB Endowment, vol. 15, no. 11, pp. 2679-2691, 2022.
R. Marcus et al., "Benchmarking learned indexes," Proceedings of the VLDB Endowment, vol. 14, no. 1, pp. 1-13, 2020.
P. Ferragina et al., "Why Are Learned Indexes So Effective?," in International Conference on Machine Learning, 2020, vol. 119, pp. 3123-3132: PMLR.
M. Mitzenmacher, "A Model for Learned Bloom Filters and Related Structures," arXiv preprint arXiv:1802.00884, 2018.
M. Mitzenmacher, "A Model for Learned Bloom Filters and Optimizing by Sandwiching," Advances in Neural Information Processing Systems, vol. 31, 2018.
C. Tang et al., "Learned indexes for dynamic workloads," arXiv preprint arXiv:1902.00655, 2019.
A. Galakatos et al., "Fiting-Tree: A data-aware index structure," in Proceedings of the 2019 International Conference on Management of Data, 2019, pp. 1189-1206.
J. Ding et al., "ALEX: an updatable adaptive learned index," in Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 2020, pp. 969-984.
P. Ferragina and G. Vinciguerra, "The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds," Proceedings of the VLDB Endowment, vol. 13, no. 8, pp. 1162-1175, 2020.
A. Kipf et al., "RadixSpline: a single-pass learned index," in Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, Portland, Oregon, 2020, pp. 1-5: Association for Computing Machinery.
Y. Wang et al., "Treator: a Fast Centralized Cluster Scheduling at Scale Based on B+ Tree and BSP," in 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York City, NY, USA, 2021, pp. 324-335.
A. Hadian and T. Heinis, "Interpolation-friendly B-trees: Bridging the Gap Between Algorithmic and Learned Indexes," in Proceedings of the 22nd Internationa lConferenceon Extending Database Technology (EDBT), 2019, pp. 710-713: OpenProceedings.org.
S. Mukherjee, "Indexes in Microsoft SQL Server," March 6, 2019. Available: at SSRN: https://ssrn.com/abstract=3415957 or http://dx.doi.org/10.2139/ssrn.3415957
V. Nasteski, "An overview of the supervised machine learning methods," Horizons, vol. 4, pp. 51-62, 2017.
R. Muhamedyev, "Machine learning methods: An overview," Computer Modelling & New Technologies, vol. 19, no. 6, pp. 14-29, 2015.
J. Sánchez, Probability for Data Scientists, 1st ed. San Diego: Cognella Academic Publishing, 2020, p. 362.
A. Spanos, Probability theory and statistical inference: Empirical modeling with observational data, Second ed. Cambridge, United Kingdom: Cambridge University Press, 2019.
M. Agenis-Nevers et al., "An empirical estimation for time and memory algorithm complexities: newly developed R package," Multimedia Tools and Applications, vol. 80, no. 2, pp. 2997-3015, 2021/01/01 2021.
S. Phalke et al., "Big-O Time Complexity Analysis Of Algorithm," presented at the 2022 International Conference on Signal and Information Processing (IConSIP), Pune, India, 2022.
F. Dedov, The Bible of Algorithms and Data Structures: A Complex Subject Simply Explained. 2020.
Downloads
Published
How to Cite
Issue
Section
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
SJCUS's open access articles are published under a Creative Commons Attribution CC-BY-NC-ND 4.0 license.