Objectives: This study aimed to investigate the performance and reliability of data-driven models that combine correlational feature analysis with clinical validation for predicting periodontal disease. Methods: Data from the 7th Korea National Health and Nutrition Examination Survey (n = 10,654) were used for correlation analysis to identify significant risk factors for periodontitis. Prediction models for periodontitis were developed from the selected factors and this database, then internally validated with 5-fold cross-validation and 1,000 bootstrap resamples. External validation was conducted with clinical data (n = 120) collected through self-reported questionnaires, clinical periodontal parameters, and radiographic image analysis. Predictive performance was assessed for logistic regression, support vector machine, random forest, XGBoost, and neural network algorithms using the area under the receiver operating characteristic curve (AUC) and other performance metrics. Results: Correlation analysis identified 16 features from over 1,000 potential risk factors for periodontitis. The best data-driven model (XGBoost) showed AUC values of 0.823 and 0.796 for internal and external validation, respectively. For models built on the clinical data, the corresponding values were 0.836 and 0.649, respectively. In addition, the data-driven model could predict other clinical periodontal parameters, including severe bone loss (AUC = 0.813), gingival bleeding (AUC = 0.694), and tooth loss (AUC = 0.734). A patient case study of prognostic predictions indicated that the probability of periodontitis can be reduced, on average, by 6.0% (smoking cessation) and 0.6% (alcohol cessation). Conclusions: Data-driven models for predicting periodontitis and other periodontal parameters were developed from 16 risk factors, demonstrating enhanced prediction performance and reproducibility in internal–external validation.
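As an illustration of the validation metric named in the Methods, the following is a minimal sketch of how an AUC and a bootstrap percentile interval can be computed. It is not the study's code: the function names `auc` and `bootstrap_auc_ci` are hypothetical, the data are synthetic, and the percentile method is an assumption, since the abstract does not state how the 1,000 bootstrap resamples were summarized.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile confidence interval for the AUC from bootstrap resamples
    (assumed procedure; cases are resampled with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Synthetic scores only: 60 positives and 60 negatives, not the study data.
    rng = random.Random(42)
    labels = [1] * 60 + [0] * 60
    scores = [rng.gauss(0.6, 0.2) for _ in range(60)] + [rng.gauss(0.4, 0.2) for _ in range(60)]
    point = auc(labels, scores)
    low, high = bootstrap_auc_ci(labels, scores)
    print(f"AUC = {point:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The pairwise definition is used here because it is easy to check by hand. For a survey-sized data set, a rank-based implementation from an established library would be the more practical choice.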