CONTEXT: It remains unclear how best to integrate artificial intelligence (AI)-based models into diagnostic workflows. OBJECTIVE: To develop and validate a deep-learning-based AI model (AI-Thyroid) for thyroid cancer diagnosis, and to explore how its assistance affects physicians' diagnostic performance. METHODS: The system was trained using 19 711 images from 6163 patients at a tertiary hospital (Ajou University Medical Center; AUMC). It was validated using 11 185 images from 4820 patients at 24 hospitals (test set 1) and 4490 images from 2367 patients at AUMC (test set 2). The clinical implications were assessed by comparing the diagnoses of six physicians with different levels of experience (group 1: 4 trainees; group 2: 2 faculty radiologists) before and after AI-Thyroid assistance. RESULTS: The area under the receiver operating characteristic (AUROC) curve of AI-Thyroid was 0.939. The AUROC, sensitivity, and specificity were 0.922, 87.0%, and 81.5% for test set 1 and 0.938, 89.9%, and 81.6% for test set 2. The AUROCs of AI-Thyroid did not differ significantly according to the prevalence of malignancies (>15.0% vs ≤15.0%, P = .226). In the simulated scenario, AI-Thyroid assistance changed the AUROC, sensitivity, and specificity from 0.854 to 0.945, from 84.2% to 92.7%, and from 72.9% to 86.6% (all P < .001) in group 1, and from 0.914 to 0.939 (P = .022), from 78.6% to 85.5% (P = .053), and from 91.9% to 92.5% (P = .683) in group 2. The interobserver agreement improved from moderate to substantial in both groups. CONCLUSION: AI-Thyroid can improve diagnostic performance and interobserver agreement in thyroid cancer diagnosis, especially for less-experienced physicians.
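For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how sensitivity, specificity, AUROC, and a two-reader agreement statistic can be computed. It is not the study's code. The data are invented toy values, the 0.5 decision threshold is arbitrary, and all function and variable names are hypothetical. The abstract does not state which agreement statistic was used; Cohen's kappa is assumed here only because "moderate" and "substantial" match the commonly used Landis and Koch labels (0.41 to 0.60 and 0.61 to 0.80).

```python
# Illustrative sketch only: toy data, not taken from the study.

def sensitivity_specificity(labels, preds):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen malignant case
    receives a higher score than a randomly chosen benign case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa: agreement between two readers beyond chance."""
    n = len(reader_a)
    observed = sum(1 for a, b in zip(reader_a, reader_b) if a == b) / n
    categories = set(reader_a) | set(reader_b)
    expected = sum((reader_a.count(c) / n) * (reader_b.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)

# Toy example: 1 = malignant, 0 = benign (hypothetical values).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]  # hypothetical model outputs
preds = [1 if s >= 0.5 else 0 for s in scores]     # arbitrary 0.5 threshold

sens, spec = sensitivity_specificity(labels, preds)
area = auroc(labels, scores)

# Two hypothetical readers rating the same eight nodules.
reader_a = [1, 1, 0, 0, 1, 0, 0, 0]
reader_b = [1, 1, 1, 0, 0, 0, 0, 0]
kappa = cohen_kappa(reader_a, reader_b)

print(f"sensitivity={sens:.3f} specificity={spec:.3f} "
      f"AUROC={area:.4f} kappa={kappa:.3f}")
```

On this toy data the sketch gives a sensitivity and specificity of 0.75 each, an AUROC of 0.8125, and a kappa of about 0.467. Sensitivity and specificity depend on the chosen threshold, whereas the AUROC summarizes performance across all thresholds, which is why the abstract reports both.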