Apple launches Core AI framework for on-device LLM deployment

The successor to Core ML lets developers run models of up to 70B parameters entirely on Apple Silicon without cloud dependencies.

Omega Editorial · June 20, 2026 · 3 min read

Key takeaways

Core AI replaces Core ML as Apple's primary framework for deploying large language models and generative AI entirely on Apple Silicon devices.
The framework supports models up to 70 billion parameters running locally on iPhone, iPad, Mac, and Vision Pro without cloud dependencies or per-token costs.
Developers can convert PyTorch models using Core AI PyTorch, with built-in optimization techniques including quantization and palettization for efficient on-device performance.
Apple now offers three ML frameworks. Based on developer discussions, it appears to position Core ML for traditional machine learning, Core AI for neural networks and transformers, and MLX Swift for custom model weights.
Automatic model specialization and caching optimize performance after first load, with developer controls for managing the specialization process across app groups.

Apple unveils Core AI at WWDC 26

Apple introduced Core AI, the official successor to its Core ML framework, at its 2026 Worldwide Developers Conference. The new framework is purpose-built for running large language models and generative AI workloads entirely on-device across Apple's hardware ecosystem, according to details first reported by InfoQ.

Core AI represents the technology foundation beneath Apple Intelligence and is now available to third-party developers building what Apple terms "custom intelligence" applications. The framework supports models ranging from compact 3-billion-parameter vision models to large-scale LLMs, including reasoning models with up to 70 billion parameters, running on iPhone, iPad, Mac, and Apple Vision Pro devices.

The framework requires Apple Silicon and operates without server dependencies, eliminating per-token cloud costs while maintaining user data privacy through entirely local processing.

Technical architecture and capabilities

Core AI exposes the CPU, GPU, and Neural Engine through a single API and distributes workloads across them. The framework features a memory-safe Swift API that enables zero-copy data paths and granular control over inference memory allocation.

Ahead-of-time compilation shifts computational work off user devices, delivering near-instant model load times during actual use. When a model first loads, the framework automatically specializes it for the specific hardware and OS version, caching the optimized version for subsequent runs. Developers can manage this process through SpecializationOptions and the AICacheModel API, including the ability to share model caches across app groups.
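The specialize-once, reuse-afterwards behavior can be illustrated with a small sketch. This is not Core AI code, and the article does not describe the actual API surface of SpecializationOptions or AICacheModel. Every name below (SpecializationKey, SpecializationCache, the chip and OS strings) is hypothetical, and the example assumes that a cached artifact is keyed by the model contents, the hardware, and the OS version, as the description above suggests.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecializationKey:
    """Hypothetical cache key: one specialized artifact per model, chip, and OS version."""
    model_digest: str
    chip: str
    os_version: str


class SpecializationCache:
    """Toy in-memory cache. A real runtime would persist artifacts on disk."""

    def __init__(self):
        self._artifacts = {}
        self.compile_count = 0  # how many times the expensive step actually ran

    def load(self, model_bytes: bytes, chip: str, os_version: str) -> str:
        digest = hashlib.sha256(model_bytes).hexdigest()
        key = SpecializationKey(digest, chip, os_version)
        if key not in self._artifacts:
            # First load for this combination: pay the specialization cost once.
            self.compile_count += 1
            self._artifacts[key] = f"specialized:{digest[:8]}:{chip}:{os_version}"
        return self._artifacts[key]


cache = SpecializationCache()
model = b"example model weights"

first = cache.load(model, chip="chip-a", os_version="1.0")
second = cache.load(model, chip="chip-a", os_version="1.0")        # cache hit, no recompile
after_update = cache.load(model, chip="chip-a", os_version="1.1")  # OS changed, specialize again
```

Under these assumptions, an OS update or a different chip invalidates the cached artifact, which is why the first launch after an update would be expected to load more slowly than later ones.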

Model conversion and optimization

Developers can convert PyTorch models to Core AI format using the Core AI PyTorch toolset. The simplest path involves exporting a PyTorch model as a torch.export.ExportedProgram and converting it to a CoreAI AIProgram through TorchConverter.

For more advanced use cases, developers can author Core AI models using built-in composite operations including attention mechanisms, RoPE embeddings, RMSNorm, and gather-matmul. The framework also supports custom lowering functions to map PyTorch operations to Core AI intermediate representation, or even custom Metal kernels for low-level optimization.
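For readers unfamiliar with these building blocks, RMSNorm is the simplest to show. The sketch below is the standard textbook definition in plain Python, not Core AI's implementation: each element is divided by the root mean square of the vector and then scaled by a learned per-element weight. The function name and the eps default are illustrative choices.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Normalize x by its root mean square, then apply a per-element gain."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)  # eps guards against division by zero
    return [v * scale * w for v, w in zip(x, weight)]


# With unit gains, the output vector has a root mean square of roughly 1.
normalized = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
```

A composite operation in a framework would typically fuse these steps into one kernel rather than run them as separate element-wise passes, which is the usual motivation for exposing them as built-ins.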

Model compression is a critical deployment step. Core AI applies quantization (storing weights at lower numeric precision) and palettization (replacing weights with indices into a small lookup table), both aligned with the runtime's execution patterns. These optimizations reduce memory footprint, inference latency, and power consumption together.
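Some back-of-the-envelope arithmetic shows why compression matters at this scale, and a toy version of each technique shows what it does to the weights. The sketch below is generic Python, not Core AI's compression pipeline. The function names are illustrative, the memory estimate counts weights only (no activations or KV cache), and the palettization routine is a plain one-dimensional k-means that assumes at least two palette entries.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_int8(weights):
    """Symmetric linear quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def nearest(palette, w):
    """Index of the palette entry closest to w."""
    return min(range(len(palette)), key=lambda k: abs(w - palette[k]))


def palettize(weights, num_entries=4, iterations=10):
    """1-D k-means: store each weight as an index into a small lookup table."""
    lo, hi = min(weights), max(weights)
    # Start with evenly spaced palette entries (requires num_entries >= 2).
    palette = [lo + (hi - lo) * i / (num_entries - 1) for i in range(num_entries)]
    for _ in range(iterations):
        indices = [nearest(palette, w) for w in weights]
        for k in range(num_entries):
            members = [w for w, i in zip(weights, indices) if i == k]
            if members:
                palette[k] = sum(members) / len(members)
    return palette, [nearest(palette, w) for w in weights]


fp16_gb = weight_memory_gb(70e9, 16)  # 140.0 GB at 16 bits per weight
int4_gb = weight_memory_gb(70e9, 4)   # 35.0 GB at 4 bits per weight

codes, scale = quantize_int8([0.5, -1.0, 0.25])
restored = dequantize(codes, scale)

palette, indices = palettize([0.0, 0.1, 1.0, 1.1], num_entries=2)
```

At 16 bits per weight, a 70-billion-parameter model needs about 140 GB for weights alone, and at 4 bits about 35 GB, which is why low-bit compression is a precondition for fitting such models into device memory.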

Framework positioning in Apple's ML ecosystem

With Core AI's introduction, Apple now supports three distinct approaches for machine learning on its platforms: Core ML, Core AI, and MLX Swift. Based on developer discussions, Apple appears to position Core ML for traditional non-neural machine learning tasks such as decision trees and tabular feature engineering, Core AI for neural networks and transformers, and MLX Swift for working with custom model weights, though potentially with lower performance.

Why it matters

Core AI addresses a critical gap for enterprise developers seeking to deploy sophisticated AI capabilities without cloud infrastructure costs or data privacy concerns. By enabling 70-billion-parameter models to run entirely on-device, Apple is positioning its hardware ecosystem as a viable platform for privacy-sensitive AI applications in healthcare, finance, and other regulated industries. The framework's success will depend on community adoption and the growth of pre-optimized models, but it represents Apple's most significant commitment yet to on-device generative AI for third-party developers.

These details were first reported by InfoQ.

#apple #core-ai #on-device-ai #large-language-models #apple-silicon #pytorch

This is an original analysis by the Omega editorial team. Source reporting: AI Watch.
