Scaling artist-designed meshes to high triangle counts remains challenging for autoregressive generative models. Existing transformer-based methods suffer from long-sequence bottlenecks, because every additional face adds tokens to the sequence, and from limited quantization resolution, because the whole shape shares a single coordinate grid. These issues prevent faithful reproduction of fine geometric details and structured density patterns. We introduce MeshMosaic, a novel local-to-global framework for artist mesh generation that scales to over 100K triangles, substantially surpassing prior methods, which typically handle only around 8K faces. MeshMosaic first segments shapes into patches, then generates each patch autoregressively, leveraging shared boundary conditions to promote coherence, symmetry, and seamless connectivity between neighboring regions. Because each patch is quantized individually, this strategy scales to high-resolution meshes and yields more symmetrical and organized mesh density and structure. Extensive experiments across multiple public datasets demonstrate that MeshMosaic significantly outperforms state-of-the-art methods in both geometric fidelity and user preference, supporting superior detail representation and practical mesh generation for real-world applications.
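To illustrate why quantizing patches individually can raise effective resolution, the following minimal sketch compares the worst-case rounding error of a uniform grid over a whole shape with that of a grid over one patch's own extent. The bin count, the 1D coordinates, and the helper names `quantize` and `max_error` are assumptions chosen for illustration, not values or code from MeshMosaic.

```python
def quantize(value, lo, hi, bins):
    """Snap value in [lo, hi] to the center of one of `bins` uniform cells."""
    if hi <= lo:
        return lo
    step = (hi - lo) / bins
    index = min(int((value - lo) / step), bins - 1)  # clamp value == hi into the last cell
    return lo + (index + 0.5) * step


def max_error(values, lo, hi, bins):
    """Largest absolute rounding error over all values."""
    return max(abs(v - quantize(v, lo, hi, bins)) for v in values)


# Hypothetical 1D coordinates of a small patch inside a shape spanning [-1, 1].
patch = [0.500, 0.503, 0.511, 0.524, 0.530]
BINS = 128  # assumed grid resolution, for illustration only

# One grid shared by the whole shape versus a grid fitted to the patch extent.
global_err = max_error(patch, -1.0, 1.0, BINS)
local_err = max_error(patch, min(patch), max(patch), BINS)

print(f"global grid max error: {global_err:.6f}")
print(f"patch grid max error:  {local_err:.6f}")
```

With the same number of bins, the cell width shrinks in proportion to the patch extent, so the rounding error bound shrinks by the same factor.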
During inference, our method first applies PartField to obtain a semantic segmentation of the input shape. Point clouds are then sampled from each segmented patch and from the original shape. Finally, our approach produces a clean, highly detailed mesh by assembling the generated patches.
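The final assembly step can be pictured with a small sketch that merges per-patch meshes and welds vertices that coincide on shared boundaries. This is only one plausible way to assemble patches, under the assumption that neighboring patches reproduce identical boundary coordinates; the function `assemble_patches` and its `decimals` parameter are hypothetical and are not taken from the MeshMosaic implementation.

```python
def assemble_patches(patches, decimals=6):
    """Merge (vertices, faces) patches, welding vertices equal after rounding."""
    vertex_index = {}  # rounded coordinate -> index in the merged vertex list
    vertices = []
    faces = []
    for patch_vertices, patch_faces in patches:
        remap = []  # local vertex index -> merged vertex index
        for vertex in patch_vertices:
            key = tuple(round(coord, decimals) for coord in vertex)
            if key not in vertex_index:
                vertex_index[key] = len(vertices)
                vertices.append(key)
            remap.append(vertex_index[key])
        for face in patch_faces:
            merged = tuple(remap[i] for i in face)
            if len(set(merged)) == 3:  # skip triangles collapsed by welding
                faces.append(merged)
    return vertices, faces


# Two hypothetical single-triangle patches sharing the edge (1,0,0)-(0,1,0).
patch_a = ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(0, 1, 2)])
patch_b = ([(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)], [(0, 1, 2)])

vertices, faces = assemble_patches([patch_a, patch_b])
print(len(vertices), faces)
```

In this toy case the six input vertices collapse to four, and the two triangles end up connected along their shared edge.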
Both global and local point cloud features are extracted by a frozen Michelangelo encoder. For each patch, the nearest boundary mesh is identified, tokenized, and prepended to the target mesh token sequence. A GRU network encodes the boundary tokens, which are then combined with the global and local features and fed into an autoregressive hourglass transformer for mesh generation.
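The boundary-conditioning layout described above can be sketched as follows. The sketch covers only the sequence construction: it picks the nearest previously generated patch, tokenizes it, and places those tokens before the target tokens with a loss mask over the target part. The centroid-distance criterion, the per-coordinate tokenizer, the separator token, and the loss mask are all simplifying assumptions, and the GRU and hourglass transformer are omitted.

```python
import math


def centroid(points):
    """Mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def nearest_patch(target_points, generated_patches):
    """Index of the generated patch whose centroid is closest (assumed criterion)."""
    target_center = centroid(target_points)
    distances = [math.dist(target_center, centroid(p)) for p in generated_patches]
    return distances.index(min(distances))


def tokenize(vertices, bins=128):
    """Flatten vertices in [0, 1]^3 into integer coordinate tokens (simplified)."""
    return [min(int(c * bins), bins - 1) for v in vertices for c in v]


def build_sequence(boundary_tokens, target_tokens, sep_token):
    """Prefix boundary tokens; mark only target positions for the loss."""
    tokens = boundary_tokens + [sep_token] + target_tokens
    loss_mask = [0] * (len(boundary_tokens) + 1) + [1] * len(target_tokens)
    return tokens, loss_mask


# Hypothetical data: two already generated patches and one target patch.
generated = [
    [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0)],
    [(0.8, 0.8, 0.8), (1.0, 1.0, 1.0)],
]
target_points = [(0.7, 0.7, 0.7), (0.9, 0.9, 0.9)]
target_vertices = [(0.7, 0.7, 0.7)]

BINS = 128       # assumed coordinate resolution
SEP_TOKEN = BINS  # assumed separator id, outside the coordinate range

nearest = nearest_patch(target_points, generated)
boundary_tokens = tokenize(generated[nearest], BINS)
target_tokens = tokenize(target_vertices, BINS)
tokens, loss_mask = build_sequence(boundary_tokens, target_tokens, SEP_TOKEN)
print(nearest, tokens, loss_mask)
```

Masking the prefix means the boundary tokens act purely as context, so a model trained on such sequences would be asked to predict only the new patch.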
We present a more compelling example. For a complex fighter jet model, our method successfully reconstructs intricate details using nearly 30K triangles, whereas other approaches struggle with such highly complex shapes, typically yielding only a few hundred to a few thousand triangles. This gap demonstrates the superior detail recovery of our method and its scalability to substantially higher triangle counts.
Although our training set contains no more than sixty splits per instance, our method can handle shapes with hundreds of patches at inference time. This highlights the strong generalization capability of our method.
We will release an official BibTeX entry.