MeshMosaic

Scaling Artist Mesh Generation via Local-to-Global Assembly

Rui Xu¹, Tianyang Xue¹, Qiujie Dong¹, Le Wan², Zhe Zhu², Peng Li³, Zhiyang Dou¹,
Cheng Lin⁴, Shiqing Xin⁵, Yuan Liu³†, Wenping Wang⁶, Taku Komura¹†

¹The University of Hong Kong, ²Tencent Visvise, ³Hong Kong University of Science and Technology,
⁴Macau University of Science and Technology, ⁵Shandong University, ⁶Texas A&M University

(† Corresponding authors.)

arXiv preprint.

Abstract

Scaling artist-designed meshes to high triangle counts remains challenging for autoregressive generative models. Existing transformer-based methods suffer from long-sequence bottlenecks and limited quantization resolution: dense meshes require a very large number of tokens, and the quantization granularity is constrained. These issues prevent faithful reproduction of fine geometric details and structured density patterns. We introduce MeshMosaic, a novel local-to-global framework for artist mesh generation that scales to over 100K triangles, substantially surpassing prior methods, which typically handle only around 8K faces. MeshMosaic first segments a shape into patches, then generates each patch autoregressively, leveraging shared boundary conditions to promote coherence, symmetry, and seamless connectivity between neighboring regions. Because each patch is quantized individually, this strategy scales to high-resolution meshes and yields more symmetrical and organized mesh density and structure. Extensive experiments across multiple public datasets demonstrate that MeshMosaic significantly outperforms state-of-the-art methods in both geometric fidelity and user preference, supporting superior detail representation and practical mesh generation for real-world applications.
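The benefit of quantizing each patch individually can be illustrated with a small sketch. It is not the authors' implementation: the bin count `BINS`, the helper names, and the toy coordinates are all assumptions chosen for illustration. The idea is that a fixed number of bins spread over a small patch bounding box gives a finer cell size than the same number of bins spread over the whole shape.

```python
BINS = 128  # hypothetical quantization resolution, not taken from the paper


def bounding_box(points):
    """Axis-aligned bounding box of a list of (x, y, z) tuples."""
    lo = tuple(min(p[i] for p in points) for i in range(3))
    hi = tuple(max(p[i] for p in points) for i in range(3))
    return lo, hi


def quantize(points, lo, hi, bins=BINS):
    """Map each coordinate to an integer bin in [0, bins - 1] over the box [lo, hi]."""
    result = []
    for p in points:
        q = []
        for x, a, b in zip(p, lo, hi):
            t = (x - a) / (b - a) if b > a else 0.0
            q.append(min(bins - 1, int(t * bins)))
        result.append(tuple(q))
    return result


# Toy setup: the whole shape fills the unit cube, the patch is a small region of it.
shape_lo, shape_hi = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
patch = [(0.10, 0.10, 0.10), (0.101, 0.10, 0.10), (0.225, 0.225, 0.225)]
patch_lo, patch_hi = bounding_box(patch)

# Quantize the same patch vertices on the global grid and on the patch-local grid.
global_q = quantize(patch, shape_lo, shape_hi)
local_q = quantize(patch, patch_lo, patch_hi)

# Two nearby vertices collapse onto one cell globally but stay distinct locally.
collapsed_globally = global_q[0] == global_q[1]
distinct_locally = local_q[0] != local_q[1]
```

In this toy case the first two vertices are 0.001 apart, which is below the global cell size of 1/128 but above the patch-local cell size, so only the patch-local grid keeps them separate.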

Method

The pipeline of MeshMosaic: Local-to-global assembly with semantic patching and boundary conditioning.

During inference, our method first applies PartField to obtain a semantic segmentation of the input shape. Point clouds are then sampled from each segmented patch and from the original shape. Finally, our approach assembles the generated patches into a clean, highly detailed mesh.
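The data flow around the generator in this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the released code: `group_points_by_label` and `assemble_patches` are hypothetical helpers, segmentation labels are taken as given (the PartField call is not shown), and welding coincident vertices within a tolerance is one plausible way to join patches that the source does not specify.

```python
def group_points_by_label(points, labels):
    """Split a point cloud into per-patch point lists using segmentation labels."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return groups


def assemble_patches(patches, tol=1e-6):
    """Merge patch meshes into one mesh, welding vertices that coincide within tol.

    patches: list of (vertices, faces); vertices are (x, y, z) tuples and
    faces are index triples into that patch's own vertex list.
    """
    merged_vertices = []
    merged_faces = []
    key_to_index = {}
    for vertices, faces in patches:
        local_to_global = []
        for v in vertices:
            # Snap to a grid of size tol so shared boundary vertices get one index.
            key = tuple(round(c / tol) for c in v)
            if key not in key_to_index:
                key_to_index[key] = len(merged_vertices)
                merged_vertices.append(v)
            local_to_global.append(key_to_index[key])
        for a, b, c in faces:
            face = (local_to_global[a], local_to_global[b], local_to_global[c])
            if len(set(face)) == 3:  # skip faces that degenerate after welding
                merged_faces.append(face)
    return merged_vertices, merged_faces


# Toy example: two single-triangle patches that share one edge.
patch_a = ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(0, 1, 2)])
patch_b = ([(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)], [(0, 1, 2)])
vertices, faces = assemble_patches([patch_a, patch_b])

# Toy example: four points carrying two segmentation labels.
groups = group_points_by_label(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    [0, 0, 1, 1],
)
```

After assembly the two triangles reference four shared vertices rather than six, so the seam between the patches is connected instead of merely touching.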

Train on Single Patch

The workflow of MeshMosaic for generating a single patch.

Both global and local point cloud features are extracted by a frozen Michelangelo encoder. For each patch, the nearest boundary mesh is identified, tokenized, and prepended to the target mesh token sequence. A GRU network encodes the boundary tokens, which are then combined with the global and local features and fed into an autoregressive hourglass transformer for mesh generation.
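The boundary-conditioning step can be sketched at the token level. The tokenization below is a deliberately naive assumption (three quantized coordinates per vertex, nine tokens per triangle); the source does not state the actual scheme, and the names `tokenize_mesh`, `build_sequence`, and `BINS` are hypothetical. The GRU and the hourglass transformer are not modeled here.

```python
BINS = 128  # hypothetical quantization resolution


def tokenize_mesh(vertices, faces, bins=BINS):
    """Flatten triangles into coordinate tokens, 9 tokens per face.

    Coordinates are assumed to be normalized to [0, 1] beforehand.
    """
    tokens = []
    for face in faces:
        for index in face:
            for c in vertices[index]:
                tokens.append(min(bins - 1, int(c * bins)))
    return tokens


def build_sequence(boundary_tokens, target_tokens):
    """Place the boundary tokens before the target tokens.

    Returns the full sequence and the prefix length, i.e. the position
    where the target patch tokens begin.
    """
    return boundary_tokens + target_tokens, len(boundary_tokens)


# Toy example: one boundary triangle from a neighboring patch, one target triangle.
verts = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.5, 0.5, 0.0)]
boundary_tokens = tokenize_mesh(verts, [(0, 1, 2)])
target_tokens = tokenize_mesh(verts, [(1, 3, 2)])
sequence, prefix_len = build_sequence(boundary_tokens, target_tokens)
```

Under this naive scheme the sequence length grows as nine tokens per face, which gives a sense of why generating a whole high-resolution mesh as one sequence becomes a bottleneck, and why per-patch sequences with a short boundary prefix are easier to handle.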

High-resolution Face Numbers

Visual comparison of MeshMosaic with state-of-the-art methods.

A more demanding example is a complex fighter jet model. Our method reconstructs its intricate details using nearly 30K triangles, whereas other approaches struggle with such complex shapes and typically yield only a few hundred to a few thousand triangles. This gap demonstrates the superior detail recovery and scalability of our method at substantially higher triangle counts.

More Patches

MeshMosaic on hundreds of patches.

Although each instance in our training set is split into no more than sixty patches, our method handles inputs with hundreds of patches at inference time. This highlights its strong generalization capability.

BibTeX

We will release an official BibTeX entry.

@article{xu2025meshmosaic,
  title={MeshMosaic: Scaling Artist Mesh Generation via Local-to-Global Assembly},
  author={Xu, Rui and Xue, Tianyang and Dong, Qiujie and Wan, Le and Zhu, Zhe and Li, Peng and Dou, Zhiyang and Lin, Cheng and Xin, Shiqing and Liu, Yuan and others},
  journal={arXiv preprint arXiv:2509.19995},
  year={2025}
}