
Scaling Artist Mesh Generation via Local-to-Global Assembly

¹The University of Hong Kong, ²Tencent Visvise, ³Hong Kong University of Science and Technology,
⁴Macau University of Science and Technology, ⁵Shandong University, ⁶Texas A&M University
(† Corresponding authors.)
arXiv preprint.
MeshMosaic scales artist mesh generation to more than 100K triangles by assembling boundary-conditioned local patches into cohesive, high-resolution meshes. It offers flexible control over mesh density and faithfully preserves intricate design details. Faces are assigned random shades of blue to better illustrate the mesh layout.

Abstract

Scaling artist-designed meshes to high triangle counts remains challenging for autoregressive generative models. Existing transformer-based methods suffer from long-sequence bottlenecks, caused by the large number of tokens required, and from limited quantization resolution. These issues prevent faithful reproduction of fine geometric details and structured density patterns. We introduce MeshMosaic, a novel local-to-global framework for artist mesh generation that scales to over 100K triangles, substantially surpassing prior methods, which typically handle only around 8K faces. MeshMosaic first segments shapes into patches, then generates each patch autoregressively, leveraging shared boundary conditions to promote coherence, symmetry, and seamless connectivity between neighboring regions. Because each patch is quantized individually, this strategy scales to high-resolution meshes and yields more symmetrical and organized mesh density and structure. Extensive experiments across multiple public datasets demonstrate that MeshMosaic significantly outperforms state-of-the-art methods in both geometric fidelity and user preference, supporting superior detail representation and practical mesh generation for real-world applications.
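To illustrate why quantizing each patch individually can help, the toy sketch below compares the reconstruction error of one patch's coordinates when the quantization grid spans the whole shape versus only the patch's own extent. It is not the paper's quantizer: the uniform binning, the bin count `BINS`, the `[-1, 1]` shape range, and the sample coordinates are all assumptions made for illustration.

```python
def quantize(values, lo, hi, bins):
    """Snap floats in [lo, hi] to `bins` uniform cells and return cell centers."""
    step = (hi - lo) / bins
    out = []
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # clamp the upper edge
        out.append(lo + (idx + 0.5) * step)
    return out


def max_error(values, lo, hi, bins):
    """Largest absolute difference between values and their quantized versions."""
    recon = quantize(values, lo, hi, bins)
    return max(abs(a - b) for a, b in zip(values, recon))


# x-coordinates of one small patch; the whole shape is assumed to span [-1, 1].
patch_x = [0.100, 0.137, 0.162, 0.199, 0.250]
BINS = 128  # hypothetical resolution, not taken from the paper

# Same number of bins, but spread over the whole shape vs. over the patch only.
global_err = max_error(patch_x, -1.0, 1.0, BINS)
local_err = max_error(patch_x, min(patch_x), max(patch_x), BINS)

print(f"global grid max error: {global_err:.6f}")
print(f"patch grid max error:  {local_err:.6f}")
```

With the same number of bins, the patch-level grid has much smaller cells, so its worst-case error (half a cell) shrinks in proportion to the patch's extent.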

Method

The pipeline of MeshMosaic: Local-to-global assembly with semantic patching and boundary conditioning.

During inference, our method first applies PartField to obtain a semantic segmentation of the input shape. Point clouds are then sampled from both the segmented patches and the original shape. Finally, our approach produces a clean, highly detailed mesh by assembling the generated patches.
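The sampling step can be pictured with the following minimal sketch, which turns a labeled point cloud into one global sample and one fixed-size sample per patch. The function names, the sample sizes, and the toy labels (which stand in for a segmentation result) are hypothetical; the actual sampling scheme may differ.

```python
import random
from collections import defaultdict


def sample_fixed(points, n, rng):
    """Return exactly n points: subsample without replacement when enough
    points exist, otherwise pad by drawing extra points with replacement."""
    if len(points) >= n:
        return rng.sample(points, n)
    return points + rng.choices(points, k=n - len(points))


def build_inputs(points, labels, n_global, n_local, seed=0):
    """Split a labeled point cloud into a global sample and per-patch samples."""
    rng = random.Random(seed)
    by_patch = defaultdict(list)
    for point, label in zip(points, labels):
        by_patch[label].append(point)
    global_pts = sample_fixed(points, n_global, rng)
    local_pts = {lab: sample_fixed(pts, n_local, rng) for lab, pts in by_patch.items()}
    return global_pts, local_pts


# Toy shape: 300 deterministic points; labels stand in for a segmentation result.
points = [(i / 299, (i % 7) / 7, (i % 11) / 11) for i in range(300)]
labels = [0 if p[0] < 0.5 else 1 for p in points]

global_pts, local_pts = build_inputs(points, labels, n_global=128, n_local=64)
print(len(global_pts), {lab: len(pts) for lab, pts in local_pts.items()})
```

Fixing the sample size per patch keeps the conditioning input the same shape regardless of how large or small a patch is.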

Train on Single Patch

The workflow of MeshMosaic for generating a single patch.

Both global and local point cloud features are extracted by a frozen Michelangelo encoder. For each patch, the nearest boundary mesh is identified, tokenized, and prepended to the target mesh token sequence. A GRU network encodes the boundary tokens, which are then combined with the global and local features and fed into an autoregressive hourglass transformer for mesh generation.
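As a rough sketch of the concatenation step only, the code below tokenizes boundary and target triangles and places the boundary tokens in front of the target tokens. The naive 9-tokens-per-face coordinate tokenization, the separator and end-of-sequence ids, and the bin count are assumptions for illustration; the GRU encoder, the point cloud features, and the transformer are omitted.

```python
def tokenize_faces(faces, bins, lo=-1.0, hi=1.0):
    """Flatten triangles into coordinate tokens: 9 integers per face."""
    step = (hi - lo) / bins
    tokens = []
    for face in faces:
        for vertex in face:
            for coord in vertex:
                tokens.append(min(int((coord - lo) / step), bins - 1))
    return tokens


def build_sequence(boundary_faces, target_faces, bins=128):
    """Return (sequence, prefix_len) with boundary tokens placed before the target."""
    sep = bins      # hypothetical separator token id
    eos = bins + 1  # hypothetical end-of-sequence token id
    prefix = tokenize_faces(boundary_faces, bins) + [sep]
    target = tokenize_faces(target_faces, bins) + [eos]
    return prefix + target, len(prefix)


# One boundary triangle from a neighboring patch, two triangles to generate.
boundary = [((0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0))]
target = [
    ((0.5, 0.0, 0.0), (0.5, 0.5, 0.0), (0.0, 0.5, 0.0)),
    ((0.5, 0.5, 0.0), (0.5, 0.5, 0.5), (0.0, 0.5, 0.0)),
]

sequence, prefix_len = build_sequence(boundary, target)
print(prefix_len, len(sequence))
```

In an autoregressive setup, the first `prefix_len` tokens would serve purely as conditioning context, and generation would begin after the separator.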

Comparisons

Visual comparison of MeshMosaic with state-of-the-art methods.

High-resolution Face Numbers

Visual comparison of MeshMosaic with state-of-the-art methods.

We present a more compelling example. For a complex fighter jet model, our method reconstructs intricate details using nearly 30K triangles, whereas other approaches struggle with such complex shapes and typically yield only a few hundred to a few thousand triangles. This gap demonstrates the superior detail recovery and scalability of our method at substantially higher triangle counts.

More Patches

MeshMosaic on hundreds of patches.

Although our training set contains no more than sixty patches per instance, our method handles inputs with hundreds of patches at inference time. This highlights its strong generalization capability.

BibTeX

We will release an official BibTeX entry.