Understanding objects at the level of their constituent parts is fundamental to advancing computer vision, graphics, and robotics. While datasets like PartNet have driven progress in 3D part understanding, their reliance on untextured geometries and expert-dependent annotation limits scalability and usability. We introduce PartNeXt, a next-generation dataset addressing these gaps with over 23,000 high-quality, textured 3D models annotated with fine-grained, hierarchical part labels across 50 categories.
We benchmark PartNeXt on two tasks: (1) class-agnostic part segmentation, where state-of-the-art methods (e.g., PartField, SAMPart3D) struggle with fine-grained and leaf-level parts, and (2) 3D part-centric question answering, a new benchmark for 3D-LLMs that reveals significant gaps in open-vocabulary part grounding. Additionally, training Point-SAM on PartNeXt yields substantial gains over PartNet, underscoring the dataset's superior quality and diversity. By combining scalable annotation, texture-aware labels, and multi-task evaluation, PartNeXt opens new avenues for research in structured 3D understanding.
The example shows a microwave with an internal tray. The dual-panel layout lets annotators first label external parts such as the “door” (right panel, showing the already-segmented mesh) and then annotate internal components such as the “tray” (visible in the unsegmented mesh in the left panel). This design effectively mitigates occlusion during annotation.
Our constructed dataset, PartNeXt, provides 350,187 annotated part instances for 23,519 objects across 50 categories. Of these objects, 14,811 were sourced from Objaverse, 2,633 from ABO, and 6,075 from 3D-FUTURE.
PartNeXt dataset statistics. #S denotes the number of annotated objects, #P the total number of annotated parts, P_Med the median number of parts per object, D_Med the median hierarchy depth, and D_Max the maximum hierarchy depth.
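As a concrete reading of these columns, the sketch below shows how the per-category statistics could be computed. It assumes each object's annotation is a part tree of nested dicts with a `children` list, and that every tree node counts as a part; both the field name and that convention are assumptions here, and the released format may differ.

```python
import statistics

def count_parts(node):
    """Total number of parts in a hierarchy: this node plus all descendants.
    Assumes a hypothetical nested-dict format with a "children" list."""
    return 1 + sum(count_parts(c) for c in node.get("children", []))

def depth(node):
    """Depth of the part hierarchy rooted at this node (a leaf has depth 1)."""
    children = node.get("children", [])
    return 1 + (max(depth(c) for c in children) if children else 0)

def category_stats(annotations):
    """annotations: list of root nodes, one per annotated object in a category."""
    part_counts = [count_parts(root) for root in annotations]
    depths = [depth(root) for root in annotations]
    return {
        "#S": len(annotations),
        "#P": sum(part_counts),
        "P_Med": statistics.median(part_counts),
        "D_Med": statistics.median(depths),
        "D_Max": max(depths),
    }
```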
Visualization of PartNet and PartNeXt results. Because PartNet relies on remeshing to obtain finer-grained parts, the mesh is deformed, loses its texture, and requires manually drawn cutting lines after remeshing to achieve segmentation; as a result, part boundaries are often not smooth.
Part Segmentation Results on PartNeXt. PartField struggles to separate connected regions; SAMesh excels at fine-grained segmentation but tends to over-segment; SAMPart3D lacks continuity in weakly textured regions and offers no control over granularity.
Representative prompt-response pairs used to evaluate 3D part-level understanding. (a) Part Counting: the model is asked to count the legs of a chair. (b) Part Classification: the model must name the part highlighted in red in the point cloud of a bed. (c) Part Grounding: the model is asked to localize the “Shelf” of a bookcase by outputting the eight corner coordinates of its bounding box.
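For (c), the expected output is just the corner enumeration of a 3D bounding box. Below is a minimal sketch, assuming an axis-aligned box given by its min/max corners; the benchmark's actual box parameterization (e.g., oriented boxes) may differ.

```python
from itertools import product

import numpy as np

def box_corners(bbox_min, bbox_max):
    """Enumerate the 8 corners of an axis-aligned 3D bounding box.

    Each corner independently takes the min or max value along x, y, z,
    giving 2^3 = 8 combinations; returns an (8, 3) array.
    """
    mn = np.asarray(bbox_min, dtype=float)
    mx = np.asarray(bbox_max, dtype=float)
    return np.array(list(product((mn[0], mx[0]),
                                 (mn[1], mx[1]),
                                 (mn[2], mx[2]))))

# e.g., a unit cube at the origin yields corners (0,0,0) through (1,1,1)
print(box_corners([0, 0, 0], [1, 1, 1]))
```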
Comparison of Point-SAM models trained on different datasets. The metric IoU@k is reported for 3D promptable segmentation, where k denotes the number of prompt points.
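For reference, here is a minimal sketch of how IoU@k could be scored, assuming binary point masks: `predictions_at_k[i]` is a hypothetical name for the model's predicted mask on instance i after k prompt points have been given. The prompt-selection loop itself is model-specific and omitted.

```python
import numpy as np

def iou(pred_mask, gt_mask):
    """IoU between two boolean masks defined over the same set of points."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    # Convention: two empty masks count as a perfect match.
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def iou_at_k(predictions_at_k, gt_masks):
    """Mean IoU over instances, where each prediction was made after k prompts."""
    return float(np.mean([iou(p, g) for p, g in zip(predictions_at_k, gt_masks)]))
```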
Coming soon