abstract = {We introduce a novel framework for using natural language to generate and edit 3D
indoor scenes, harnessing scene semantics and text-scene grounding knowledge learned
from large annotated 3D scene databases. The advantage of natural language editing
interfaces is strongest when performing semantic operations at the sub-scene level,
acting on groups of objects. We learn how to manipulate these sub-scenes by analyzing
existing 3D scenes. We perform edits by first parsing a natural language command from
the user and transforming it into a semantic scene graph that is used to retrieve
corresponding sub-scenes matching the command from the databases. We then augment
this retrieved sub-scene by incorporating other objects that may be implied by the
scene context. Finally, a new 3D scene is synthesized by aligning the augmented sub-scene
with the user's current scene, where new objects are spliced into the environment,
possibly triggering appropriate adjustments to the existing scene arrangement. A suggestive
modeling interface with multiple interpretations of user commands is used to alleviate
ambiguities in natural language. We conduct studies comparing our approach against
both prior text-to-scene work and artist-made scenes and find that our method significantly
outperforms prior work and is comparable to handmade scenes even when complex and
varied natural sentences are used.},
