OpenAI has announced and released Point-E, a machine learning system that generates 3D models from text prompts. Point-E produces point clouds, sets of data points in space that together represent a 3D shape, and according to OpenAI it can generate a model in a few minutes on a single Nvidia V100 GPU.
Point-E consists of two models: a text-to-image model, trained on labelled images to learn the relationship between words and visual concepts, and an image-to-3D model, trained on images paired with 3D objects. Given a text prompt, the text-to-image model first generates a synthetic rendered view of the described object; that image is then fed to the image-to-3D model, which produces the point cloud.
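The two-stage flow can be sketched in a few lines of Python. This is an illustrative mock-up, not Point-E's actual API: the two stage functions here are random-number stand-ins for the real diffusion models, and the array shapes (a 64×64 RGB image, a 4096-point cloud with per-point colour) are assumptions chosen for the example.

```python
import numpy as np

# Illustrative sketch of a Point-E-style two-stage pipeline.
# Both functions below are placeholder stubs, NOT the real Point-E models.

def text_to_image(prompt: str, size: int = 64) -> np.ndarray:
    """Stand-in for the text-to-image model: returns a synthetic
    RGB view of the prompted object (here, just seeded noise)."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.random((size, size, 3))  # H x W x RGB in [0, 1)

def image_to_point_cloud(image: np.ndarray, n_points: int = 4096) -> np.ndarray:
    """Stand-in for the image-to-3D model: returns an (n_points, 6)
    array of (x, y, z, r, g, b) per point."""
    rng = np.random.default_rng(int(image.sum() * 1e6) % (2**32))
    xyz = rng.normal(size=(n_points, 3))   # point positions
    rgb = rng.random((n_points, 3))        # per-point colours
    return np.concatenate([xyz, rgb], axis=1)

def generate(prompt: str) -> np.ndarray:
    synthetic_view = text_to_image(prompt)       # stage 1: text -> image
    return image_to_point_cloud(synthetic_view)  # stage 2: image -> points

cloud = generate("a red traffic cone")
print(cloud.shape)  # (4096, 6)
```

The design point the sketch captures is that neither stage needs paired text-and-3D data: the first stage only needs captioned images, and the second only needs images paired with 3D objects, which is part of why the pipeline is cheap to run.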
The Point-E team noted in their paper that the mesh-generating step can sometimes miss parts of objects, resulting in blocky or distorted shapes. However, they also reported that Point-E is one to two orders of magnitude faster than previous state-of-the-art methods, a trade-off that could make it more practical for some applications or allow it to search for higher-quality 3D objects. The researchers suggested that Point-E's point clouds could be used to fabricate real-world objects through 3D printing, and could eventually find use in game and animation development.
OpenAI is not the first company to venture into 3D object generation. Earlier in 2022, Google released DreamFusion, an expanded version of its earlier Dream Fields system, which can generate 3D representations of objects without being trained on any 3D data. 3D models have a wide range of applications in fields such as film and TV, interior design, architecture, and various sciences: architectural firms use them to visualize proposed buildings and landscapes, and engineers use them to design new devices, vehicles, and structures. As model-synthesizing AI continues to develop, it has the potential to disrupt all of these industries.