Implementation details
The diffusion model was trained on a computer running Windows 10 with 64 GB of RAM and an NVIDIA 3090 graphics card with 24 GB of memory. Training was implemented in PyTorch, with each image undergoing 100 iterations. For preprocessing, the input images were resized proportionally so that the longer side was at most 512 pixels. Data augmentation was performed using horizontal flipping. The learning rate was set to 0.000001, and the batch size was set to 24. xFormers and FP16 were used to accelerate computation. Fine-tuning the diffusion model took 20 hours in total.
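The preprocessing and augmentation steps described above can be illustrated with a minimal sketch. It is not the training code used in this study: the names `TRAIN_CONFIG`, `target_size`, and `horizontal_flip` are hypothetical, images are represented as plain nested lists of pixels, and the sketch assumes that images whose longer side is already at most 512 pixels are left unchanged, which the text does not specify.

```python
# Settings reported in the text; the dictionary keys are illustrative names only.
TRAIN_CONFIG = {
    "learning_rate": 1e-6,
    "batch_size": 24,
    "precision": "fp16",
    "max_side": 512,
    "iterations_per_image": 100,
}


def target_size(width, height, max_side=512):
    """Return (width, height) after proportional resizing.

    Assumption: images that already fit within max_side are not upscaled.
    """
    longer = max(width, height)
    if longer <= max_side:
        return width, height
    scale = max_side / longer
    # Keep every dimension at least one pixel after rounding.
    return max(1, round(width * scale)), max(1, round(height * scale))


def horizontal_flip(rows):
    """Mirror an image, given as a list of pixel rows, left to right."""
    return [list(reversed(row)) for row in rows]
```

For example, under these assumptions a 1024 × 768 image would be resized to 512 × 384, and a flipped copy of each image would serve as the augmented sample.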
ADSSFID-49 dataset
This research aimed to generate, from text, a large number of aesthetically pleasing interior designs in specified decoration styles. Due to the lack of interior datasets with aesthetic scores, this study created the aesthetic decoration style and spatial function interior dataset (ADSSFID-49). Expert interior designers curated this dataset from reputable websites such as “3d6646,” “om47,” and “znzmo48.” Initially, the designers collected over 40,000 free, high-quality images from these sources. Subsequently, they evaluated each image, excluding those displaying incongruent decoration styles or unclear details. After this stringent selection, more than 20,000 images met the established criteria. Furthermore, the designers manually annotated the decoration styles and spatial functions depicted in these images. Finally, aesthetic scores were assigned to each image using an open-source aesthetic evaluation model49, completing the ADSSFID-49 dataset.
We employed a state-of-the-art aesthetic scoring model49 for the automated aesthetic annotation of interior design images. This model, proposed in 2023, was trained on a dataset of 137,000 images with aesthetic scores. The authors of this method indicate that their proposed model outperforms other mainstream models in terms of aesthetic score prediction49. We utilized this model to automatically annotate the aesthetic scores of each image in the ADSSFID-49 dataset. To make the diffusion model easier to train, we normalized all scores using a mapping that adheres to a normal distribution, resulting in integer scores between 1 and 10. The distribution of aesthetic scores for the processed ADSSFID-49 images can be seen in Fig. 5.
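The exact normalization mapping is not given in the text. As a hedged illustration only, the sketch below standardizes the raw scores and rescales them so that the mean falls at the middle of the 1 to 10 range and roughly three standard deviations span the range, then rounds and clips to integers. The function name `to_integer_scores` and the parameter `sigma_span` are assumptions introduced for this example and are not taken from the study.

```python
from statistics import mean, pstdev


def to_integer_scores(raw_scores, low=1, high=10, sigma_span=3.0):
    """Map raw aesthetic scores to integers in [low, high].

    Assumed scheme (not specified in the paper): z-score each value,
    place the mean at the centre of the range, and let +/- sigma_span
    standard deviations cover the full range before rounding and clipping.
    """
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    center = (low + high) / 2
    if sigma == 0:
        # All scores are identical; assign the middle of the range.
        return [round(center) for _ in raw_scores]
    scale = (high - low) / (2 * sigma_span)
    result = []
    for score in raw_scores:
        z = (score - mu) / sigma
        value = round(center + z * scale)
        result.append(min(high, max(low, value)))
    return result
```

Because the mapping is monotonic, images that the scoring model ranks higher never receive a lower integer score, while extreme outliers are clipped to 1 or 10.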
We enlisted the expertise of professional designers to manually annotate the decoration styles and spatial functions of the ADSSFID-49 dataset. The decoration style annotations encompass seven categories: “Contemporary style”, “Chinese style”, “Nordic style”, “Japanese style”, “European style”, “Industrial style”, and “American style”. The spatial function annotations also consist of seven categories: “Children’s room”, “Study room”, “Bedroom”, “Bathroom”, “Living room”, “Dining room”, and “Kitchen”. The distribution of the different categories of images is shown in Table 1.
Table 1 shows that, when the ADSSFID-49 dataset is grouped by decoration style, the “Contemporary style” has the highest number of images (5153 images), while the “Japanese style” has the fewest (2108 images). When grouped by spatial function, the “Living room” category has the highest number of images (5161 images), while the “Kitchen” category has the fewest (1490 images). In total, there are 22,403 images in the dataset. Figure 6 shows some training data samples.
Evaluation metrics
The evaluation of interior design involves subjective and objective assessments. Conventional objective evaluation methods typically use computerized techniques to assess image clarity and compositional coherence. However, our evaluation focuses not solely on image clarity or compositional coherence but on the aesthetic appeal of the generated interior designs, the consistency of decoration styles, and the rationality of spatial functions, and these aspects require subjective evaluation by professional designers. Therefore, we did not employ conventional objective evaluation methods50,51.
Assessing generative architectural design images poses a significant challenge. Traditional automated image evaluation methods fail to evaluate the design content effectively50,51. Consequently, this study invited experienced industry designers to collaboratively discuss and formulate a series of evaluation metrics tailored for professional interior design. These metrics encompass eight categories: “aesthetically pleasing,” “decoration style,” “spatial function,” “design details,” “object integrity,” “object placement,” “realistic,” and “usability.”
The content and significance of these evaluation metrics are as follows. We identify “aesthetically pleasing” and “usability” as the pivotal indicators. Specifically, “aesthetically pleasing” signifies that the generated design possesses aesthetic appeal, a crucial aspect of interior design. The “usability” metric indicates that, upon comprehensive observation of the generated image, no apparent errors are found, so the image can be used. For the other indicators, “decoration style” refers to the consistency between the generated interior design’s decoration style and the provided cues. “Spatial function” pertains to the appropriateness of the generated space size and its alignment with the described spatial function. “Design details” denotes the richness and complexity of design elements in the generated image. “Object integrity” ensures the absence of defects in the generated objects. “Object placement” evaluates the rationality of the generated furniture positioning. Finally, “realistic” indicates that the generated image closely resembles a photograph taken by a camera. Together, these evaluation metrics enable a comprehensive assessment of design quality and show the practical value of the generated interior design.
Visual assessment
In this research, we visually compared our diffusion model with other popular diffusion models. We selected several mainstream diffusion models for comparison, including Disco Diffusion52, Dall\(\cdot \)E 224, Midjourney25, and Stable Diffusion26. These are the most widely used and influential diffusion models, with active user counts exceeding one million24,25,26. We generated images of five Chinese-style living room designs using these models and performed a visual comparison. The generated images are shown in Fig. 7. By comparing these images, we can evaluate the differences in the effectiveness of different models in generating Chinese-style living rooms. This comparison will help us understand the strengths and areas for improvement of our diffusion model in generating interior designs.
Several conclusions about the performance of the different methods can be drawn from Fig. 7. Disco Diffusion52 failed to generate interior designs. It struggled to comprehend the relationships between design elements and their connection to the space, and the generated images lacked design details and aesthetic appeal. Midjourney25 demonstrated a better understanding of the relationships between design elements, resulting in images with some aesthetic appeal. However, Midjourney exhibited a bias in understanding decoration styles, leaning more towards ancient rather than modern styles. Additionally, the overall realism of its images was insufficient. Dall\(\cdot \)E 224 produced highly realistic images. However, it lacked an understanding of spatial function, object integrity, and object placement, and these shortcomings reduced the overall quality of the generated images. Stable Diffusion26 generated images with accurate spatial function and object integrity. However, it struggled with understanding decoration styles, leading to incorrect positioning of elements and a lack of aesthetic appeal. In summary, none of these methods fully satisfied the requirements for interior design in terms of aesthetic appeal, decoration style, spatial function, design details, object integrity, object placement, realism, and usability. There is still room for improvement in applying these models to interior design.
Compared to other methods, the diffusion model trained in this research can simultaneously meet common design requirements. Table 2 presents the advantages and disadvantages of all the methods compared. Table 2 shows that the proposed method outperforms all the tested methods, with Midjourney25 ranking second, Stable Diffusion26 ranking third, and Dall\(\cdot \)E 224 ranking fourth. Disco Diffusion52 is unsuitable for generating interior designs.
Quantitative evaluation
We generated 1,960 interior design images using Dall\(\cdot \)E 224, Stable Diffusion26, Midjourney25, and the method proposed in this research (i.e., AIDDM). These images spanned 49 different categories, formed by combining seven decoration styles with seven spatial functions, and each model generated 10 images per category. To evaluate the quality of these images, we enlisted seven professional designers. The evaluation criteria included “aesthetically pleasing”, “decoration style”, “spatial function”, “design details”, “object integrity”, “object placement”, “realistic”, and “usability”. The experts judged whether each generated image met each criterion, awarding one point for compliance and zero points otherwise. Finally, we calculated the average score for each criterion by dividing the total score by the total number of images and converting it into a percentage. This allowed us to obtain quantitative scores for each model. The scores for the different diffusion models are illustrated in Fig. 8.
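The counting and scoring procedure above can be sketched as follows. This is an illustration under stated assumptions, not the evaluation script used in the study: the function `criterion_percentages` and the representation of each judgment as a dictionary of 0/1 values are hypothetical, and how the seven designers' judgments were pooled is not specified in the text.

```python
from itertools import product

STYLES = ["Contemporary style", "Chinese style", "Nordic style", "Japanese style",
          "European style", "Industrial style", "American style"]
FUNCTIONS = ["Children's room", "Study room", "Bedroom", "Bathroom",
             "Living room", "Dining room", "Kitchen"]
CRITERIA = ["aesthetically pleasing", "decoration style", "spatial function",
            "design details", "object integrity", "object placement",
            "realistic", "usability"]

NUM_MODELS = 4            # Dall-E 2, Stable Diffusion, Midjourney, proposed method
IMAGES_PER_CATEGORY = 10  # per model and category

# 7 styles x 7 functions = 49 categories.
categories = list(product(STYLES, FUNCTIONS))
total_images = len(categories) * IMAGES_PER_CATEGORY * NUM_MODELS


def criterion_percentages(judgments):
    """Convert binary judgments into a percentage score per criterion.

    judgments: list of dicts mapping each criterion to 1 (met) or 0 (not met),
    one dict per evaluated image of a single model.
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    totals = {criterion: 0 for criterion in CRITERIA}
    for judgment in judgments:
        for criterion in CRITERIA:
            totals[criterion] += judgment[criterion]
    count = len(judgments)
    return {c: round(100.0 * totals[c] / count, 2) for c in CRITERIA}
```

With this layout, `total_images` reproduces the 1,960 images reported above, and calling `criterion_percentages` on one model's judgments yields the kind of per-criterion percentages plotted in Fig. 8.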
Fig. 8 shows significant differences among the four models in generating interior designs. Our method outperforms Midjourney25, Dall\(\cdot \)E 224, and Stable Diffusion26 in all the evaluation criteria. Compared to the second-ranked model, our model shows significant advantages in the “Aesthetically pleasing”, “Spatial function”, “Object placement”, “Realistic”, and “Usability” criteria, exceeding it by 8.13, 11.88, 31.37, 6.13, and 8.25 percentage points, respectively. In particular, our model achieves high scores in the “Aesthetically pleasing”, “Decoration style”, and “Spatial function” criteria, demonstrating its ability to generate interior designs that are aesthetically pleasing and align with specified decoration styles and spatial functions.
We consider our method and the Midjourney model to be usable in generating interior designs. Midjourney25 achieved a usability score of 70.63%, while our model achieved 78.88%. Our method outperforms Midjourney25 regarding aesthetic appeal, appropriate spatial function, reasonable object placement, realism, and usability. However, Dall\(\cdot \)E 224 and Stable Diffusion26 are considered unusable for interior design generation, with usability scores of only 22.38% and 24.38%, respectively.
Showcase of generated design details
Figure 9 showcases a Chinese-style living room generated by our diffusion model. From the image, it is evident that the entire space possesses aesthetic appeal, and the decoration style and spatial function meet the requirements. This shows that our model is capable of generating designs with aesthetic appeal, specified decoration styles, and spatial function. Upon careful examination of the generated design details, we can observe that the furniture is placed in appropriate positions, with suitable dimensions, and the objects have no noticeable flaws. The image also includes numerous decoration items consistent with the design style, such as landscape paintings on the wall, tea sets, and vases on the coffee table, highlighting the model’s capability to generate detailed designs.
The images generated by the model exhibit a sense of realism, with well-handled lighting and shadow relationships. The light shining through the curtains into the room creates a soft and warm ambiance, while the recessed lights leave clear projections on the wall. However, there is still room for improvement in the model. For instance, lighting fixtures may be generated only partially accurately, resulting in minor excess lines. Additionally, the projections of wall-mounted lights are irregular, as some areas exhibit light and shadow even though no lights are arranged there. Despite these shortcomings, the interior designs generated using our diffusion model are usable overall and can enhance the efficiency of designers in generating design proposals and making design decisions.