RepoRepair: Leveraging Code Documentation for Repository-Level Automated Program Repair

👤 Authors: Zhongqiang Pan, Chuanyi Li, Wenkang Zhong, Yi Feng, Bin Luo, Vincent Ng

Paper at a Glance

Current automated program repair (APR) methods struggle to scale from fixing isolated functions to addressing issues that span entire code repositories. Existing approaches often lack the global understanding needed to identify and resolve complex, cross-file problems, and rely instead on shallow retrieval or costly iterative agent processes. This gap motivates a solution built for repository-level repair.

RepoRepair addresses this challenge with a documentation-enhanced approach: large language models (LLMs) generate hierarchical code documentation, which provides structured semantic abstractions that help LLMs understand the context and dependencies within a repository. A text-based LLM first creates documentation at the file and function levels, which guides fault localization. Once suspicious code is identified, a more powerful LLM attempts to repair it. RepoRepair achieves a 45.7% repair rate on SWE-bench Lite at $0.44 per fix, and a state-of-the-art 37.1% repair rate on SWE-bench Multimodal, indicating robust and cost-effective performance across problem domains.

📖 Core Content of the Paper

1. What problem does it mainly solve?

The core problem addressed by the paper is the challenge of scaling automated program repair (APR) from isolated functions to full repositories. Traditional APR methods are often limited by their inability to understand the global context and dependencies across files within a repository, which is crucial for effectively locating and repairing faults. This limitation results in poor performance when dealing with complex cross-file issues. The motivation behind addressing this problem is to enhance the efficiency and effectiveness of APR systems, which are critical for maintaining and improving large-scale software systems. The research gap lies in the lack of approaches that can leverage comprehensive repository-level context to improve fault localization and repair accuracy.

2. What solution is proposed?

The proposed solution is RepoRepair, a documentation-enhanced approach for repository-level automated program repair. The key innovation of RepoRepair is its use of large language models (LLMs) to generate hierarchical code documentation, which provides structured semantic abstractions of the code repository. This documentation serves as auxiliary knowledge that guides the fault localization process. RepoRepair differs from existing approaches by integrating this documentation with LLMs to comprehend repository-level context and dependencies, enabling more accurate identification and repair of faulty code snippets. This approach addresses the limitations of shallow retrieval methods and costly agent iterations by providing a more holistic understanding of the codebase.
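
To make "hierarchical code documentation" concrete, here is a minimal sketch of one possible data structure: a file-level record that contains function-level records. The class names, field names, and `build_file_doc` helper are hypothetical and not taken from the paper. Existing docstrings stand in for the LLM-generated summaries that RepoRepair would produce.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class FunctionDoc:
    """Function-level documentation entry (hypothetical shape)."""
    name: str
    lineno: int
    summary: str


@dataclass
class FileDoc:
    """File-level documentation entry that owns its function entries."""
    path: str
    summary: str
    functions: list[FunctionDoc] = field(default_factory=list)


PLACEHOLDER = "(no summary yet)"


def first_line(docstring):
    """Return the first line of a docstring, or a placeholder if it is missing."""
    return docstring.splitlines()[0] if docstring else PLACEHOLDER


def build_file_doc(path: str, source: str) -> FileDoc:
    """Build a two-level documentation skeleton from Python source.

    Assumption: docstrings are used as stand-ins for LLM-written summaries.
    """
    tree = ast.parse(source)
    functions = [
        FunctionDoc(node.name, node.lineno, first_line(ast.get_docstring(node)))
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    functions.sort(key=lambda f: f.lineno)  # keep source order
    return FileDoc(path, first_line(ast.get_docstring(tree)), functions)
```

A structure like this lets localization read short file summaries first and open function summaries only for the files that look relevant.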

3. Core method / steps / strategy

RepoRepair employs a two-step methodology. First, it uses a text-based LLM, such as DeepSeek-V3, to generate file- and function-level code documentation for the repository; this documentation acts as auxiliary knowledge that guides fault localization. Second, a more powerful LLM, such as Claude-4, attempts to repair the suspicious code snippets, using the fault localization results and the issue description. The structured semantic abstractions in the documentation improve the LLM's grasp of repository-level context and dependencies, which supports both localization and repair.
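
The flow described above can be sketched as three functions chained together. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable interface, the prompt wording, and all function names are hypothetical, and which model performs localization is not specified in this summary, so it is passed in separately.

```python
from typing import Callable

# Hypothetical interface: a model is any function from prompt text to reply text.
LLM = Callable[[str], str]


def generate_docs(files: dict[str, str], doc_llm: LLM) -> dict[str, str]:
    """Step 1: ask a text-based LLM for documentation of each file."""
    return {
        path: doc_llm(f"Summarize this file and its functions:\n{source}")
        for path, source in files.items()
    }


def localize(issue: str, docs: dict[str, str], locate_llm: LLM) -> list[str]:
    """Use the documentation plus the issue text to pick suspicious files."""
    listing = "\n".join(f"{path}: {doc}" for path, doc in docs.items())
    answer = locate_llm(
        f"Issue:\n{issue}\n\nFile documentation:\n{listing}\n\n"
        "List the suspicious file paths, one per line."
    )
    # Keep only replies that name a file we actually know about.
    return [line.strip() for line in answer.splitlines() if line.strip() in docs]


def repair(issue: str, files: dict[str, str], suspects: list[str],
           repair_llm: LLM) -> dict[str, str]:
    """Step 2: ask a stronger LLM to rewrite each suspicious file."""
    return {
        path: repair_llm(f"Issue:\n{issue}\n\nFix this code:\n{files[path]}")
        for path in suspects
    }


def run_pipeline(issue: str, files: dict[str, str], doc_llm: LLM,
                 locate_llm: LLM, repair_llm: LLM) -> dict[str, str]:
    """Chain documentation, localization, and repair."""
    docs = generate_docs(files, doc_llm)
    suspects = localize(issue, docs, locate_llm)
    return repair(issue, files, suspects, repair_llm)
```

Passing the models in as plain callables keeps the sketch runnable with stubs and makes it easy to pair a cheaper documentation model with a stronger repair model.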

4. Experimental design

The experiments are designed to evaluate the effectiveness and cost-efficiency of RepoRepair. The paper reports results on two benchmark datasets: SWE-bench Lite and SWE-bench Multimodal. RepoRepair achieves a 45.7% repair rate on SWE-bench Lite at a low cost of $0.44 per fix, showcasing its cost-effective performance. On SWE-bench Multimodal, it delivers state-of-the-art performance with a 37.1% repair rate, albeit at a higher cost of $0.56 per fix. These results demonstrate the robustness of RepoRepair across diverse problem domains, highlighting its ability to effectively repair complex cross-file issues compared to existing methods.

5. Conclusions

The main findings of the paper indicate that RepoRepair significantly improves the repair rate and cost-efficiency of repository-level automated program repair. By leveraging hierarchical code documentation, RepoRepair enhances the understanding of repository-level context, leading to more accurate fault localization and repair. However, the paper acknowledges limitations such as the higher cost associated with repairs on more complex datasets like SWE-bench Multimodal. Future directions include optimizing the cost-effectiveness of the approach and exploring the integration of additional contextual information to further improve repair accuracy and efficiency.

🤔 Questions of Interest to the User

How does RepoRepair utilize large language models for bug localization, and what role does hierarchical code documentation play in this process? The user's interest in how LLMs are used for bug localization aligns with the paper's focus on leveraging hierarchical code documentation to guide fault localization. Understanding this process will provide insights into the interaction between LLMs and documentation in identifying bugs across a repository.
In what ways does RepoRepair address different types of bugs, such as semantic, syntax, and vulnerability issues, and how does its approach compare to traditional methods? The user is interested in the repair of various bug types, and the paper discusses RepoRepair's ability to handle complex cross-file issues. This question aims to explore the breadth of bug types addressed by RepoRepair and its effectiveness compared to existing APR methods.
What mechanisms does RepoRepair employ to evaluate the correctness of generated patches, and how does it ensure reliability in the repair process? Evaluating patch correctness is a key interest for the user. This question seeks to understand the methodologies RepoRepair uses to validate patches and ensure they effectively resolve identified issues, which is crucial for reliable program repair.
How does RepoRepair integrate static and dynamic analysis techniques to enhance the reliability and accuracy of its automated program repair process? The user is interested in the interaction between LLMs and static/dynamic analysis for improving repair reliability. This question probes how RepoRepair might incorporate such techniques to bolster its repair accuracy and effectiveness.
What are the cost implications of using RepoRepair for automated program repair, and how does its cost-efficiency compare across different benchmark datasets? Understanding the cost-efficiency of RepoRepair is important for evaluating its practical application. This question addresses the economic aspect of using LLMs for APR, which is relevant for assessing the feasibility of large-scale adoption.

💡 Answers by Question

How does RepoRepair utilize large language models for bug localization, and what role does hierarchical code documentation play in this process?

RepoRepair first has a text-based LLM, such as DeepSeek-V3, generate hierarchical documentation at the file and function levels. This documentation is a structured semantic abstraction of the repository and serves as auxiliary knowledge during fault localization: together with the issue description, it helps the LLM understand repository-level context and dependencies and identify suspicious code snippets across files. A more powerful LLM, such as Claude-4, then attempts to repair those snippets.

Confidence: 0.90

In what ways does RepoRepair address different types of bugs, such as semantic, syntax, and vulnerability issues, and how does its approach compare to traditional methods?

RepoRepair is framed around repository-level, cross-file issues rather than around specific bug categories such as semantic, syntax, or vulnerability bugs, and the results summarized here are not broken down by bug type. Its approach is category-agnostic: where traditional automated program repair (APR) methods rely on shallow retrieval or costly iterative processes and struggle with cross-file issues, RepoRepair uses large language models (LLMs) to generate hierarchical code documentation that gives a structured semantic abstraction of the whole repository. This documentation serves as auxiliary knowledge that helps the LLMs understand repository-level context and dependencies, which is crucial for fault localization and repair.

Semantic bugs, which require understanding the code's logic and intent, are the category for which this design is most plausibly suited. File- and function-level documentation generated by an LLM such as DeepSeek-V3 describes what the code is meant to do, which gives localization and repair a reference point for spotting behavior that departs from intent. The reported repair rates of 45.7% on SWE-bench Lite and 37.1% on SWE-bench Multimodal are aggregate figures, so they show robustness across problem domains rather than performance on any single bug category.

Syntax and vulnerability issues go through the same pipeline. Fault localization, guided by the generated documentation and the issue description, pinpoints suspicious code snippets, and a powerful LLM such as Claude-4 then drafts a repair intended to match the code's intended functionality. No dedicated handling for syntax errors or security vulnerabilities is described in the material summarized here.

Overall, RepoRepair's documentation-enhanced approach offers a more holistic alternative to traditional APR methods for repository-level repair. Using LLMs for both documentation generation and repair improves fix accuracy on cross-file issues while keeping the cost per fix low ($0.44 on SWE-bench Lite).

Confidence: 0.90

What mechanisms does RepoRepair employ to evaluate the correctness of generated patches, and how does it ensure reliability in the repair process?

The material summarized here does not describe a dedicated patch-validation mechanism inside RepoRepair; what it describes are the steps that raise the likelihood of a correct patch. First, a text-based LLM such as DeepSeek-V3 generates hierarchical code documentation at the file and function levels. This documentation acts as auxiliary knowledge for fault localization by conveying repository-level context and dependencies, so the system can identify the parts of the code most relevant to the issue and the generated patch is more likely to address the actual fault.

Following fault localization, a more powerful LLM, such as Claude-4, attempts to repair the identified suspicious code snippets. LLMs can follow complex code structures and dependencies that methods relying on shallow retrieval often miss, which improves both localization accuracy and patch quality. Correctness is assessed externally, on the SWE-bench Lite and SWE-bench Multimodal benchmarks, where RepoRepair achieves repair rates of 45.7% and 37.1% respectively.

The cost is also low: the paper reports $0.44 per fix on SWE-bench Lite. The repair rates are the measure of reliability here: combining LLMs with structured semantic abstractions resolves a substantial share of repository-level issues, but not every generated patch is correct, so patches still need to be checked against tests before they are trusted.
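
SWE-bench-style benchmarks decide whether an issue counts as repaired by running the repository's tests: tests listed as FAIL_TO_PASS must pass after the patch, and tests listed as PASS_TO_PASS must keep passing. The sketch below shows that check and the resulting repair rate. The function names and the shape of the `results` dictionary are hypothetical, and this illustrates benchmark-side scoring, not a validation step confirmed to exist inside RepoRepair.

```python
def is_resolved(results: dict[str, bool],
                fail_to_pass: list[str],
                pass_to_pass: list[str]) -> bool:
    """Return True if every required test passed after applying the patch.

    `results` maps a test name to whether it passed; a test that is missing
    from `results` is treated as a failure.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(results.get(test, False) for test in required)


def repair_rate(outcomes: list[bool]) -> float:
    """Percentage of benchmark instances whose patch was judged resolved."""
    if not outcomes:
        return 0.0
    return 100.0 * sum(outcomes) / len(outcomes)
```

A patch that fixes the target test but breaks a previously passing one is counted as unresolved, which is why the repair rate is a stricter signal than "the patch applies cleanly".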

Confidence: 0.90

How does RepoRepair integrate static and dynamic analysis techniques to enhance the reliability and accuracy of its automated program repair process?

RepoRepair, as summarized here, is an LLM-driven pipeline and is not described as integrating conventional static or dynamic analysis tools. The closest analogue to static analysis is its documentation step: a text-based LLM such as DeepSeek-V3 reads the source code, without executing it, and generates hierarchical file- and function-level documentation. These structured semantic abstractions capture repository-level context and dependencies and act as auxiliary knowledge, enabling the system to locate faults more accurately in complex cross-file issues.

Once faults are localized, a powerful LLM such as Claude-4 attempts to repair the identified suspicious code snippets. This gives the system the broader repository context that methods relying solely on shallow retrieval or costly agent iterations often miss. The repair step generates patches rather than observing program behavior at run time, so it is not dynamic analysis in the usual sense; no test execution or runtime tracing inside the repair loop is described. On SWE-bench Lite and SWE-bench Multimodal, RepoRepair achieves repair rates of 45.7% and 37.1% respectively, which indicates that documentation-guided localization combined with LLM repair is effective and cost-efficient across diverse problem domains.

Confidence: 0.90

What are the cost implications of using RepoRepair for automated program repair, and how does its cost-efficiency compare across different benchmark datasets?

RepoRepair makes a case for cost-efficient automated program repair (APR). It achieves a 45.7% repair rate on SWE-bench Lite at a cost of $0.44 per fix, which is notable given that repository-level repairs are more complex than isolated function-level repairs. The approach relies on LLM-generated hierarchical code documentation, whose structured semantic abstractions improve the understanding of repository-level context and dependencies. This improves fault localization accuracy and avoids the costly agent iterations that other repository-level approaches depend on.

On SWE-bench Multimodal, RepoRepair delivers state-of-the-art performance with a 37.1% repair rate at a higher cost of $0.56 per fix, about 27% more than on SWE-bench Lite. The difference suggests that the complexity and nature of a dataset influence the cost of repair. The paper also underscores the role of a powerful LLM such as Claude-4 in repairing the identified suspicious code snippets. Delivering these repair rates at well under a dollar per fix on both benchmarks points to RepoRepair's potential for large-scale adoption, where economic considerations matter.
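
The cost gap between the two benchmarks can be worked out directly from the reported figures. The sketch below is simple arithmetic on the two quoted per-fix costs; the function names are hypothetical, and treating the figure as a flat cost per fix when budgeting a batch is an assumption rather than something the paper states.

```python
LITE_COST_PER_FIX = 0.44        # USD, reported for SWE-bench Lite
MULTIMODAL_COST_PER_FIX = 0.56  # USD, reported for SWE-bench Multimodal


def relative_increase(base: float, other: float) -> float:
    """Percentage by which `other` exceeds `base`."""
    return (other - base) / base * 100


def batch_cost(cost_per_fix: float, n_fixes: int) -> float:
    """Estimated spend for `n_fixes`, assuming a flat per-fix cost."""
    return round(cost_per_fix * n_fixes, 2)


print(round(relative_increase(LITE_COST_PER_FIX, MULTIMODAL_COST_PER_FIX), 1))  # 27.3
print(batch_cost(LITE_COST_PER_FIX, 100))        # 44.0
print(batch_cost(MULTIMODAL_COST_PER_FIX, 100))  # 56.0
```

Under that flat-cost assumption, a hundred fixes would cost about $44 at the SWE-bench Lite rate and about $56 at the SWE-bench Multimodal rate.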

Overall, the cost implications of using RepoRepair are favorable, especially when considering the scalability challenges associated with APR. By integrating LLMs for enhanced documentation and repair processes, RepoRepair offers a promising solution that balances performance with cost, making it a viable option for widespread use in software engineering practices.

Confidence: 0.90

📝 Overall Summary

RepoRepair employs large language models (LLMs) to enhance the process of bug localization by leveraging hierarchical code documentation. The approach begins with the generation of detailed documentation at both the function and file levels using a text-based LLM, such as DeepSeek-V3. This documentation serves as a structured semantic abstraction that aids in understanding the repository-level context and dependencies, which are crucial for effective fault localization. The paper highlights that "current methods, limited by context and reliant on shallow retrieval or costly agent iterations, falter on complex cross-file issues." By contrast, RepoRepair's documentation-enhanced approach provides a more comprehensive view, enabling LLMs to better navigate and identify faults across a repository.

The hierarchical code documentation plays a pivotal role by acting as auxiliary knowledge that guides the fault localization process. This structured information allows the LLMs to comprehend the intricate relationships and dependencies within the codebase, which are often missed by traditional methods. Once the suspicious code snippets are identified, a powerful LLM, such as Claude-4, attempts to repair them based on the fault localization results and the issue description. The paper reports that RepoRepair achieves a 45.7% repair rate on SWE-bench Lite and a 37.1% repair rate on SWE-bench Multimodal, demonstrating its effectiveness and cost-efficiency across diverse problem domains. This performance underscores the significance of integrating hierarchical documentation with LLMs, as it enhances the ability to locate and fix bugs at the repository level, overcoming the limitations of previous approaches.
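
The coarse-to-fine idea behind hierarchical documentation (pick files first, then functions inside them) can be sketched as follows. Note the swap: RepoRepair uses an LLM to judge relevance, whereas this sketch uses plain token overlap between the issue text and the documentation as a stand-in so that it runs without a model. All names and the sample data are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a crude stand-in for semantic matching."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def rank(issue: str, docs: dict[str, str], top_k: int) -> list[str]:
    """Rank documented items by token overlap with the issue, best first."""
    issue_tokens = tokens(issue)
    ordered = sorted(
        docs,
        key=lambda name: (-len(issue_tokens & tokens(docs[name])), name),
    )
    return ordered[:top_k]


def localize(issue: str,
             file_docs: dict[str, str],
             function_docs: dict[str, dict[str, str]],
             top_files: int = 1,
             top_functions: int = 1) -> list[tuple[str, str]]:
    """Narrow from file-level docs to function-level docs."""
    suspects = []
    for path in rank(issue, file_docs, top_files):
        for func in rank(issue, function_docs.get(path, {}), top_functions):
            suspects.append((path, func))
    return suspects


file_docs = {
    "auth.py": "Handles user login and password hashing.",
    "report.py": "Builds PDF sales reports.",
}
function_docs = {
    "auth.py": {
        "hash_password": "Hashes a password with a salt.",
        "login": "Checks credentials and starts a login session.",
    },
    "report.py": {"render": "Renders the report to PDF."},
}
issue = "Login session is not created when the password contains unicode"
print(localize(issue, file_docs, function_docs))  # [('auth.py', 'login')]
```

Reading short file summaries before function summaries keeps the amount of text examined per issue small, which is the practical benefit of organizing the documentation hierarchically.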
