ReCode: Improving LLM-based Code Repair with Fine-Grained Retrieval-Augmented Generation

👤 Authors: Yicong Zhao, Shisong Chen, Jiacheng Zhang, Zhixu Li

💬 Note: Accepted by CIKM 2025

Paper Overview

This research is motivated by the limitations of current large language models (LLMs) in code repair. While LLMs have shown significant promise in code generation and automated program repair, existing approaches often incur high training costs or computationally expensive inference. Existing retrieval strategies, which rely on holistic code-text embeddings, fail to capture the structural nuances of code, which degrades both retrieval quality and repair outcomes. These shortcomings call for a more refined approach that improves the accuracy and efficiency of code repair.

The proposed solution, ReCode, introduces a fine-grained retrieval-augmented in-context learning framework specifically designed to improve code repair. ReCode incorporates two main innovations: an algorithm-aware retrieval strategy that refines the search space using preliminary algorithm type predictions, and a modular dual-encoder architecture that processes code and textual inputs separately for better semantic matching. To evaluate the effectiveness of ReCode, the researchers developed RACodeBench, a benchmark derived from real-world user-submitted buggy code, which provides a more realistic evaluation environment compared to synthetic benchmarks. Experimental results indicate that ReCode significantly improves repair accuracy while reducing inference costs, demonstrating its practical applicability in real-world code repair scenarios.

📖 Core Content

1. What problem does the paper address?

The core problem addressed in this paper is the inefficiency and high computational cost associated with existing large language model (LLM)-based approaches for automated code repair. Current methods either require extensive training with high costs or involve computationally expensive inference processes. These approaches often fail to adapt to out-of-distribution defects and novel repair patterns, limiting their practical deployment. The motivation behind this research is to develop a more scalable and efficient solution that can accurately repair code without the need for constant retraining or high inference latency. This problem is significant as it directly impacts the usability and effectiveness of automated code repair systems in real-world software development scenarios.

2. What solution is proposed?

The paper proposes ReCode, a novel fine-grained retrieval-augmented in-context learning framework designed to enhance the accuracy and efficiency of code repair. The key innovation of ReCode lies in its algorithm-aware retrieval strategy and modular dual-encoder architecture. Unlike conventional retrieval strategies that use holistic code-text embeddings, ReCode narrows the search space by predicting algorithm types and processes code and textual inputs separately for fine-grained semantic matching. This approach significantly improves the retrieval quality and supports more accurate code repair generation. Additionally, the introduction of RACodeBench, a benchmark based on real-world buggy code, further distinguishes this work by providing a realistic evaluation platform.

3. Core method, steps, and strategy

ReCode employs a dual-encoder architecture that separately processes code and textual inputs to enable fine-grained semantic matching. The framework begins by using a large language model to predict the algorithm type of the user's query, which helps narrow down the search space within a pre-constructed algorithm-specific knowledge corpus. This algorithm-aware retrieval strategy enhances the contextual relevance of retrieved exemplars. The dual-encoder architecture allows for modular representations, capturing domain-specific semantics effectively. This method leverages the intrinsic capacities of LLMs in code comprehension and reasoning, integrating algorithmic categorization with semantic modeling to support adaptive code repair generation.
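The two-stage retrieval described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy `CORPUS`, the bag-of-tokens `embed` function (a stand-in for a learned encoder), and the `retrieve` helper are all assumptions introduced here, and the algorithm type is passed in directly rather than predicted by an LLM.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus: every exemplar carries an algorithm-type tag.
CORPUS = [
    {"algo": "binary_search",
     "buggy": "while lo < hi: mid = (lo + hi) // 2",
     "fixed": "while lo <= hi: mid = (lo + hi) // 2"},
    {"algo": "binary_search", "buggy": "hi = mid", "fixed": "hi = mid - 1"},
    {"algo": "dp",
     "buggy": "dp[i] = dp[i-1] + dp[i-2]",
     "fixed": "dp[i] = (dp[i-1] + dp[i-2]) % MOD"},
]


def embed(text):
    """Stand-in encoder: whitespace token counts (a real system would use a learned model)."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query_code, predicted_algo, k=1):
    """Stage 1: filter by algorithm type. Stage 2: rank the reduced pool by similarity."""
    pool = [ex for ex in CORPUS if ex["algo"] == predicted_algo]
    q = embed(query_code)
    ranked = sorted(pool, key=lambda ex: cosine(q, embed(ex["buggy"])), reverse=True)
    return ranked[:k]


top = retrieve("while lo < hi: mid = lo + hi", "binary_search", k=1)
print(top[0]["fixed"])
```

Filtering before ranking means the similarity search only ever sees exemplars from the predicted category, which is the sense in which the search space is narrowed.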

4. Experimental design

The experiments are designed to evaluate the performance of ReCode using RACodeBench, a benchmark constructed from real-world user-submitted buggy code, and several competitive programming datasets. The metrics used include repair accuracy and inference cost. ReCode is compared against existing methods, demonstrating superior repair accuracy with significantly reduced inference costs. The results highlight ReCode's practical value in real-world code repair scenarios, as it achieves higher accuracy while maintaining efficiency. Specific numbers and comparison results are not provided in the abstract, but the paper claims that ReCode outperforms conventional RAG methods in terms of both accuracy and computational efficiency.
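The summary names repair accuracy and inference cost as the metrics but does not give their exact definitions. As a rough sketch, assume accuracy is the fraction of tasks whose patch passes all tests and cost is approximated by the number of LLM calls per task; the `RepairAttempt` record and both helper functions are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class RepairAttempt:
    task_id: str
    passed_all_tests: bool
    llm_calls: int  # assumed cost proxy; the paper may measure cost differently


def repair_accuracy(attempts):
    """Fraction of tasks whose generated patch passed every test."""
    if not attempts:
        raise ValueError("no attempts to score")
    return sum(a.passed_all_tests for a in attempts) / len(attempts)


def mean_llm_calls(attempts):
    """Average number of LLM calls spent per task."""
    if not attempts:
        raise ValueError("no attempts to score")
    return sum(a.llm_calls for a in attempts) / len(attempts)


# Made-up records, used only to exercise the functions.
attempts = [
    RepairAttempt("t1", True, 1),
    RepairAttempt("t2", True, 1),
    RepairAttempt("t3", False, 2),
    RepairAttempt("t4", True, 4),
]
print(repair_accuracy(attempts), mean_llm_calls(attempts))
```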

5. Conclusions

The main findings of the paper are that ReCode significantly improves the accuracy and efficiency of automated code repair compared to existing methods. The introduction of an algorithm-aware retrieval strategy and a modular dual-encoder architecture enables more accurate and adaptive code repair generation. The use of RACodeBench for evaluation ensures that the results are grounded in realistic software development scenarios. However, the paper acknowledges limitations in terms of adaptability to entirely novel bug scenarios and suggests future research directions, including further refinement of retrieval strategies and exploration of additional benchmarks to enhance the robustness and generalizability of the proposed framework.

🤔 Questions of Interest to the User

How does ReCode's algorithm-aware retrieval strategy contribute to the localization of bugs in code repair tasks? The user's interest in bug localization aligns with the paper's focus on improving retrieval strategies. Understanding how ReCode narrows the search space using algorithm type predictions can provide insights into its effectiveness in identifying the specific parts of the code that require repair.
In what ways does ReCode's dual-encoder architecture facilitate the generation of accurate patches for different types of bugs, such as semantic, syntax, and vulnerability issues? The user is interested in how LLMs handle various bug types. The dual-encoder architecture's role in fine-grained semantic matching could be crucial for generating appropriate patches for different bug categories, making this a pertinent question.
How does ReCode evaluate the correctness of generated patches, and what role does RACodeBench play in this evaluation process? Patch correctness is a key area of interest for the user. The paper's introduction of RACodeBench as a benchmark for realistic evaluation suggests it may have specific methodologies for assessing patch validity, which would be valuable to explore.
What are the potential interactions between ReCode's framework and static or dynamic analysis tools to enhance the reliability of code repair? The user is interested in the integration of LLM-based repair systems with static and dynamic analysis. Investigating how ReCode could potentially interact with these tools could reveal opportunities for improving repair reliability.
How does ReCode handle the evaluation of repair accuracy and inference cost, and what implications do these metrics have for real-world application of LLM-based code repair? Understanding the balance between repair accuracy and inference cost is crucial for practical applications. This question probes how ReCode's performance metrics translate to real-world scenarios, aligning with the user's interest in the practical deployment of LLMs for code repair.

💡 Answers by Question

How does ReCode's algorithm-aware retrieval strategy contribute to the localization of bugs in code repair tasks?

Confidence: 0.90

In what ways does ReCode's dual-encoder architecture facilitate the generation of accurate patches for different types of bugs, such as semantic, syntax, and vulnerability issues?

ReCode's dual-encoder architecture plays a pivotal role in enhancing the accuracy of patches generated for various types of bugs, including semantic, syntax, and vulnerability issues. This architecture is designed to separately process code and textual inputs, which enables fine-grained semantic matching between the input and retrieved contexts. By doing so, ReCode effectively captures the domain-specific semantics of code, which is crucial for addressing the complex semantics and rigid syntax inherent in programming languages. The paper highlights that conventional retrieval strategies often fail to capture these structural intricacies, leading to suboptimal retrieval quality. In contrast, ReCode's modular dual-encoder architecture overcomes these limitations by constructing modular representations for both source code and its accompanying textual descriptions, thereby enhancing the contextual relevance of retrieved exemplars.
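To make the idea of scoring code and text separately concrete, here is a small sketch. It assumes Jaccard overlap on whitespace tokens as a stand-in for each encoder and a weighted sum to fuse the two scores; the summary does not say how ReCode combines them, so `dual_score` and `code_weight` are hypothetical.

```python
def jaccard(a, b):
    """Token-set overlap, used here as a stand-in for a learned encoder's similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dual_score(query, exemplar, code_weight=0.5):
    """Score the code and text channels separately, then fuse them (assumed weighted sum)."""
    code_sim = jaccard(query["code"], exemplar["code"])
    text_sim = jaccard(query["text"], exemplar["text"])
    return code_weight * code_sim + (1 - code_weight) * text_sim


query = {"code": "for i in range(n): total += a[i]", "text": "sum of array prefix"}
same_code = {"code": "for i in range(n): total += a[i]", "text": "shortest path in graph"}
same_text = {"code": "while q: node = q.pop()", "text": "sum of array prefix"}

print(dual_score(query, same_code), dual_score(query, same_text))
```

Keeping the channels apart lets a match in one modality register even when the other disagrees, whereas a single holistic embedding would blend both signals into one vector before any comparison.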

Furthermore, ReCode introduces an algorithm-aware retrieval strategy that narrows the search space using preliminary algorithm type predictions. This approach allows the model to infer the algorithm type from the user's query and utilize it to retrieve the most relevant exemplars from a pre-constructed algorithm-specific knowledge corpus. By integrating algorithmic categorization with semantic modeling, ReCode significantly enhances the contextual relevance of retrieved in-context exemplars, supporting more accurate and adaptive code repair generation. This dual-encoder architecture, combined with algorithm-aware retrieval, facilitates the generation of accurate patches for different bug categories by ensuring that the retrieval process is both semantically and contextually aligned with the specific bug type being addressed.

The paper also emphasizes the practical value of ReCode in real-world code repair scenarios, as demonstrated by its superior repair accuracy and efficiency in experiments conducted on RACodeBench and competitive programming datasets. This highlights the effectiveness of ReCode's dual-encoder architecture in not only improving repair performance but also reducing inference costs, making it a scalable and adaptable solution for automated code repair across various bug types.

Confidence: 0.90

How does ReCode evaluate the correctness of generated patches, and what role does RACodeBench play in this evaluation process?

ReCode evaluates the correctness of generated patches through a rigorous and realistic benchmark called RACodeBench, which plays a crucial role in this evaluation process. RACodeBench is constructed from real-world user-submitted buggy code, providing a systematic collection of authentic buggy–fixed code pairs that are manually curated and annotated. This benchmark addresses the limitations of synthetic benchmarks, which often fail to capture the complexity and variability of real-world coding scenarios. By using RACodeBench, ReCode ensures that the evaluation of patch correctness is grounded in genuine software development contexts, thereby enabling precise performance assessment. The paper emphasizes that RACodeBench reflects 'authentic software development scenarios' and supports 'realistic evaluation,' which is critical for assessing the practical applicability of code repair methods in real-world settings.

The significance of RACodeBench in ReCode's evaluation process is underscored by its ability to provide a 'high-quality benchmark' that facilitates 'rigorous and realistic evaluation.' This is particularly important because previous methods often relied on synthetic datasets that did not adequately represent the challenges faced in actual coding environments. By leveraging RACodeBench, ReCode can demonstrate superior repair accuracy and efficiency, as evidenced by the extensive experiments conducted on both RACodeBench and competitive programming datasets. These experiments highlight ReCode's ability to achieve higher repair accuracy with significantly reduced inference cost, showcasing its practical value for real-world code repair scenarios. Thus, RACodeBench is integral to ReCode's evaluation strategy, ensuring that the generated patches are not only correct but also applicable in realistic coding situations.
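The summary does not spell out how a patch is judged correct on RACodeBench. A common convention for buggy–fixed pairs is to run the candidate against input/output test cases, and the sketch below assumes that convention. The `passes_all` helper, the two toy functions, and the test cases are illustrative only and are not taken from the benchmark.

```python
def passes_all(candidate, test_cases):
    """Return True only if the candidate reproduces every expected output."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False  # a crash counts as a failed repair
    return True


def buggy_max(xs):
    best = 0  # bug: wrong answer when every element is negative
    for x in xs:
        if x > best:
            best = x
    return best


def fixed_max(xs):
    best = xs[0]  # fix: start from the first element instead of 0
    for x in xs[1:]:
        if x > best:
            best = x
    return best


TESTS = [(([1, 5, 3],), 5), (([-4, -2, -9],), -2)]
print(passes_all(buggy_max, TESTS), passes_all(fixed_max, TESTS))
```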

Confidence: 0.90

What are the potential interactions between ReCode's framework and static or dynamic analysis tools to enhance the reliability of code repair?

The integration of ReCode's framework with static or dynamic analysis tools presents a promising avenue for enhancing the reliability of code repair. ReCode, as described in the paper, is a "fine-grained retrieval-augmented generation framework" that leverages algorithm-aware retrieval strategies and modular dual-encoder architectures to improve code repair accuracy. This approach inherently focuses on capturing the structural intricacies of code, which aligns well with the objectives of static and dynamic analysis tools that aim to analyze code structure and behavior.

Static analysis tools, which examine code without executing it, could benefit from ReCode's "algorithm-aware retrieval strategy". By narrowing the search space using preliminary algorithm type predictions, ReCode enhances the contextual relevance of retrieved exemplars. This strategy could be integrated with static analysis to preemptively identify potential bug patterns based on algorithm types, thereby improving the precision of bug detection before code execution.
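As one hypothetical form such an integration could take, a static pre-check could screen candidate patches before any test is run. The sketch below uses Python's standard `ast` module and assumes the patches are Python source; it is not part of ReCode, and `syntax_ok` and `defined_functions` are names invented for this example.

```python
import ast


def syntax_ok(patch_source):
    """Cheap static gate: does the candidate patch even parse?"""
    try:
        ast.parse(patch_source)
        return True
    except SyntaxError:
        return False


def defined_functions(patch_source):
    """Names of all functions defined in the patch, found without executing it."""
    tree = ast.parse(patch_source)
    return {node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)}


good = "def f(x):\n    return x + 1\n"
bad = "def f(x) return x"
print(syntax_ok(good), syntax_ok(bad), defined_functions(good))
```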

Dynamic analysis tools, on the other hand, could complement ReCode's framework by providing execution feedback that informs the retrieval process. The paper mentions iterative self-repair methods that refine initial predictions through "multi-turn prompting and execution feedback." Dynamic analysis could enhance this process by offering real-time data on code execution, which ReCode could use to adjust its retrieval and repair strategies dynamically, thereby increasing repair accuracy and adaptability.
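A loop driven by execution feedback could look roughly like the sketch below. The `generate` and `run_tests` callables are hypothetical stand-ins for an LLM call and a test harness, and the loop illustrates the general multi-turn pattern rather than anything ReCode itself is reported to do.

```python
def repair_with_feedback(generate, run_tests, buggy_code, max_rounds=3):
    """Retry generation, feeding failing-test messages back in, until tests pass."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        patch = generate(buggy_code, feedback)
        failures = run_tests(patch)  # list of failure messages; empty means success
        if not failures:
            return patch, round_no
        feedback = "; ".join(failures)
    return None, max_rounds


# Toy stand-ins: the "model" only succeeds once it has seen feedback.
def fake_generate(code, feedback):
    return "good" if feedback else "bad"


def fake_run_tests(patch):
    return [] if patch == "good" else ["test_1 failed"]


print(repair_with_feedback(fake_generate, fake_run_tests, "buggy source"))
```

Each extra round costs another generation call, which is why multi-turn schemes trade inference cost for accuracy.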

Overall, integrating ReCode with static and dynamic analysis tools could yield a more robust system that identifies and repairs bugs more accurately and adapts better to novel bug scenarios. Such a combination might further reduce inference costs and improve repair efficiency beyond what ReCode already reports on RACodeBench, a benchmark consisting of real-world buggy code pairs. It would draw on the strengths of both static and dynamic analysis to address the challenges of automated code repair.

Confidence: 0.90

How does ReCode handle the evaluation of repair accuracy and inference cost, and what implications do these metrics have for real-world application of LLM-based code repair?

ReCode targets both repair accuracy and inference cost through its fine-grained retrieval-augmented generation framework. The paper highlights that ReCode introduces an "algorithm-aware retrieval strategy" and a "modular dual-encoder architecture," which together enable more precise semantic matching between input and retrieved contexts. ReCode achieves higher repair accuracy by narrowing the search space through preliminary algorithm type predictions, which improves the contextual relevance of the retrieved exemplars. This matters because the retrieved examples are then not only topically relevant but also matched to the specific algorithmic context of the buggy code, a common challenge in code repair tasks.

Moreover, ReCode's design significantly reduces inference costs. The paper notes that conventional methods often suffer from "high inference-time latency due to multi-turn reasoning or sampling procedures." In contrast, ReCode's retrieval-augmented generation framework requires no parameter updates or additional training, which are typically resource-intensive, and the paper reports a lower inference cost than such methods. By integrating algorithmic categorization with semantic modeling, ReCode supports more efficient code repair generation, making it a practical solution for real-world applications where computational resources and time are often limited.
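One way to see why in-context repair needs no training is that the retrieved exemplars simply become part of the prompt. The sketch below assembles such a prompt for a single model call; the template wording and the `build_prompt` helper are assumptions made for illustration, since the summary does not reproduce the prompt ReCode actually uses.

```python
def build_prompt(buggy_code, exemplars):
    """Assemble one in-context prompt from retrieved buggy/fixed pairs."""
    parts = ["Fix the bug in the final program. Examples of earlier repairs follow."]
    for i, ex in enumerate(exemplars, start=1):
        parts.append(f"### Example {i}\nBuggy:\n{ex['buggy']}\nFixed:\n{ex['fixed']}")
    # The task comes last and ends on "Fixed:" so the model continues with the patch.
    parts.append(f"### Task\nBuggy:\n{buggy_code}\nFixed:")
    return "\n\n".join(parts)


exemplars = [
    {"buggy": "hi = mid", "fixed": "hi = mid - 1"},
    {"buggy": "while lo < hi:", "fixed": "while lo <= hi:"},
]
prompt = build_prompt("lo = mid", exemplars)
print(prompt)
```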

The implications of these metrics for real-world applications are substantial. High repair accuracy ensures that the code repairs suggested by ReCode are reliable and can be trusted in production environments, which is critical for maintaining software quality and reducing the need for human intervention. Meanwhile, the reduced inference cost makes ReCode a scalable solution that can be deployed in various settings without the prohibitive computational expenses associated with other LLM-based approaches. This balance between accuracy and efficiency positions ReCode as a valuable tool for developers seeking to automate code repair processes effectively.

Confidence: 0.90

📝 Overall Summary

ReCode's algorithm-aware retrieval strategy plays a pivotal role in enhancing the localization of bugs during code repair tasks by narrowing the search space through preliminary algorithm type predictions. This approach is particularly significant as it addresses the limitations of conventional retrieval strategies that often rely on holistic code-text embeddings, which fail to capture the structural intricacies of code. By first determining the algorithm type of the erroneous code, ReCode effectively reduces the subset of the constructed code knowledge base, thereby focusing the retrieval process on more relevant exemplars. As the paper states, "ReCode first uses the LLM to analyze the user’s query to infer the algorithm type, which is then utilized to retrieve the most relevant exemplars from our pre-constructed algorithm-specific knowledge corpus." This targeted retrieval not only enhances the contextual relevance of the retrieved in-context exemplars but also supports more accurate and adaptive code repair generation.

Furthermore, the modular dual-encoder architecture employed by ReCode separately processes code and textual inputs, enabling fine-grained semantic matching between input and retrieved contexts. This modular approach allows the model to capture domain-specific semantics effectively, which is crucial for accurate bug localization. The paper highlights that "ReCode constructs modular representations for both source code and its accompanying textual descriptions," facilitating a more nuanced understanding of the code's structure and semantics. By integrating algorithmic categorization with semantic modeling, ReCode significantly improves the precision of bug localization, leading to higher repair accuracy and reduced inference costs, as demonstrated in the experimental results on RACodeBench and competitive programming datasets. This innovative strategy underscores ReCode's practical value in real-world code repair scenarios, offering a scalable and efficient solution to the challenges posed by complex programming tasks.
