Intelligent Joint Optimization of Detection and Guidance Based on Convex Optimization Pre-training
DOI:
https://doi.org/10.56028/aetr.15.1.678.2025
Keywords:
deep reinforcement learning; TD3 algorithm; intelligent detection-guidance; convex optimization; counter-interference.
Abstract
To address the difficulty that traditional control methods have in jointly scheduling detection and trajectory resources during the terminal guidance phase of hypersonic glide vehicles (HGVs), this paper proposes an intelligent joint optimization technique based on convex optimization pre-training. Building on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, we introduce a reward function that adaptively distributes weight between detection and guidance. This design ensures that the vehicle meets guidance requirements during the terminal phase while allocating sufficient trajectory resources for radar detection. In interference suppression scenarios, the agent can avoid interference zones through trajectory resource scheduling and implement frequency-hopping countermeasures through detection resource allocation. By jointly optimizing detection and trajectory resources, the method enhances confrontation performance in terminal guidance. To improve training efficiency, trajectory data generated by convex optimization are used to pre-train the agent, which significantly reduces convergence time. Simulation results show that the method effectively improves detection and counter-interference performance in the terminal guidance phase, providing a new technical means for intelligent guidance in complex battlefield environments.
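
The pre-training idea can be illustrated with a minimal sketch. It assumes that the convex-optimization trajectories are available as (state, action) pairs and that pre-training takes the form of supervised regression (behavior cloning) of the actor onto those pairs before TD3 fine-tuning. The abstract does not state the exact procedure, so the linear actor, the loss, and the name `pretrain_actor` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def pretrain_actor(states, actions, epochs=200, lr=0.05, seed=0):
    """Fit a linear actor a = s @ W + b to expert (state, action) pairs.

    Hypothetical helper: the expert pairs stand in for trajectory data
    produced by a convex-optimization solver. The loss is the per-sample
    squared error (summed over action dimensions), averaged over samples.
    """
    rng = np.random.default_rng(seed)
    n, s_dim = states.shape
    a_dim = actions.shape[1]
    W = 0.01 * rng.standard_normal((s_dim, a_dim))  # small random init
    b = np.zeros(a_dim)
    losses = []
    for _ in range(epochs):
        err = states @ W + b - actions            # prediction error, shape (n, a_dim)
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        W -= lr * (2.0 / n) * (states.T @ err)    # gradient of the loss w.r.t. W
        b -= lr * 2.0 * err.mean(axis=0)          # gradient of the loss w.r.t. b
    return W, b, losses


if __name__ == "__main__":
    # Synthetic stand-in for solver output; real data would come from the
    # convex trajectory optimizer, which is not reproduced here.
    rng = np.random.default_rng(42)
    demo_states = rng.standard_normal((256, 4))
    demo_actions = demo_states @ rng.standard_normal((4, 2))
    W, b, losses = pretrain_actor(demo_states, demo_actions)
    print(f"loss before: {losses[0]:.4f}, after: {losses[-1]:.6f}")
```

In a full pipeline, the fitted weights would initialize the TD3 actor so that reinforcement learning starts from a policy that already approximates the convex-optimization trajectories. That warm start is one plausible mechanism behind the reduced convergence time reported above.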