Abstract: [Purposes] Existing deep learning-based traffic signal control methods commonly rely on fully observable traffic state information. To address this limitation, this study constructs an intelligent traffic signal control model suited to sparse observation environments. [Methods] First, the traffic state of floating vehicles serves as the core observational input. Each entrance lane at the intersection is spatially discretized into fixed-size grid cells, and a three-dimensional state tensor is constructed from a vehicle position matrix, a velocity matrix, and a lane congestion matrix. Next, a continuous reward mechanism based on vehicle congestion levels is introduced into the policy design. Combined with a fixed minimum green time and phase-switching rules, this reward guides the agent to select signal phases dynamically under different conditions, with the goal of minimizing delay. Finally, two experimental scenarios (single-intersection and multi-intersection) are established, and comparative tests are conducted against several algorithms under different traffic conditions. [Findings] The proposed method was compared with fixed-time control, Deep Q-Network (DQN), and Proximal Policy Optimization (PPO). It converged faster and was more stable under various traffic loads, particularly at low floating-vehicle penetration rates. It significantly reduced average delay time, average queue length, and average travel time, yielding a notable improvement in overall traffic efficiency. [Conclusions] The results validate the effectiveness and robustness of the proposed method under partially observable conditions. The model maintains strong control performance in low-penetration environments, providing a feasible technical pathway and theoretical foundation for the intelligent and coordinated development of urban traffic signal control systems when real-world data are limited.
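
The grid-based state representation described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_state`, the cell length, lane length, and speed limit are hypothetical, and because the abstract does not define the lane congestion matrix, it is assumed here to be one minus the mean normalized speed of the observed floating vehicles on each lane, repeated across that lane's cells.

```python
import numpy as np


def build_state(vehicles, n_lanes=4, lane_length=140.0,
                cell_length=7.0, speed_limit=13.9):
    """Build a (3, n_lanes, n_cells) state tensor from floating-vehicle data.

    vehicles: iterable of (lane_index, distance_to_stop_line_m, speed_m_per_s)
    for the observed floating vehicles only. All default values are
    illustrative assumptions, not parameters taken from the paper.
    """
    n_cells = int(round(lane_length / cell_length))
    position = np.zeros((n_lanes, n_cells))
    speed_sum = np.zeros((n_lanes, n_cells))
    count = np.zeros((n_lanes, n_cells))

    for lane, dist, speed in vehicles:
        # Ignore vehicles outside the observed stretch of the entrance lanes.
        if not (0 <= lane < n_lanes) or not (0.0 <= dist < lane_length):
            continue
        cell = min(int(dist // cell_length), n_cells - 1)
        position[lane, cell] = 1.0
        # Normalize speed to [0, 1] by the (assumed) speed limit.
        speed_sum[lane, cell] += min(max(speed, 0.0), speed_limit) / speed_limit
        count[lane, cell] += 1.0

    # Channel 2: mean normalized speed per occupied cell, 0 where empty.
    velocity = np.divide(speed_sum, count,
                         out=np.zeros_like(speed_sum), where=count > 0)

    # Channel 3 (assumed definition): per-lane congestion, 0 for free flow
    # or for lanes with no observed vehicle, 1 for fully stopped traffic.
    lane_count = count.sum(axis=1)
    lane_mean = np.divide(speed_sum.sum(axis=1), lane_count,
                          out=np.ones(n_lanes), where=lane_count > 0)
    congestion = np.repeat((1.0 - lane_mean)[:, None], n_cells, axis=1)

    return np.stack([position, velocity, congestion])
```

Under these assumptions, a lane with no observed floating vehicle is indistinguishable from an empty lane, which is exactly the ambiguity that low penetration rates introduce and that the learned policy has to cope with.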