Abstract: [Purposes] Existing prediction models overlook the spatiotemporal correlation of ambient air quality between adjacent ports. To overcome this limitation and improve the prediction of hourly PM10 concentrations at dry bulk port clusters, a deep learning ensemble model (GCN-TCN-CA) is established that uses a cross-attention mechanism to fuse, in parallel, a graph convolutional network (GCN) and a temporal convolutional network (TCN). [Methods] Using the PM10 concentrations and characteristic meteorological factors at different ports as inputs, the GCN extracts the spatial topological features among ports, while the TCN captures the long-term temporal dependencies between pollutant concentrations and meteorological factors. The cross-attention mechanism then dynamically fuses these spatial and temporal features to predict PM10 concentrations across the port cluster. [Findings] Taking 18 ports along the Yangtze River in Nanjing as a case study, a comparison of the prediction performance of six models shows that the GCN-TCN-CA model reduces the mean absolute error by 10.5% to 31.8% and the root mean square error by 8.87% to 28.28%, and improves the goodness of fit by 2.38% to 13.95%. Ablation experiments further reveal that the GCN makes the largest contribution to the overall predictive performance of the GCN-TCN-CA model. [Conclusions] Fully accounting for the spatiotemporal correlation of PM10 concentrations between adjacent ports significantly improves the predictive performance of deep learning ensemble models. When formulating PM10 pollution control measures, the transport and dispersion of pollutants among adjacent ports should be fully considered to achieve a synergistic improvement in ambient air quality across the port cluster.
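The parallel GCN/TCN structure with cross-attention fusion described above can be illustrated by a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation: the port adjacency matrix, the feature count, the hidden size, the two-level TCN with dilations 1 and 2 (without the residual connections, dropout, or weight normalization of a full TCN), and the choice of taking attention queries from the GCN branch and keys/values from the TCN branch are all hypothetical. Weights are random, so the output only demonstrates tensor shapes and data flow.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_branch(x, a_norm, w):
    """Spatial branch: mix features across ports at every time step.
    x: (N ports, T hours, F features) -> (N, T, d)."""
    return np.maximum(np.einsum("ij,jtf->itf", a_norm, x) @ w, 0.0)

def causal_conv(x, kernel, dilation):
    """Causal dilated 1-D convolution along time; output keeps length T.
    kernel: (k, c_in, c_out). Step t only sees inputs at steps <= t."""
    k, _, c_out = kernel.shape
    n, t, _ = x.shape
    pad = (k - 1) * dilation
    xp = np.pad(x, ((0, 0), (pad, 0), (0, 0)))  # left-pad only (causal)
    out = np.zeros((n, t, c_out))
    for j in range(k):
        out += xp[:, j * dilation : j * dilation + t, :] @ kernel[j]
    return np.maximum(out, 0.0)

def tcn_branch(x, kernels):
    """Temporal branch: stacked causal convolutions, dilation doubling per level."""
    h = x
    for level, kern in enumerate(kernels):
        h = causal_conv(h, kern, dilation=2 ** level)
    return h

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(h_s, h_t, wq, wk, wv):
    """Queries from spatial features, keys/values from temporal features (assumed)."""
    q, k, v = h_s @ wq, h_t @ wk, h_t @ wv                    # each (N, T, d)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (N, T, T)
    return softmax(scores, axis=-1) @ v                       # (N, T, d)

def init_params(n_feat, d, k=3, levels=2, seed=0):
    """Random weights for the hypothetical architecture (no training here)."""
    rng = np.random.default_rng(seed)
    kernels = [rng.normal(0, 0.3, (k, n_feat if i == 0 else d, d)) for i in range(levels)]
    return {
        "w_gcn": rng.normal(0, 0.3, (n_feat, d)),
        "tcn_kernels": kernels,
        "wq": rng.normal(0, 0.3, (d, d)),
        "wk": rng.normal(0, 0.3, (d, d)),
        "wv": rng.normal(0, 0.3, (d, d)),
        "w_out": rng.normal(0, 0.3, d),
    }

def predict(x, adj, params):
    """Run both branches in parallel, fuse with cross-attention, read out the last step."""
    a_norm = normalized_adjacency(adj)
    h_s = gcn_branch(x, a_norm, params["w_gcn"])
    h_t = tcn_branch(x, params["tcn_kernels"])
    fused = cross_attention(h_s, h_t, params["wq"], params["wk"], params["wv"])
    return fused[:, -1, :] @ params["w_out"]  # (N,) next-hour value per port

if __name__ == "__main__":
    # 18 ports as in the case study; the 24 h window and 5 input features are hypothetical.
    n_ports, hours, n_feat = 18, 24, 5
    rng = np.random.default_rng(1)
    x = rng.random((n_ports, hours, n_feat))
    adj = (rng.random((n_ports, n_ports)) < 0.2).astype(float)
    adj = np.maximum(adj, adj.T)  # undirected port graph
    np.fill_diagonal(adj, 0.0)
    y_hat = predict(x, adj, init_params(n_feat, d=16))
    print(y_hat.shape)  # (18,)
```

Keeping the temporal convolutions causal means a prediction never uses future hours, and running the two branches side by side before fusion is what makes the combination "parallel" rather than a stacked GCN-then-TCN pipeline.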
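The reported error reductions can be read as relative changes of each metric against a baseline model. The abstract does not give the exact formulas, so the following sketch assumes the standard definitions of mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination R² as the goodness-of-fit measure, and assumes each percentage is (proposed − baseline) / baseline × 100. The helper names and the synthetic data are hypothetical.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y, y_hat):
    """Coefficient of determination (assumed goodness-of-fit metric)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def relative_change(baseline, proposed):
    """Percentage change of a metric relative to a baseline model
    (negative = reduction, positive = increase)."""
    return (proposed - baseline) / baseline * 100.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_true = rng.gamma(2.0, 30.0, size=500)           # synthetic hourly concentrations
    y_base = y_true + rng.normal(0.0, 15.0, size=500)  # hypothetical baseline model
    y_new = y_true + rng.normal(0.0, 10.0, size=500)   # hypothetical improved model
    for name, f in [("MAE", mae), ("RMSE", rmse), ("R2", r2)]:
        change = relative_change(f(y_true, y_base), f(y_true, y_new))
        print(f"{name}: {change:+.2f}%")
```

Under this reading, a range such as "10.5% to 31.8%" would come from comparing the proposed model against each of the other baseline models in turn.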