Existing prompts typically emphasize either global or local information, which fails to excavate or fully leverage the effective information of cross-modal data, resulting in the subpar performance of ...