To achieve mid-air interactions on smartphones, acoustic sensing, a technique that uses a smartphone's built-in speaker and microphone, is promising. However, detecting hand poses near the surface of the touchscreen remains challenging due to the placement of the built-in speaker and microphone. To address this, we present Acoustic+Pose, a novel approach that combines conventional touch interactions with near-surface hand-pose estimation to enable a wide range of interactions. We focused on smartphones incorporating Acoustic Surface, a technology that vibrates the entire smartphone screen to emit sound over a wide area, and used this technology to extend the input space to the near surface of the touchscreen. We trained machine-learning models to recognize hand poses in this near-surface area and demonstrated interaction techniques that use the recognized poses as a new modality of smartphone input. In an evaluation, we confirmed that the trained models recognized 10 hand poses with 90.2% accuracy.