Hello, My name is Jinrong Zhang

Posted on 2025-03-25 Edited on 2025-04-01

👋 Hello, it's a happy day~~~

My name is Jinrong Zhang, and I am a researcher specializing in computer vision and multimodal large models. 🚀
If you have any interest in collaboration or academic exchange, please feel free to contact me.

🧑‍💻 About Me

📚 PhD Student in Electronic Information at Harbin Institute of Technology, Shenzhen.
🔬 Research Interests:

Video Understanding and Generation
Multimodal Representation
Temporal Action Segmentation

📄 Research Papers

I love publishing and sharing my findings with the world! Here's a list of some of my published research papers:

Just a Few Glances: Open-Set Visual Perception with Image Prompt Paradigm – AAAI, CCF-A, 2025

🔗 Link to original paper
End-to-End Streaming Video Temporal Action Segmentation with Reinforcement Learning – TNNLS, CCF-B, IF=10.2, 2025

🔗 Link to original paper
Flexible Streaming Temporal Action Segmentation with Diffusion Models – ICME, CCF-B, 2025

🔗 Accepted (to be indexed soon)
DTOS: Dynamic Time Object Sensing with Large Multimodal Model – CVPR, CCF-A, 2025

🔗 Accepted (to be indexed soon)
Cluster-Refined Optimal Transport for Unsupervised Action Segmentation – ICASSP, CCF-B, 2025

🔗 Link to original paper
Unsupervised Temporal Action Segmentation Based on Wavelet Feature Processing – IJCNN, CCF-C, 2025

🔗 Accepted (to be indexed soon)

On the Papers page, you can also access the key details of these research papers.

💼 Internship Experience

Xiaomi AI Lab – AI Research Intern
2024/2 – 2025/10
- I developed a large-model solution for access-permission detection at the Xiaomi car factory and deployed it successfully.
- During the internship, I published a paper at AAAI.

🌐 My Profile

🌍 Website: Google Scholar

Flexible Streaming Temporal Action Segmentation with Diffusion Models

Posted on 2025-03-25 In papers

Jinrong Zhang, Wenjun Wen, Shenglan Liu, Sifan Zhang, Yuning Ding and Lin Feng

🔗 Accepted by ICME25 (to be indexed soon)

Abstract

Temporal distribution shifts occur not only in low-dimensional time-series data but also in high-dimensional data like videos. This phenomenon leads to significant performance degradation in video understanding methods such as streaming temporal action segmentation. To address this issue, we propose a flexible streaming temporal action segmentation model with diffusion models (FSTAS-DM). By utilizing streaming video clips with varying feature distributions as control conditions, our model can adapt to the shifts and inconsistency of the distribution between the training and testing domains. Additionally, we introduce a multi-stage conditional control training strategy (MSCC), which enhances the temporal generalization ability of the model. Our method demonstrates commendable performance on datasets like GTEA, 50Salads, and Breakfast.

End-to-End Streaming Video Temporal Action Segmentation with Reinforcement Learning

Posted on 2024-10-25 Edited on 2025-03-25 In papers

Jinrong Zhang, Wujun Wen, Shenglan Liu, Gao Huang, Yunheng Li, Qifeng Li, Lin Feng

🚀 Link to original paper

Abstract

The streaming temporal action segmentation (STAS) task, a supplementary task of temporal action segmentation (TAS), has not received adequate attention in the field of video understanding. Existing TAS methods are constrained to offline scenarios due to their heavy reliance on multimodal features and complete contextual information. The STAS task requires the model to classify each frame of the entire untrimmed video sequence clip by clip in time, thereby extending the applicability of TAS methods to online scenarios. However, directly applying existing TAS methods to STAS tasks results in significantly poor segmentation outcomes. In this paper, we thoroughly analyze the fundamental differences between STAS tasks and TAS tasks, attributing the severe performance degradation when transferring models to modeling bias and optimization dilemmas. We introduce an end-to-end streaming video temporal action segmentation model with reinforcement learning (SVTAS-RL). The end-to-end modeling method mitigates the modeling bias introduced by the change in task nature and enhances the feasibility of online solutions. Reinforcement learning is utilized to alleviate the optimization dilemma. Through extensive experiments, the SVTAS-RL model significantly outperforms existing STAS models and achieves competitive performance to the state-of-the-art TAS model on multiple datasets under the same evaluation criteria, demonstrating notable advantages on the ultra-long video dataset EGTEA.

Notes on the Key Points of Building a Personal Blog with HEXO on Github Page

Posted on 2024-01-25 Edited on 2025-03-25 In notes

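This post does not cover the cosmetic tweaks that make the homepage look nicer, such as setting post priority, pinning posts, or using the read more button; the tutorials available online are enough for those. Instead, it summarizes, from my own perspective, the points that confused me in practice.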

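The online tutorials for building this kind of personal homepage are actually quite complete. The main workflow is simply to set up a NodeJS environment as usual, which installs npm along the way, and then use npm to install hexo and the various plugins. What needs attention is mainly the use of images and formulas, and possibly tables, but drawing paper-quality tables in markdown is far too painful, so I used screenshots instead. This post mainly covers what to watch out for with images and formulas, based on my own experience. As an extra, it also describes how to attach multiple posts under a single Page after creating a new Page.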

A Happy Day~~~

Posted on 2020-03-24 Edited on 2025-03-25 In life

Hello, hello, it turns out it's another happy day~~~