Jiazheng Wen


Hello! I am a Ph.D. student at HIT, advised by Junbao Li and Huanyu Liu.

My name is Jiazheng Wen, and my English name is Joshua Wen. In 2017, I received a bachelor's degree in information engineering from Xi'an Jiaotong University. In 2022, I received a master's degree in electronic science and technology from Harbin Engineering University. I am now pursuing a Ph.D. in the Faculty of Computing, Harbin Institute of Technology. I worked as a research intern at Focused Loong Technology Co., Ltd. from 2019 to 2022. My research focuses on deep reinforcement learning and vision enhancement.

Currently, my main interest is in enhancing the perception of computer vision algorithms through deep reinforcement learning. If you want to discuss anything research-related, please feel free to reach out to me :)

Email  /  ORCID  /  Google Scholar  /  GitHub  /  GitHub Stars: 63


News


Publications

Vision Language Large Model
Seg-LLaVA: A Small-Scale Large Vision-Language Model with External Visual Prompts
Tianxing Guo, Huanyu Liu, Jiazheng Wen, Junbao Li.
Neurocomputing.
paper

With recent significant advancements in large vision-language models (LVLMs), image-text understanding capabilities have substantially improved. However, a notable gap remains in fine-grained region understanding. Moreover, the resource consumption for training and testing large-scale LVLMs is immense, making them less accessible to researchers with limited resources. In this paper, we propose a small-scale LVLM, Seg-LLaVA, which employs a lightweight visual prompting method that leverages a semantic segmenter and a small-scale large language model (LLM). By integrating fine-grained knowledge generated by a specialized instance segmentation model with the original image into a multi-layer linear model, we enable the model to perceive object boundaries and types in the image without significantly increasing the number of training parameters, thereby greatly enhancing its visual understanding capabilities. Additionally, we adopt an efficient training approach, allowing Seg-LLaVA to achieve outstanding performance while further reducing resource requirements. Experimental results show that our model excels across multiple benchmarks and demonstrates strong fine-grained perception capabilities.

Computer Vision
SiamSDT: A Self-Adaptive Dynamic Template Siamese Network for Airborne Visual Tracking of MAVs on Heterogeneous FPGA-SoC
Yuxin Zhang, Jiazheng Wen, Ran Wu, Huanyu Liu, Junbao Li.
The Journal of Supercomputing.
paper

We propose a robust and lightweight tracking model, the self-adaptive dynamic template Siamese network (SiamSDT).

Computer Vision
Image inpainting with aggregated convolution progressive network
Yang Li, Jia Zhai, Wen Lu, Haipeng Guo, Jiazheng Wen, Huanyu Liu, Junbao Li.
IET Image Processing.
paper

This paper adopts a progressive network approach to design an aggregated convolution progressive network (ACP) inpainting model. It enhances the inpainting ability of interference regions in various types of images with different levels of information.

Computer Vision
Cross-Spectral Gaussian Splatting with Spatial Occupancy Consistency
Haipeng Guo, Huanyu Liu, Jiazheng Wen, Junbao Li.
AAAI 2025.
project page / paper

Recent advances have shown the possibility of jointly optimizing cross-spectral relative poses and neural radiance fields using normalized cross-device coordinates. However, such methods suffer from cross-spectral misalignment when data is collected asynchronously from devices, and they lack the capability to render in real time or handle large scenes. We address these issues by proposing cross-spectral Gaussian Splatting with spatial occupancy consistency, which strictly aligns the cross-spectral scene representation by sharing explicit Gaussian surfaces across spectra and separately optimizing each view's extrinsics using a matching-optimizing pose estimation method.

Reinforcement Learning Computer Vision
Automatic Visual Enhancement of PTZ Camera Based on Reinforcement Learning
Hao Fang, Huanyu Liu, Jiazheng Wen, Zhonglin Yang, Junbao Li, Qi Han.
Neurocomputing.
project page

In this paper, we propose an advanced pan-tilt-zoom (PTZ) camera control method that does not require intrinsic camera parameters. The goal is to accomplish the visual enhancement task of low-confidence objects.

Computer Vision
PTDS CenterTrack: Pedestrian Tracking in Dense Scenes with Re-Identification and Feature Enhancement
Jiazheng Wen, Huanyu Liu, Junbao Li.
Machine Vision and Applications.
paper

In this work, we propose PTDS (Pedestrian Tracking in Dense Scenes) CenterTrack, which builds on CenterTrack for object center point detection and tracking.

Reinforcement Learning Computer Vision
A Task-Risk Consistency Object Detection Framework Based on Deep Reinforcement Learning
Jiazheng Wen, Huanyu Liu, Junbao Li.
Remote Sensing, Special Issue: Artificial Intelligence Algorithm for Remote Sensing Imagery Processing III.
project page / paper

This study introduces a Task-Risk Consistent Intelligent Detection Framework (TRC-ODF) for object detection in optical remote sensing images.

Computer Vision
CenterCounter: A Video Pig Detection and Counting Network Based on Object Center Point
Jiazheng Wen, Yan Cang, Yulong Qiao.
Computers and Electronics in Agriculture.
project page / news page

The purpose of this project is to continuously count moving objects in aisles in a fixed-view video scene. A positive direction of movement is specified, so objects moving in the opposite direction are subtracted from the count.
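
The directional counting rule can be illustrated with a minimal sketch. This is not the CenterCounter implementation: it assumes each object is already tracked as a sequence of center-point y-coordinates, that the positive direction is increasing y, and that counting happens when the center crosses a horizontal line. The names `update_count`, `count_track`, and `line_y` are hypothetical.

```python
def update_count(count, prev_y, curr_y, line_y):
    """Return the new count after one object moves from prev_y to curr_y.

    Assumption: the positive direction is increasing y, and an object is
    counted when its center point crosses the horizontal line y = line_y.
    """
    if prev_y < line_y <= curr_y:   # crossed in the positive direction
        return count + 1
    if curr_y < line_y <= prev_y:   # crossed in the opposite direction
        return count - 1
    return count                    # no crossing between these two frames


def count_track(ys, line_y):
    """Net number of crossings for one object's center y-coordinates."""
    count = 0
    for prev_y, curr_y in zip(ys, ys[1:]):
        count = update_count(count, prev_y, curr_y, line_y)
    return count
```

For example, an object that crosses the line, turns back, and crosses again contributes a net count of one, since the backward crossing cancels the first forward crossing.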

Vision Language Large Model
A Vision-Language Large Model Perception Enhancement Method and System Based on Visual Prompting
Huanyu Liu, Tianxing Guo, Jiazheng Wen, Junbao Li, Yutong Jiang, Yue Zhou.
Patent.
patent page

To address the urgent need for deploying small-scale large language models (LLMs) under resource constraints, the proposed method works as follows: a segmentation component generates masks and an object segmentation list from the original image; a visual encoder processes the original image and masks to extract multi-level visual features highlighting object positions/boundaries, which are then refined via layer normalization and MLP layers into final visual features; finally, the masks, segmentation list (as text instructions), and visual features are fed into a vision-language large model (VLLM) for autoregressive semantic generation. This method also enhances VLLMs’ object perception and question-answering abilities without adding extra training parameters.

Reinforcement Learning
An Offline Reinforcement Learning Agent Method for Action Exploration Based on Expected Reward Regularization
Huanyu Liu, Jiazheng Wen, Junbao Li.
Patent.
patent page

This method resolves the poor reliability of trajectory stitching and strategy generalization in complex tasks with existing approaches. It involves: 1) building sequence-modeled state/action losses for iterative training; 2) designing a weighted squared error-based RTG loss; 3) using a double Q-learning framework (with two Q-functions, conservative Q-learning constraints, and Boltzmann distribution) to optimize action exploration; 4) integrating state, action, RTG regularization, and Q-value losses into a joint optimization objective; 5) generating diverse action predictions via noise-perturbed RTG candidate sampling; 6) selecting the highest-Q action (evaluated by double conservative Q-functions) for execution. It is primarily used in agent exploration.
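
Steps 5 and 6 can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the callables `policy`, `q1`, and `q2` are hypothetical stand-ins for the trained sequence model and the two Q-functions, the perturbation is assumed to be Gaussian, and the two Q-values are combined by taking their minimum.

```python
import random


def select_action(state, rtg, policy, q1, q2,
                  num_candidates=8, noise_std=0.1, rng=None):
    """Pick the candidate action with the highest pessimistic Q-value.

    Hypothetical interfaces (assumptions, not from the source text):
      policy(state, rtg) -> action   RTG-conditioned sequence model
      q1(state, action) -> float     first learned Q-function
      q2(state, action) -> float     second learned Q-function
    """
    rng = rng or random.Random()
    best_action, best_q = None, float("-inf")
    for _ in range(num_candidates):
        # Step 5: perturb the target return-to-go (Gaussian noise assumed).
        candidate_rtg = rtg + rng.gauss(0.0, noise_std)
        action = policy(state, candidate_rtg)
        # Step 6: score with the lower of the two Q-values (assumed rule).
        q = min(q1(state, action), q2(state, action))
        if q > best_q:
            best_action, best_q = action, q
    return best_action, best_q
```

Taking the minimum of the two estimates is one common way to keep the selection pessimistic; the actual combination rule is not stated in the summary above.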

Computer Vision
A MAV-Borne Target Tracking Method and System Based on Adaptive Dynamic Templates
Yuxin Zhang, Junbao Li, Huanyu Liu, Jiazheng Wen.
Patent.
patent page

This invention solves the trade-off between tracking performance and computational complexity in MAV-borne target tracking. Its core steps: 1) Set initial, adjacent, and memory templates; 2) Input current frame search features and all templates into the adaptive template fusion (STF) module to generate the final template; 3) Correlate the final template with the search template to get a response map, judge tracking state, and update adjacent/memory templates; 4) The memory template module uses temporal cascading to integrate historical tracking key info, fitting all history into limited memory; 5) The adaptive fusion module adjusts template weights dynamically across tracking stages via template-search feature similarity matrices. It applies to MAV-borne target tracking.
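
The idea in step 5, weighting templates by their similarity to the search features, can be sketched as below. This is a simplified illustration, not the patented module: it assumes templates and search features are flat vectors, uses cosine similarity, and normalizes the weights with a softmax. The names `cosine` and `fuse_templates` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuse_templates(templates, search):
    """Fuse templates into one, weighted by similarity to the search features.

    Assumption: weights are a softmax over cosine similarities, so a template
    that matches the current search features better contributes more.
    """
    sims = [cosine(t, search) for t in templates]
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]   # shift for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * t[i] for w, t in zip(weights, templates))
             for i in range(len(search))]
    return fused, weights
```

With an initial, an adjacent, and a memory template as the three inputs, the returned weights show which template dominates the fused result for the current frame.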

Reinforcement Learning Computer Vision
A Long-Term Single-Target Tracking Method, System, and Device Based on Sequence Modeling Reinforcement Learning
Huanyu Liu, Jiazheng Wen, Junbao Li.
Patent.
patent page

This invention resolves poor tracker performance in long-term tracking. Key steps: 1) Build a sequence modeling reinforcement learning-based long-term tracker with Transformer-based perception and decision layers (perception layer’s visual Transformer outputs feed the decision Transformer, which feeds action sequences back); 2) The tracker integrates sequence modeling reinforcement learning to adaptively select baseline short-term trackers; 3) Make decisions via memory sequence analysis; 4) Individual short-term trackers impact overall results (jointly determined by visual encoder and tracking method); 5) The decision layer dynamically optimizes search region position for tracking. It applies to long-term single-target tracking.

Computer Vision
A dense pedestrian multi-object tracking method based on feature fusion, computer equipment, and storage medium
Huanyu Liu, Jiazheng Wen, Junbao Li, Zhonglin Yang.
Patent.
patent page

This invention provides a dense pedestrian multi-target tracking method based on feature fusion, together with computer equipment and a storage medium. It belongs to the field of computer vision tracking technology and addresses the shortcomings of existing tracking methods for pedestrians in dense scenes.

Computer Vision
An intelligent spark plug appearance defect detection system
Yan Cang, Jiazheng Wen, Yulong Qiao, Chunyu Chen.
Patent.
patent page

The present invention belongs to the field of image processing and specifically relates to an intelligent spark plug appearance defect detection system.


Workshops

AI&Game
A turn-based gaming agent based on an RL environment built from Sid Meier's Civilization game series.
Jiazheng Wen.
project page

Recently, I've been working on an interesting personal project. This project plans to build an RL environment suitable for turn-based games. We will build this environment based on Sid Meier's Civilization 5 and 6. It is still in its infancy, and everyone is welcome to discuss and participate!


Other activities

Academic activities
Award
  • 2025 National Scholarship for Doctoral Students ¥30,000.
  • 2024 "QAX Network Security" Doctoral Scholarship ¥10,000.
Teaching Assistant

Template based on Jon Barron's website.