Reproducing SoTA Papers with Model Zoo
Most people initially use IMDL-Benco with the intention of reproducing SoTA (State-of-the-Art) papers. If you have a certain level of deep learning experience (such as working with the PyTorch framework, Linux Shell scripts, multi-GPU parallel parameters, etc.), this will be very simple.
Initialization with benco init
After installing Benco, create a clean, empty folder as your working directory, and then run the following command:
benco init model_zoo
IMDL-BenCo will generate all the necessary Python scripts, shell scripts, default datasets, and necessary configuration files needed to reproduce the model zoo. The basic folder structure will look like this:
.
├── balanced_dataset.json
├── configs
│ ├── CAT_full.yaml
│ └── trufor.yaml
├── runs
│ ├── demo_catnet_protocol_iml_vit.sh
│ ├── demo_catnet_protocol_mvss.sh
│ ├── demo_test_cat_net.sh
│ ├── demo_test_iml_vit.sh
│ ├── demo_test_mantra_net.sh
│ ├── demo_test_mvss.sh
│ ├── demo_test_object_former.sh
│ ├── demo_test_pscc.sh
│ ├── demo_test_robustness_cat_net.sh
│ ├── demo_test_robustness_iml_vit.sh
│ ├── demo_test_robustness_mantra_net.sh
│ ├── demo_test_robustness_mvss.sh
│ ├── demo_test_robustness_object_former.sh
│ ├── demo_test_robustness_pscc.sh
│ ├── demo_test_robustness_span.sh
│ ├── demo_test_robustness_trufor.sh
│ ├── demo_test_span.sh
│ ├── demo_test_trufor.sh
│ ├── demo_train_backbone_segformer.sh
│ ├── demo_train_backbone.sh
│ ├── demo_train_cat_net.sh
│ ├── demo_train_iml_vit.sh
│ ├── demo_train_mantra_net.sh
│ ├── demo_train_mvss.sh
│ ├── demo_train_object_former.sh
│ ├── demo_train_pscc.sh
│ ├── demo_train_span.sh
│ └── demo_train_trufor.sh
├── test_datasets.json
├── test.py
├── test_robust.py
└── train.py
In the root directory, you will find the main logic scripts for training and testing: train.py
, test.py
, and test_robust.py
. Our design philosophy encourages you to modify these scripts as needed!
The ./runs
folder contains all the shell scripts used to start the corresponding training processes. These shell scripts will call the Python scripts in the root directory, and the function of each script can be inferred from its name.
For example,
demo_train_trufor.sh
is the script used to train TruFor,demo_test_mvss.sh
is the script to test MVSS-Net, anddemo_test_robustness_cat_net.sh
is the script for robustness testing of CAT-Net.
The ./configs
folder contains configuration files for various models. You can adjust the corresponding hyperparameters by modifying these files, and the default paths will be automatically read by the shell scripts.
Modifying Dataset Paths
Open the target shell script you want to use and make sure to modify the following fields to point to your dataset and checkpoint paths:
Training Scripts:
Field Name Function Description data_path Training dataset path Refer to Dataset Preparation test_data_path Test dataset path Refer to Dataset Preparation Testing Scripts:
Field Name Function Description test_data_json Path to the test dataset JSON, which contains multiple datasets and their paths Refer to the final section of Dataset Preparation checkpoint_path Folder path containing the checkpoint to be tested This is a folder containing at least one checkpoint. The filename must include a number before the extension indicating the epoch, e.g., checkpoint-68.pth
Robustness Testing Scripts:
Field Name Function Description test_data_path Test dataset path Refer to Dataset Preparation checkpoint_path Folder path containing the checkpoint to be tested This is a folder containing at least one checkpoint. The filename must include a number before the extension indicating the epoch, e.g., checkpoint-68.pth
For necessary PyTorch multi-GPU training parameter adjustments, please learn or consult ChatGPT. Common fields include:
CUDA_VISIBLE_DEVICES=0
: Specifies using only the GPU with this ID--nproc_per_node=4
: Total number of GPUs to use
Passing Hyperparameters to nn.Module
via Shell Scripts (Syntactic Sugar)
In addition, each model may have its own special hyperparameters. In BenCo, extra command line parameters (those not required by train.py
) can be directly passed into the __init__
function of the nn.Module
. This functionality is implemented here.
You can check the __init__()
function of a model to understand its capabilities.
For example, in the training shell script demo_train_trufor.sh
, the following fields:
--np_pretrain_weights "/mnt/data0/dubo/workspace/IMDLBenCo/IMDLBenCo/model_zoo/trufor/noiseprint.pth" \
--mit_b2_pretrain_weights "/mnt/data0/dubo/workspace/IMDLBenCo/IMDLBenCo/model_zoo/trufor/mit_b2.pth" \
--config_path "./configs/trufor.yaml" \
--phase 2 \
will be directly passed to the __init__
function of the TruFor nn.Module
, specifically at this location.
@MODELS.register_module()
class Trufor(nn.Module):
def __init__(self,
phase: int = 2,
np_pretrain_weights: str = None,
mit_b2_pretrain_weights: str = None,
config_path: str = None,
det_resume_ckpt: str = None):
super(Trufor, self).__init__()
Note!!! All hyperparameters in the shell scripts under Model_zoo are currently the optimal experimental settings according to the authors' team.
Pretrained Weights Download
Additionally, different models may have custom parameters or require pretrained weights. This part will be completed in future documentation. TODO
For now, you can directly refer to the README files in each model folder under this path to download the necessary pretrained weights.
Running Shell Scripts
Switch to the root directory (where the train.py
, test.py
, and other scripts are located), and then run the following command directly:
sh ./runs/demo_XXXX_XXXX.sh
Pay attention to the path relationships and ensure that the configuration files and Python scripts are correctly referenced by the shell commands.
If you don't see any output, don't panic. To save logs, all output and errors have been redirected to files.
If everything runs correctly, a folder named output_dir_xxx
or eval_dir_xxx
will be generated in the current path. Inside this folder, you will find three logs: one for standard output (logs.log
), one for warnings and errors (error.log
), and one specifically for scalar statistics (log.txt
).
If the model runs successfully, you should see the model iterating and outputting new logs at the end of logs.log
:
......
[21:25:16.951899] Epoch: [0] [ 0/80] eta: 0:06:40 lr: 0.000000 predict_loss: 0.6421 (0.6421) edge_loss: 0.9474 (0.9474) label_loss: 0.3652 (0.3652) combined_loss: 0.8752 (0.8752) time: 5.0059 data: 1.5256 max mem: 18905
[21:25:52.536949] Epoch: [0] [20/80] eta: 0:01:55 lr: 0.000002 predict_loss: 0.6255 (0.6492) edge_loss: 0.9415 (0.9405) label_loss: 0.3607 (0.3609) combined_loss: 0.8660 (0.8707) time: 1.7791 data: 0.0004 max mem: 20519
[21:26:27.255074] Epoch: [0] [40/80] eta: 0:01:13 lr: 0.000005 predict_loss: 0.6497 (0.6615) edge_loss: 0.9400 (0.9412) label_loss: 0.3497 (0.3566) combined_loss: 0.8729 (0.8730) time: 1.7358 data: 0.0003 max mem: 20519
[21:27:02.311510] Epoch: [0] [60/80] eta: 0:00:36 lr: 0.000007 predict_loss: 0.6255 (0.6527) edge_loss: 0.9404 (0.9404) label_loss: 0.3400 (0.3519) combined_loss: 0.8643 (0.8708) time: 1.7527 data: 0.0003 max mem: 20519
......
If something goes wrong, please check the error information in error.log
and resolve the issues.
All the checkpoint-XX.pth
files will also be output to output_dir_xxx
for future use.
It is highly recommended to monitor the training process using TensorBoard with the following command. Benco provides numerous automated API interfaces to facilitate visualization, making it easier to confirm that the training is proceeding correctly.
tensorboard --logdir ./
At this point, you have completed the process of reproducing the SoTA model.