Friday, February 13, 2015

First semi-supervised network training

We're training the network in a semi-supervised fashion (training on labeled and unlabeled data at the same time) without pre-training, using the Pseudo-Label technique.
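
Pseudo-Label, as we understand the method, treats the class with the highest predicted probability as the target for each unlabeled image and adds that cross-entropy to the usual supervised loss, weighted by a coefficient alpha(t) that ramps up during training. The plain-Python sketch below is only an illustration, not our Caffe setup. It assumes the network outputs one logit per real class (0 and 1) and that label -1 marks an unlabeled image, as in our data. The ramp values t1=100, t2=600, alpha_f=3.0 are assumptions taken from the schedule we believe the Pseudo-Label paper suggests (where t counts epochs).

```python
import math

UNLABELED = -1  # our data marks unlabeled images with label -1

def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    return -math.log(softmax(logits)[label])

def pseudo_label(logits):
    """The pseudo-label is the class with the highest predicted probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=lambda k: probs[k])

def alpha(t, t1=100, t2=600, alpha_f=3.0):
    """Weight of the unlabeled term (assumed ramp): 0 before t1, linear until t2, then alpha_f."""
    if t < t1:
        return 0.0
    if t < t2:
        return alpha_f * (t - t1) / (t2 - t1)
    return alpha_f

def pseudo_label_loss(batch, t):
    """batch: list of (logits, label) pairs; label == UNLABELED means no ground truth."""
    labeled = [(z, y) for z, y in batch if y != UNLABELED]
    unlabeled = [z for z, y in batch if y == UNLABELED]
    loss = 0.0
    if labeled:
        loss += sum(cross_entropy(z, y) for z, y in labeled) / len(labeled)
    if unlabeled:
        # Each unlabeled image is scored against its own most likely class.
        unsup = sum(cross_entropy(z, pseudo_label(z)) for z in unlabeled) / len(unlabeled)
        loss += alpha(t) * unsup
    return loss

# Hypothetical logits for two labeled images and one unlabeled image.
batch = [([2.0, 0.5], 0), ([0.1, 1.5], 1), ([3.0, -1.0], UNLABELED)]
print(pseudo_label_loss(batch, t=0))     # unlabeled term has weight 0 here
print(pseudo_label_loss(batch, t=1000))  # unlabeled term weighted by alpha_f = 3.0
```

Ramping alpha(t) up from zero lets the network learn from the labeled images first, so the early, mostly random pseudo-labels do not dominate the loss.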

There are 17481 images for the training phase and 5821 images for the validation phase, 23302 images in total (17481 + 5821 = 23302). Each image carries one of three labels, -1, 0, or 1, with 11367, 6306, and 5629 images respectively; label -1 marks the unlabeled images.

solver.prototxt is as follows:

net: " "
test_iter: 120
test_interval: 1000
#base_lr: 0.01
base_lr: 0.00001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 100
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: ""
#solver_type: SGD
solver_mode: GPU

We often got loss = nan, and so far the only fix we have tried is reducing base_lr (from 0.01 to 0.00001, as the commented-out line above shows). This issue needs more study.

A NaN loss means the loss has run off to infinity: once a value overflows to inf, arithmetic such as inf - inf produces NaN (not a number).
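
To see how a divergent loss ends up as NaN, and why reducing the learning rate (fix 1 below) helps, here is a toy example rather than our network: plain gradient descent on the loss f(w) = w**2. Each update multiplies w by (1 - 2*lr), so for lr > 1 the weight grows without bound, overflows to inf, and the next update computes inf - inf = NaN. The function name and step budget are illustrative assumptions.

```python
import math

def gradient_descent(lr, w0=1.0, steps=2000):
    """Minimize the toy loss f(w) = w**2 (gradient 2*w) with plain gradient descent.

    Returns (step, loss) at the first NaN loss, or (steps, final loss) otherwise.
    """
    w = w0
    for step in range(steps):
        loss = w * w
        if math.isnan(loss):
            return step, loss
        w = w - lr * 2.0 * w  # each update multiplies w by (1 - 2*lr)
    return steps, w * w

print(gradient_descent(lr=1.5))  # |1 - 2*lr| = 2: w doubles in size, overflows, then NaN
print(gradient_descent(lr=0.1))  # |1 - 2*lr| = 0.8: w shrinks toward 0
```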

A divergent loss can usually be fixed by one or more of the following:
1) reducing the learning rate
2) changing the loss norm
3) changing the net topology (adding ReLUs, etc.)
4) changing the parameter initialization (the "xavier" filler on the dev branch seems especially stabilizing)
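
As we understand Caffe's "xavier" filler (enabled in a layer definition with weight_filler { type: "xavier" }), with its default fan-in normalization it draws each weight from a uniform distribution on [-a, a] with a = sqrt(3 / fan_in). That interval has variance a**2 / 3 = 1 / fan_in, which keeps activations from growing or shrinking layer by layer. The plain-Python sketch below reimplements that idea for illustration only; the function name and layer sizes are hypothetical.

```python
import math
import random

def xavier_fill(num_outputs, fan_in, rng=None):
    """Fill a num_outputs x fan_in weight matrix from U(-a, a) with a = sqrt(3 / fan_in).

    A uniform distribution on [-a, a] has variance a**2 / 3 = 1 / fan_in.
    """
    rng = rng or random.Random(0)
    a = math.sqrt(3.0 / fan_in)
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(num_outputs)]

# Hypothetical conv layer: fan_in = input_channels * kernel_h * kernel_w = 3 * 5 * 5 = 75.
weights = xavier_fill(num_outputs=64, fan_in=75)
flat = [w for row in weights for w in row]
var = sum(w * w for w in flat) / len(flat)  # mean is ~0, so this approximates the variance
print(f"empirical variance {var:.4f} vs 1/fan_in {1 / 75:.4f}")
```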


We will be back after the holiday! ^_^


Update (same day): after running the training process for a while (fewer than 5000 iterations), we saw that the accuracy stayed around 0.5 and the loss between 8 and 20. Maybe the parameters aren't effective, or we need additional techniques. So we decided to fine-tune instead, using the same set of images.

finetune_solver.prototxt is as follows:

# The train/val net protocol buffer definition
net: " "
# test_iter specifies how many forward passes the test should carry out.
test_iter: 182
# Carry out testing every xxx training iterations.
test_interval: 1000
# The base learning rate, momentum and the weight decay of the network.
# lr for fine-tuning should be lower than when starting from scratch
base_lr: 0.00000001
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "step"
gamma: 0.1
#power: 0.75
# stepsize should also be lower, as we're closer to being done
stepsize: 20000
# Display every xxx iterations
display: 500
# The maximum number of iterations
max_iter: 136571
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: " "
#solver_type: SGD
# solver mode: CPU or GPU
solver_mode: GPU
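
With lr_policy: "step", Caffe computes the rate as base_lr * gamma ^ floor(iter / stepsize). The sketch below (hypothetical helper names, assuming the rate is evaluated at iterations 0 through max_iter - 1) applies that formula to both solvers. In the first solver stepsize equals max_iter, so the rate never drops during training; in the fine-tuning solver it drops six times, ending near 1e-14.

```python
def step_lr(iteration, base_lr, gamma, stepsize):
    """Caffe's "step" policy: base_lr * gamma ** floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

def lr_drops(max_iter, stepsize):
    """How many times the rate is multiplied by gamma over iterations 0..max_iter-1."""
    return (max_iter - 1) // stepsize

# First solver: base_lr 0.00001, gamma 0.1, stepsize 100000, max_iter 100000.
print(lr_drops(100000, 100000))            # 0: the rate never drops
print(step_lr(99999, 1e-5, 0.1, 100000))   # 1e-05

# Fine-tuning solver: base_lr 0.00000001, gamma 0.1, stepsize 20000, max_iter 136571.
print(lr_drops(136571, 20000))             # 6
print(step_lr(136570, 1e-8, 0.1, 20000))   # about 1e-14
```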
