#66. K2fsa: Beginner's Questions: Tensorboard, Troubleshooting

  • Nadira Povey
  • 2024-01-09
  • 221


Video description for #66. K2fsa: Beginner's Questions: Tensorboard, Troubleshooting

  / youtube-stt-questions-tensorboard-troubles...  
Dan is commenting on questions asked on YouTube by @dirkstark2870:
00:00 Q1: It would be interesting to see typical problems and how to solve them. What should the TensorBoard log look like in a good training run, and why?
Answer:
So, the basic thing people normally look for here is that the losses are going down, and that they go down to a small value. For the current loss, if it gets stuck above 0.6 and hasn't converged at all, you should kill the run. But this one, you can see, goes down to 0.15; I think that's normal. The pruned loss here goes down to about 0.03; that's normal.
All of these other values are fairly normal. The grad scale is hovering around 32. If the grad scale goes down to zero or gets very small, that usually indicates some kind of instability: something is producing large gradients, and with half-precision training you then get infinities.
Anyway, this grad scale is normal, and the learning rate is normal; that one should never change.
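Viewing these curves is just a matter of pointing TensorBoard at the training logs. A minimal sketch, assuming an icefall-style recipe that writes event files under an experiment directory (the paths below are hypothetical):

    # Launch TensorBoard on the training logs; the log directory is a
    # hypothetical icefall-style experiment path.
    tensorboard --logdir zipformer/exp/tensorboard --port 6006
    # Then open http://localhost:6006 and check the loss, grad_scale,
    # and learning-rate curves discussed above.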
04:17 Q2: I'm not a researcher and my knowledge isn't deep enough. Typical problem cases and how to find and fix them would be interesting to see. I see only the loss function. Is it possible to see in TensorBoard how the model is improving? Is it possible to see something like the WER during the process?
Answer:
Well, if you want to see the word error rate, we don't really have a way to plot it, but you can decode at different epochs using the decode.py script.
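For instance, here is a hedged sketch of decoding checkpoints from several epochs to track WER over training, assuming an icefall-style recipe whose decode.py takes --epoch and --avg options (exact flags, paths, and values vary by recipe and are illustrative here):

    # Decode checkpoints from a few epochs to see how WER evolves.
    for epoch in 10 20 30; do
      ./zipformer/decode.py \
        --epoch $epoch \
        --avg 5 \
        --exp-dir zipformer/exp \
        --decoding-method greedy_search
    done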
If the model is not converging, the train.py script has various options for debugging: set print-diagnostics to true, halve the max duration, and set use-fp16 to false. Then you put the output in a file, and there are various ways you can analyze that file.
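As a rough sketch of that debugging run, assuming an icefall-style train.py that accepts --print-diagnostics, --max-duration, and --use-fp16 (option names may differ between recipes, and the paths are hypothetical):

    # Run training with diagnostics on, half the usual max duration,
    # and fp16 disabled, capturing all output in a file for analysis.
    ./zipformer/train.py \
      --print-diagnostics true \
      --max-duration 300 \
      --use-fp16 false \
      --exp-dir zipformer/exp 2>&1 | tee diagnostics.txt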
We have a repository in our k2-fsa organization on GitHub called Analyze Diagnostics that has a few scripts. They're not very well documented, unfortunately, but there are various things you can see from them. If you suspect that a particular module is not working right, you can look at those diagnostics; sometimes you'll see something.
06:13 Q3: How to train multi-language models: just mix dev/test/train, or mix only train? Which is better?
Answer:
Typically the reason people don't include dev and test data in their language model training is that it will give you unrealistically good results on your test set. If your test set is included in the language model training data, it won't really reflect how genuinely new, unseen data would behave.
09:05 Q4: How to tune the params? For example, I increased prune_range from 5 to 12 on my dataset and got a better WER. I don't understand what I'm really doing here :) It would be interesting to understand the params more, with examples (in case of xx, do yy), and also which params should be touched.
Answer:
I'd be a little surprised if increasing prune_range from five made a big difference.
It could be that your data was somehow hard to align, but that seems odd to me. It could just be noise; maybe your test set was too small. The normal thing we would tune first is the model size: the number of layers, the width, the embedding dimensions, things like that. So the model configuration, and also sometimes we change the learning rate slightly. Not too much, but a little bit.
10:21 Q5: I watched your video on how to train an LM. It was interesting, but when I trained on only 250 MB it took so much time on a 4090 that I think there must be something wrong. Is it possible to say how long one epoch will take, or are there cases where it will run endlessly? I'll retry that during the next weeks with a smaller dataset.
Answer:
Language model training can be on the slow side, and normally we don't train for that many epochs. Also, sometimes we train with multiple GPUs. There's a program called py-spy that can get the traceback of a running Python program, so you can sometimes get a stack trace from it, and that might help you figure out where the code is waiting.
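For reference, a minimal sketch of using py-spy to see where a running training job is stuck; the process ID below is a placeholder:

    # Install py-spy, then dump the current Python stack of the training
    # process (replace 12345 with the real PID of train.py).
    pip install py-spy
    py-spy dump --pid 12345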
