The tutorial "Deep and Machine Learning methods for document clustering and classification" will be given by Priv.-Doz. Dr. Alexei I. Streltsov (Senior Data Scientist, SAP SE, Germany) and the HybriLIT heterogeneous computing team as part of the XXIII International Scientific Conference of Young Scientists and Specialists (AYSS-2019), on the basis of the ecosystem developed for ML/DL.
In this tutorial, we consider the complete workflow of a typical Data Science project dealing with text documents. We define a problem, generate and analyze data, and explore relevant features: we discuss several ways to extract and describe semantic information, and show how to augment it with additional non-semantic information, which might help to improve the results. Next, we construct and apply several standard Machine Learning (ML) models to describe our data, casting the task as both a classification and a regression problem. Then, we analyze the efficiency of the ML methods as well as the role, impact and relevance of our semantic and non-semantic features. Next, we show how to apply Deep Learning (DL) methods to attack the same problem, considering simple DNN (Deep Neural Network) and CNN (Convolutional Neural Network) models. Finally, we contrast our ML and DL results and discuss their pros and cons: efficiency, required computational resources, and possible ways to improve them.
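The feature-extraction and classification steps of such a workflow can be illustrated with a minimal sketch. It is not the tutorial's actual code: the toy corpus, the labels, all function names and the `__length__` feature are hypothetical, and a simple nearest-centroid classifier stands in for the standard ML models. It uses only the Python standard library, builds TF-IDF weights as the semantic features, and appends the document length as one non-semantic feature.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented purely for illustration.
TRAIN = [
    ("the gpu cluster runs neural network training jobs", "computing"),
    ("parallel computing on the cluster speeds up training", "computing"),
    ("the detector measures particle collisions in the accelerator", "physics"),
    ("particle beams collide inside the accelerator ring", "physics"),
]


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def featurize(text, idf):
    """Sparse feature dict: TF-IDF weights plus one non-semantic feature."""
    tokens = tokenize(text)
    tf = Counter(tokens)
    vec = {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}
    # Non-semantic feature: document length, scaled to stay small.
    vec["__length__"] = len(tokens) / 100.0
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fit_centroids(samples, idf):
    """Average the feature vectors of each class into one centroid."""
    sums, counts = {}, Counter()
    for text, label in samples:
        counts[label] += 1
        acc = sums.setdefault(label, Counter())
        for k, v in featurize(text, idf).items():
            acc[k] += v
    return {
        label: {k: v / counts[label] for k, v in acc.items()}
        for label, acc in sums.items()
    }


def predict(text, idf, centroids):
    """Return the label whose centroid is most similar to the document."""
    vec = featurize(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


idf = fit_idf([text for text, _ in TRAIN])
centroids = fit_centroids(TRAIN, idf)
print(predict("training a neural network on the gpu", idf, centroids))
```

Dropping the `__length__` entry from `featurize` and re-running is one simple way to probe how much a non-semantic feature contributes relative to the semantic ones.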
The tutorial supports both active and passive participation. I will use a live Jupyter Notebook presentation to describe, discuss and execute each and every block of the Python code required for the above workflow. The corresponding blocks will be shared on a dedicated Slack channel (HybriLIT subscription required: https://web-stc.jinr.ru). If you have a valid account on the HybriLIT cluster, you will be able to copy and paste them from the Slack channel and re-execute them online in your own Notebook via the GitLab (https://jhub.jinr.ru/) service. No extra work is needed on your side to install, tune or support the required Python packages: JHub has already done it for you.
IMPORTANT: Please bring your own laptops!
Step-by-step instructions:
- Please follow this link:
https://join.slack.com/t/mctdhb-lab/shared_invite/enQtNjEwMDc2MzQwODIwLWYzMTcxYjA2ODY1MDk2YTNhYTRjZjg1N2ExYmRmMGMwMGExNzVmOWUzMzliMDAzNmNkOTc5MDk1MjMyYjEwNzE
- Enter your e-mail address and verify it;
- Follow the link that was sent to your e-mail;
- Create an account.
Welcome to the mctdhb-lab channel!
P.S. In case of any problems, please contact: shushanik@jinr.ru