Basic Edition "Data Tools First"

Course descriptions

MATLAB
MATLAB is programmable numerical analysis software for data analysis and visualization. This course covers basic syntax, image processing, and other applications using MATLAB.
Numerical analysis has been used in many areas, including weather forecasts, collision simulations for automobiles, and rocket vector calculations. In numerical analysis, various issues are modeled using mathematical equations that are solved by computers.
MATLAB by MathWorks is software that performs numerical analysis interactively. Its benefits include ease of handling vectors and matrices, a large number of built-in functions (more than 1,000), and visualization (plotting). Additionally, applications can be developed using its dedicated language. Although it is a proprietary product, it is used around the world due to its wealth of functions and adaptability to research purposes.
This course covers basic operations and the syntax of MATLAB as well as its various functions, including visualization, interpolation, and equations. Additionally, a method to generate block diagrams using Simulink is introduced. The accompanying lab allows students to implement simple algorithms and image processing.
R
R is a programming language and a development and execution environment for statistical analysis. This course covers fundamentals of programming in R and regression analysis.
As applications such as forecasting sales based on web browsing and order histories increase, the demand to find correlations between products that were once considered unrelated has drastically increased, especially in corporations that have accumulated various data. Consequently, statistical analysis is becoming a standard skillset to evaluate all this data.
R is an open-source programming language and development environment for statistical analysis. Packages published on CRAN (the Comprehensive R Archive Network) can be added to expand its functionality. Furthermore, R can be used with R Commander (Rcmdr) to enable GUI-based operations, or through an integrated development environment such as RStudio.
This course provides an overview of data analysis using R. Case studies then explore ways to increase sales at a department store based on purchase histories. In addition to learning data types and structures, data import, data shaping, and programming in R, the accompanying lab allows students to use RStudio to cluster customers by analyzing visit frequencies and sales per customer.
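The customer clustering exercise described above can be illustrated with a minimal sketch. The course itself uses R and RStudio and does not name a clustering method; this example assumes k-means, is written in Python purely for illustration, and uses made-up customer data (visits per month, sales per visit). The function names `nearest_centroid` and `kmeans` are hypothetical.

```python
def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        # Recompute each centroid as the mean of its members;
        # keep the old centroid if a cluster ended up empty.
        centroids = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest_centroid(point, centroids) for point in points]
    return centroids, labels


# Hypothetical data: (visits per month, sales per visit) for six customers.
customers = [(2, 30.0), (3, 35.0), (1, 25.0), (12, 120.0), (10, 110.0), (11, 130.0)]

if __name__ == "__main__":
    # Assumption: start from the first and fourth customers as initial centroids.
    centroids, labels = kmeans(customers, [customers[0], customers[3]])
    print(centroids, labels)
```

In practice the two features would usually be scaled to comparable ranges first, since sales per visit would otherwise dominate the distance.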
SAS
SAS is programmable statistical analysis software. This course covers basic procedures and programming in data analysis.
SAS is statistical analysis software from the SAS Institute that is commonly used in many organizations, including research institutes, medical institutions, financial institutions, and distribution businesses. SAS supports various data formats such as Microsoft Excel and HTML, and covers the entire analysis workflow (from data formatting, processing, and analysis to report preparation). It can also process large amounts of data at high speed. In SAS, programs that perform data processing and analysis are written in a programming environment called Base SAS. Additionally, another tool, SAS Enterprise Guide (EG), allows a series of analyses to be performed using GUI operations without programming.
This course covers the structure and basic use of Base SAS and EG. In addition, applied topics such as data processing using functions and coupling, and the generation of libraries are covered. The accompanying lab provides students with hands-on analysis experience using Base SAS and EG with publicly-available data in Excel and HTML.
Hadoop
Hadoop is a software infrastructure for distributed processing of large-scale data. This course covers its structures (MapReduce and HDFS), operation procedures, and programming.
Processing large-scale data is more efficient using distributed computation across multiple computers than parallel computation across multiple cores in a single computer. For this reason, distributed processing is often employed when analyzing logs or in data mining. Another advantage of distributed computing is fault tolerance: even if one computer fails, the other computers can continue processing.
Apache Hadoop is a software infrastructure for distributed processing developed by the Apache Software Foundation. It is freely available, and several vendors also offer their own distributions. Hadoop scales well and can support several thousand computers and petabytes of data. For these reasons, many corporations and organizations, such as search engines, social networking services, and online stores, use Hadoop.
This course provides an introduction to parallel computing and distributed computing, gives an overview of the structure of Hadoop (the distributed file system HDFS and MapReduce), and discusses methods to manage jobs and HDFS from a web interface. The accompanying lab exposes students to command operations, distributed computing with Java, and SequenceFile operations.
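The MapReduce model mentioned above can be illustrated with a minimal single-process sketch. This is not Hadoop code (the lab uses Java on a Hadoop cluster); it is a plain Python illustration of the map, shuffle, and reduce phases for a word count, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data tools", "data tools first"]))
```

In a real cluster, the map and reduce calls would run on different machines, and the framework would handle the shuffle and recover from failed nodes.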
SQL
SQL is a database operation language. This course covers the basics of SQL using MySQL.
Databases organize, consolidate, and store data. Thus, they facilitate using data for searching and other operations as well as enable aggregation of information for more efficient utilization. The database management system (DBMS), which builds and manages databases on a computer, is used extensively in many applications such as search engines, customer data processing, and POS systems.
SQL is a specialized language used to manipulate data in a database. Although some vendors extend the command set, the basic commands and their usage are largely the same across databases thanks to ISO standardization.
This course covers the basics of SQL using MySQL, which is a common open-source database. Specifically, students learn to create, delete, and join tables as well as to search, modify, delete, sub-query, and restrict data. Furthermore, normalization, which is required to divide tables appropriately into a more efficient organization, is also discussed.
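The operations listed above can be sketched in a few statements. The course uses MySQL; this example assumes SQLite through Python's standard `sqlite3` module as a stand-in so that it runs without a database server, and the table and column names (`customers`, `orders`) are hypothetical. The SQL shown is basic enough that it should carry over to MySQL with little or no change.

```python
import sqlite3


def run_demo():
    """Create two normalized tables, then join, sub-query, update, and delete."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # Customers and orders live in separate tables, linked by customer_id.
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    cur.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customers(id), "
        "amount INTEGER NOT NULL)"
    )
    cur.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Aoki"), (2, "Baba"), (3, "Chiba")],
    )
    cur.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(1, 500), (1, 1500), (2, 300)],
    )

    # Join the tables and aggregate the order amounts per customer.
    cur.execute(
        "SELECT c.name, SUM(o.amount) FROM customers c "
        "JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.id, c.name ORDER BY c.name"
    )
    totals = cur.fetchall()

    # Sub-query with a restriction: customers with any order above 1000.
    cur.execute(
        "SELECT name FROM customers WHERE id IN "
        "(SELECT customer_id FROM orders WHERE amount > 1000) ORDER BY name"
    )
    big_spenders = [row[0] for row in cur.fetchall()]

    # Modify one row, then delete customers that have no orders at all.
    cur.execute("UPDATE orders SET amount = 400 WHERE customer_id = 2")
    cur.execute("DELETE FROM customers WHERE id NOT IN (SELECT customer_id FROM orders)")
    cur.execute("SELECT name FROM customers ORDER BY name")
    remaining = [row[0] for row in cur.fetchall()]

    conn.close()
    return totals, big_spenders, remaining


if __name__ == "__main__":
    print(run_demo())
```

Keeping customer names out of the `orders` table is a small instance of normalization: each fact is stored once and recombined with a join when needed.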
Mahout
Mahout is a machine-learning library that supports distributed processing using Hadoop. This course covers methods for recommendation, clustering, and classification.
Machine learning applied to the information that modern society collects and accumulates allows computers to discover previously unnoticed correlations and to detect and categorize information. For example, machine learning is used in autonomous driving, stock price forecasting, and spam email filtering.
Apache Mahout is an open-source machine-learning library developed by the Apache Software Foundation that features a number of machine-learning algorithms. Mahout is launched from a Java-based application. Although it is implemented in Java, it is possible to perform machine learning using only commands. Mahout can also perform distributed computing on large quantities of data using Hadoop, software for distributed computing.
This course gives an overview of Mahout, its typical algorithms, and how to use them. Selected algorithms, including recommendation, clustering, and classification, are investigated in detail. The accompanying lab exposes students to command execution, Java programming, and Hadoop utilization.
CUDA
CUDA is an integrated development environment for general-purpose GPU computing. This course covers parallel computing methods using a graphics board.
GPUs, which are used on graphics cards, feature cores with excellent computational performance. General-purpose computing on graphics processing units (GPGPU) can realize a relatively inexpensive, high-speed system for large-scale computations. For example, in high-performance computing (HPC), GPUs with hundreds of thousands of cores are used to perform simulations for automobiles and analysis of genome sequences.
CUDA is a parallel computing platform and programming model that NVIDIA makes freely available. Although CUDA requires a graphics card from NVIDIA, it is widely used because the same method can be applied to a home PC or a large-scale HPC.
This course covers the basics of GPU-based parallel computing using CUDA. After an introduction to the GPU architecture and CUDA, students engage in actual programming on the CUDA platform. Additionally, methods to improve speed are discussed, and a comparison of computational performance between the CPU alone and GPGPU demonstrates the superior performance of GPGPU.
MPI
MPI is a set of standards for parallel computing. This course covers methods for parallel computing and inter-host communications using multiple supercomputers.
Although the performance improvement for single processors is small these days, many-core processors and high-performance computing (HPC), where general-purpose processor-based nodes are connected for parallel operations, are attracting attention. Efficient large-scale computations using these methods require common software.
The Message Passing Interface (MPI) is a standardized message-passing API for parallel computing. Due to standardization, MPI is independent of the OS and can be used with a variety of programming languages. For example, MPI is used with RIKEN's K computer and other supercomputers around the globe.
This course covers the basics of parallel computing using MPI with several programming languages. The accompanying lab allows students to practice parallel programming using MPI with the supercomputer at the Information Technology Center.
ROS
ROS is a framework to develop software for robotics. This course covers the methodology for Simultaneous Localization and Mapping (SLAM) using sensors.
Due to the recent acceleration in research and development of robotics and autonomous driving, new results are being put into practical use. However, such a system must be built from a variety of component technologies. Although software that coordinates all the different components is necessary, developing it is challenging.
ROS is an open-source software framework for robotics software development developed by the Willow Garage Corporation. ROS is a meta-operating system that offers functions such as hardware abstraction, low-level device control, general function implementation, inter-process communications, and package management, and facilitates an integrated robot development environment.
This course covers the fundamentals of robotics research using ROS, including an introduction to key concepts in ROS and available software and hardware. The accompanying lab provides students with basic knowledge of ROS and experience with SLAM using a laser scanner from Hokuyo Automatic to infer location and to prepare an environmental map.
Stata
Stata is a statistical analysis software package that is used in economics, sociology, medical care, etc. This course covers basic operations, execution of regression analysis, result displays, and figure and table preparations.
OpenCV
OpenCV is an image-processing library for computer vision. This course covers methodologies for camera image processing, object detection, etc.
Many modern devices such as smartphones and tablet computers have an integrated camera. In addition to acquiring still images and videos, these cameras are also used in computer vision, where images are processed or features are extracted for object detection or tracking. One example is a function in newer digital cameras that recognizes and focuses on people's faces. Another application is an automatic braking system for vehicles where images from a camera provide vital information.
OpenCV is an image-processing library for computer vision that was originally developed by Intel and later supported by Willow Garage. OpenCV implements many image processing and structural analysis functions to facilitate computer vision processing with relative ease. It can perform pattern recognition such as human face detection, motion detection, and object tracking by real-time analysis of videos.
This course covers the fundamentals of computer vision processing using OpenCV. Then camera image processing, human face recognition using a classifier, object detection using machine learning, etc. are explored in detail. The accompanying lab allows students to develop GUI applications for computer vision processing.
OpenGL
OpenGL is an API for graphics processing. This course covers methodologies for two- and three-dimensional graphical drawings.
OpenGL is a graphics library for three-dimensional CG that supports many platforms, including Windows, macOS, and Linux. OpenGL can utilize hardware functionalities for high-speed performance and can generate three-dimensional CG with relative ease. These features make it a popular choice in broad areas, including simulation analysis, CAD, gaming, and multimedia. Recently, a subset for embedded applications called OpenGL ES has been used in Android and iOS.
This course provides an overview of OpenGL. Function-specific examples facilitate learning the appropriate implementation methodologies for different purposes. The accompanying lab provides students with hands-on experience with basic functions for windows, drawing graphics, hidden-surface processing, event processing, shading, light source setting, and texture mapping.
Linux
Linux is a common UNIX-compatible OS. This course covers the fundamentals of Linux such as shell, editor, and file transfer using CentOS.
Linux is a UNIX-compatible OS that is widely used not only in personal desktop computers, but also in embedded computers, servers, and supercomputers. Linux is used in research and development because it supports many applications, libraries, and development tools.
Because many of the courses offered under Data Tools First involve tools running on Linux, this course assumes that students are somewhat familiar with Linux. This course provides an overview of shell, and instructions on basic commands, editors (vi and Emacs), file transfer protocols (such as ftp and sftp), and remote login (telnet and ssh). The accompanying lab, which uses a popular Linux distribution for servers called CentOS, allows students to learn techniques that are immediately useful in Linux systems.
Android
Android is an OS tailored for smartphones and tablet computers. This course covers writing and execution of Android applications using a sensor.
Android is an OS widely used in smartphones and tablet computers. Devices running Android are equipped with GPS and many sensors (temperature, orientation, and acceleration sensors), whose readings applications can acquire through Android's standard APIs. Such information can be used as input data or, after processing, displayed in the UI. Most Android applications are written in Java.
This course covers fundamentals of Android applications, including building the development environment (Android SDK), development procedures, structures of applications, use of sensor API, screen organization, UI design, saving data, and network communications through a hands-on approach. Students write an Android application that acquires and then sends information from GPS and gravity sensors to a server. In addition, debugging methods are covered.
