COSC-3365 Distributed Databases Using Hadoop


Fred Kumi

Credit Spring 2023


Section(s)

COSC-3365-001 (55139)
LAB HLC ONL DIL

LEC W 4:30pm - 7:00pm HLC HLC1 2413

Course Description / Rationale

COURSE DESCRIPTION:

The goal of this course is to equip students with the basic understanding, knowledge, and practical skills needed to develop big data solutions using data management tools, particularly those in the Hadoop ecosystem, with a focus on programming models and tools including MapReduce, Hive, Pig, and Apache Spark.

This course assumes strong prior experience with an object-oriented programming language such as Java or C++, preferably Java. The Hadoop framework is implemented in Java, and we will write MapReduce applications in Java using the Eclipse or IntelliJ IDEA IDE.

  • Credit Hours:  3
  • Classroom Contact Hours per week:  2 hours 40 minutes
  • Laboratory Contact Hours per week:  50 minutes
  • Prerequisites:
    • Students should have been accepted into the Bachelor of Applied Science in Software Development (BAS) program.
    • Students should have previous familiarity with an object-oriented language (such as C++ or Java, preferably Java). All programming assignments must be completed in Java.
    • ITSE 1303 – Introduction to MySQL OR
    • ITSE 2309 – Database Programming: Oracle
    • COSC 1337 – Programming Fundamentals II OR
    • ITSE 2317 – Java Programming (Intermediate)
    • Prior exposure to Windows or any Linux distribution


COURSE RATIONALE:

This course introduces students to Big Data, Hadoop, and the Hadoop ecosystem. Topics include:

  • Hadoop Distributed File System (HDFS)
  • Yet Another Resource Negotiator (YARN)
  • MapReduce
  • NoSQL Databases
  • Apache Hive
  • Apache HBase
  • Apache Spark
  • Apache Pig

Student Learning Outcomes/Learning Objectives

COURSE OBJECTIVES / LEARNING OUTCOMES:

  • To gain an understanding of Hadoop fundamentals
  • To gain an understanding of the Hadoop Distributed File System (HDFS)
  • To gain an understanding of the MapReduce paradigm
  • To gain an understanding of the Hadoop ecosystem

After successfully completing this course, a student should be able to:

  • Understand Big Data characteristics and the Hadoop ecosystem
  • Understand how the Hadoop Distributed File System (HDFS) works
  • Install Java 8 and Eclipse on Windows 10
  • Install Hadoop on Windows 10
  • Develop and test MapReduce applications
  • Use MapReduce combiners, partitioners, and the distributed cache
  • Integrate other tools, such as Apache Pig and Apache Hive, with Hadoop


SCANS COMPETENCIES:

SCANS (Secretary’s Commission on Achieving Necessary Skills):

Refer to http://www.austincc.edu/cit/courses/scans.pdf for a complete definition and explanation of SCANS.  The following list summarizes the SCANS competencies addressed in this particular course:

RESOURCES

1.1 Manages Time

INTERPERSONAL

INFORMATION

3.1 Acquires and Evaluates Information

3.2 Organizes and Maintains Information

3.3 Uses Computers to Process Information

SYSTEMS

4.1 Understands Systems

4.2 Monitors and Corrects Performance

4.3 Improves and Designs Systems

TECHNOLOGY

5.1 Selects Technology

5.2 Applies Technology to Task

5.3 Maintains and Troubleshoots Technology

BASIC SKILLS

6.1 Reading

6.2 Writing

6.3 Arithmetic

6.4 Mathematics

6.5 Listening

THINKING SKILLS

7.1 Creative Thinking

7.2 Decision Making

7.3 Problem Solving

7.4 Mental Visualization

7.5 Knowing How to Learn

7.6 Reasoning

PERSONAL SKILLS

8.1 Responsibility

8.2 Self-Esteem

8.3 Sociability

8.4 Self-Management

8.5 Integrity/Honesty



 


Readings

Approved Course Text and Teaching Materials:

    BIG DATA TECHNOLOGIES FOR BUSINESS 1st Edition, Arben Asllani, 2021, Prospect Press, (ISBN-13: 978-1-491-90163-2)

Optional Course Text and Teaching Materials:

  1. Sams Teach Yourself Hadoop in 24 Hours 1st Edition, Jeffrey Aven, 2017, Pearson Education, Inc.  (ISBN-13: 978-0-672-33852-6)

Software: 

        Windows 10, Java 8, Apache Hadoop 3.3.1, and a Java IDE (like Eclipse)


Course Requirements

INSTRUCTIONAL METHODOLOGY:

This course is 75% lecture and 25% laboratory. Students are required to complete assigned readings from the text and handouts, as well as scheduled individual labs that reinforce the material covered in class. Scheduled tests will be used to assess each student's progress toward achieving the course objectives.
 

GRADING SYSTEM

Grade Policy: Grades are based on both conceptual understanding and practical application. Exams, homework, and programming assignments are all part of the grade. There are no extra credit assignments in this course.

Examinations: Two major exams (Midterm and Final) will be given during the semester. A make-up exam will be given for excused absences only; contact the instructor before, or immediately after, the emergency that caused you to miss the exam. Only Exam 1 (Midterm) is eligible for a make-up. There will be NO make-up for Exam 2 (Final); if you miss Exam 2, you will receive a grade of zero (0).

Grading Criteria: Each student’s grade for this course consists of the following four parts:

            Semester Exams (2 @ 25% each)     50%
            Programming Assignments           25%
            Homework Assignments              20%
            Individual Project                 5%

   An overall grade will be assigned on the following grading scale:

            A – 90.00% to 100%
            B – 80.00% to 89.99%
            C – 70.00% to 79.99%
            D – 60.00% to 69.99%
            F – Below 60.00%
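The weighting and scale above can be checked with a short calculation. The Java sketch below is illustrative only: it assumes each category is already summarized as a single percentage from 0 to 100 and that no rounding is applied before the letter cutoffs, neither of which this syllabus specifies. The class and method names (`GradeCalculator`, `overall`, `letter`) and the sample scores are hypothetical.

```java
/**
 * Illustrative sketch of the course-grade arithmetic: a weighted average of
 * category percentages, mapped to a letter grade. Assumes every input is a
 * 0-100 category average (hypothetical; the syllabus does not define this).
 */
public class GradeCalculator {

    // Weights from the Grading Criteria table; together they sum to 1.0.
    static final double EXAM_WEIGHT = 0.25;        // applied to each of the two exams
    static final double PROGRAMMING_WEIGHT = 0.25;
    static final double HOMEWORK_WEIGHT = 0.20;
    static final double PROJECT_WEIGHT = 0.05;

    /** Returns the weighted overall percentage (0-100). */
    static double overall(double exam1, double exam2, double programming,
                          double homework, double project) {
        return EXAM_WEIGHT * exam1
             + EXAM_WEIGHT * exam2
             + PROGRAMMING_WEIGHT * programming
             + HOMEWORK_WEIGHT * homework
             + PROJECT_WEIGHT * project;
    }

    /** Maps an overall percentage to a letter using the cutoffs in the grading scale. */
    static char letter(double percent) {
        if (percent >= 90.0) return 'A';
        if (percent >= 80.0) return 'B';
        if (percent >= 70.0) return 'C';
        if (percent >= 60.0) return 'D';
        return 'F';
    }

    public static void main(String[] args) {
        // Hypothetical student: 84 and 78 on the exams, 92 on programming
        // assignments, 95 on homework, and 100 on the individual project.
        double pct = overall(84, 78, 92, 95, 100);
        System.out.printf("Overall: %.2f%% -> %c%n", pct, letter(pct));
    }
}
```

For the hypothetical scores in `main`, the weighted sum is 21 + 19.5 + 23 + 19 + 5 = 87.5, which falls in the B range.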


Course Policies

Attendance/Class Participation: Regular and timely class participation in discussions and laboratory attendance is expected of all students.  If attendance or compliance with other course policies is unsatisfactory, the instructor may withdraw students from the class.

In the event the college or campus closes due to unforeseen circumstances (for example, severe weather or other emergency), the student is responsible for communicating with their professor during the closure and completing any assignment or other activities designated by their professor as a result of class sessions being missed.

Students who do not come to class and do not contact the instructor during the first week of class will be classified as "Never Attended" and will be ineligible for financial aid and automatically withdrawn from the course.

Course Schedule: Please note that schedule changes may occur during the semester.  Any changes will be reflected in the schedule in Blackboard and will be accompanied by an email to all students.

Programming assignments: Programming assignments must be the product of the student's independent effort. Each programming assignment must be submitted on Blackboard on or before the due date and time indicated on Blackboard and in the programming assignment schedule. Programming assignments can be turned in up to seven days after the due date with a late penalty of 50% per week. Scheduling computer time outside of regular lab time is the student’s responsibility.

Homework assignments:  All homework assignments must be submitted on Blackboard on or before the due date and time indicated on Blackboard and also in the homework schedule.

Guidelines for Programming Assignments:

  1. Begin each lab assignment with an initial comment block that includes the following: your name, instructor’s name, assignment number, assignment due date, course and section number, and name of the source file.
  2. You will be graded on BOTH program accuracy and program style.
  3. Programming assignments must meet requirements, exactly as specified, and pass testing to receive full credit.
  4. All programming assignments must be submitted to Blackboard to receive credit.  No programming assignment will be accepted via e-mail.
  5. Programming assignment links are automatically removed from Blackboard at 11:59pm Central Time seven days after the due date. Therefore, it is not possible to submit a programming assignment more than seven days after the due date.
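As an illustration of guideline 1, the Java file below begins with a header comment containing each required field. The layout, the placeholder values, and the `WordCount` class itself are hypothetical; the file-name field assumes the assignment is a Java source file. Only the list of fields comes from the guidelines above.

```java
/*
 * Name:            Jane Student                 (placeholder)
 * Instructor:      Fred Kumi
 * Assignment:      Programming Assignment 1     (placeholder)
 * Due date:        MM/DD/YYYY                   (placeholder)
 * Course/Section:  COSC-3365-001
 * File name:       WordCount.java               (placeholder)
 */

/** Hypothetical assignment class; only the header above follows the guideline. */
public class WordCount {

    /** Counts the whitespace-separated words in one line of text. */
    static int countWords(String line) {
        String trimmed = line.trim();
        // A blank line has no words; split() would otherwise return [""].
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("to be or not to be")); // prints 6
    }
}
```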

Class and Lab Preparation:  All students are expected to read the chapters to be covered in class and familiarize themselves with the week’s assignments before class.  In this way, you will obtain much better value from the class, and can make best use of lab time.

Withdrawal: It is the responsibility of each student to ensure that his or her name is removed from the roll should he or she decide to withdraw from the class. The instructor does, however, reserve the right to drop a student should he or she feel it is necessary. If a student decides to withdraw, he or she should also verify that the withdrawal is submitted before the Final Withdrawal Date. The Final Withdrawal Date for this semester is Monday, April 24, 2023. The student is also strongly encouraged to keep any paperwork in case a problem arises.

Students are responsible for understanding the impact that withdrawal from a course may have on their financial aid, veterans’ benefits, and international student status. Per state law, students enrolling for the first time in Fall 2007 or later at any public Texas college or university may not withdraw (receive a W) from more than six courses during their undergraduate college education. Some exemptions for good cause could allow a student to withdraw from a course without having it count towards this limit. Students are strongly encouraged to meet with an advisor when making decisions about course selection, course loads, and course withdrawals.

In situations where the student fails to withdraw before the withdrawal date, and the student's work is below the minimum acceptable standards, a letter grade of F will be given.

Incomplete Grade: A student may receive a temporary grade of “I” (Incomplete) at the end of the semester only if ALL of the following conditions are satisfied:

    1.  The student is unable to complete the course during the semester due to circumstances beyond their control.

    2.  The student has earned at least half of the grade points needed for a “C” by the end of the semester.

    3.  The student requests the grade in person at the instructor’s office and necessary documents are completed before the last day of the semester.

To remove an “I”, the student must complete the course by two weeks before the end of the following semester.  Failure to do so will result in the grade automatically reverting to an “F”.


College Policies

Statement on Academic Integrity: Austin Community College values academic integrity in the educational process.  Acts of academic dishonesty/misconduct undermine the learning process, present a disadvantage to students who earn credit honestly, and subvert the academic mission of the institution. The potential consequences of fraudulent credentials raise additional concerns for individuals and communities beyond campus who rely on institutions of higher learning to certify students' academic achievements, and expect to benefit from the claimed knowledge and skills of their graduates. Students must follow all instructions given by faculty or designated college representatives when taking examinations, placement assessments, tests, quizzes, and evaluations.  Actions constituting scholastic dishonesty include, but are not limited to, plagiarism, cheating, fabrication, collusion, falsifying documents, or the inappropriate use of the college’s information technology resources.  Further information is available at: https://www.austincc.edu/about-acc/academic-integrity-and-disciplinary-process.

The penalty for scholastic dishonesty for the course is a grade of ‘F’.

Student Rights & Responsibilities: Academic freedom is a foundation and hallmark of higher education.  In the context of college-level courses, it specifically refers to the rights of free expression and respect for others with differing opinions.  Students at the college have the rights accorded by the U.S. Constitution to freedom of speech, peaceful assembly, petition, and association. This concept is accompanied by an equally demanding concept of responsibility on the part of the student.  Just as you are expected to exercise these rights with respect for state and federal law in the larger world, you are expected to exercise these rights as a student with respect for the college’s standards of conduct.  These rights carry with them the responsibility to accord the same rights to others in the college community and not to interfere with or disrupt the educational process.  Students and faculty alike should enable a climate of mutual respect and civility while fostering the freedom to debate and discuss the merits of competing ideas.

Enrollment in the college indicates acceptance of the rules set forth in the student standards of conduct policy, which is administered through the office of the campus dean of student services. Due process, through an investigation and appeal process, is assured to any student involved in disciplinary action.

Student Complaints: A defined process applies to complaints about an instructor or other college employee. You are encouraged to discuss concerns and complaints with college personnel and should expect a timely and appropriate response. When possible, students should first address their concerns through informal conferences with those immediately involved; formal due process is available when informal resolution cannot be achieved.

Student complaints may include (but are not limited to) issues regarding classroom instruction, college services and offices, or discrimination on the basis of actual or perceived race, color, national origin, religion, age, gender, gender identity, sexual orientation, political affiliation, or disability.

Further information about the complaints process, including the form used to submit complaints, is available at: http://www.austincc.edu/students/students-rights-andresponsibilities/student-complaint-procedures

Statement on Privacy: The Family Educational Rights and Privacy Act (FERPA) protects the confidentiality of students’ educational records. Grades cannot be provided by faculty over the phone, by e-mail, or to a fellow student. Individual student grades are posted in Blackboard.

Safety Statement: Health and safety are of paramount importance in classrooms, laboratories, and field activities. Students are expected to learn and comply with ACC environmental, health and safety procedures and agree to follow ACC safety policies. Emergency Procedures posters and Campus Safety Plans are posted in each classroom and should be reviewed at the beginning of each semester.

All incidents (injuries/illness/fire/property damage/near miss) should be immediately reported to the course instructor. Additional information about safety procedures and how to sign up to be notified in case of an emergency can be found at: http://www.austincc.edu/emergency

Everyone is expected to conduct themselves professionally with respect and courtesy to all. Anyone who thoughtlessly or intentionally jeopardizes the health or safety of another individual may be immediately dismissed from the day’s activity and will be referred to the Dean of Student Services for disciplinary action.

In the event of disruption of normal classroom activities due to an emergency situation or an illness outbreak, the format for this course may be modified to enable completion of the course. In that event, students will be provided an addendum to the class syllabus that will supersede the original version.

Tutoring: Free tutoring is provided for this course both online and face-to-face.  For online schedules and details please refer to https://sites.austincc.edu/cs/student-resources/csit-tutoring-schedule/

Campus Carry: The Austin Community College District concealed handgun policy ensures compliance with Section 411.2031 of the Texas Government Code (also known as the Campus Carry Law), while maintaining ACC’s commitment to provide a safe environment for its students, faculty, staff, and visitors. Beginning August 1, 2017, individuals who are licensed to carry (LTC) may do so on campus premises except in locations and at activities prohibited by state or federal law, or the college’s concealed handgun policy. In addition, concealed weapons are not allowed on ACC-sponsored field trips where the school owns or has chartered or leased vehicles for transportation. It is the responsibility of license holders to conceal their handguns at all times. Persons who see a handgun on campus are asked to contact the ACC Police Department by dialing 222 from a campus phone or 512-223-7999. Please refer to the concealed handgun policy online at: http://austincc.edu/campuscarry

Student Files – Privacy: A student's instructor may, for educational and academic reasons, view the information that the student stores in his or her student volume in the Computer Studies Labs.

Discrimination Prohibited: The College seeks to maintain an educational environment free from any form of discrimination or harassment including but not limited to discrimination or harassment on the basis of race, color, national origin, religion, age, sex, gender, sexual orientation, gender identity, or disability.

Faculty at the College are required to report concerns regarding sexual misconduct (including all forms of sexual harassment and sex- and gender-based discrimination) to the Manager of Title IX/Title VI/ADA Compliance. Licensed clinical counselors are available across the District and serve as confidential resources for students.

Additional information about Title VI, Title IX, and ADA compliance can be found in the ACC Compliance Resource Guide available at: https://drive.google.com/file/d/1o55xINAWNvTYgI-fs-JbDyuaMFDNvAjz/view

Use of ACC email: All College e-mail communication to students will be sent solely to the student’s ACCmail account, with the expectation that such communications will be read in a timely fashion. ACC will send important information and will notify students of any college-related emergencies using this account. Students should only expect to receive email communication from their instructor using this account. Likewise, students should use their ACCmail account when communicating with instructors and staff. Information about ACC email accounts, including instructions for accessing it, is available at: http://www.austincc.edu/help/accmail/questions-and-answers

Classroom Behavior: Students are expected to demonstrate proper classroom behavior. The professor has the prerogative to request that any student who demonstrates improper and disruptive classroom behavior leave the classroom. Improper and disruptive behavior includes, but is not limited to: profanity, verbal outbursts, unwarranted physical activity, and lack of respect for fellow students and/or the professor.


Student Support Services

The success of our students is paramount, and ACC offers a variety of support services to help, as well as providing numerous opportunities for community engagement and personal growth.

Student Support: ACC strives to provide exemplary support to its students and offers a broad variety of opportunities and services. Information on these campus services and resources is available at http://www.austincc.edu/students.

Student Accessibility Services: Each ACC campus offers support services for students with documented disabilities. Students with disabilities who need classroom, academic or other accommodations must request them through the office of Student Accessibility Services (SAS). Students are encouraged to request accommodations when they register for courses or at least three weeks before the start of the semester, otherwise the provision of accommodations may be delayed. Students who have received approval for accommodations from SAS for this course must provide the instructor with the ‘Notice of Approved Accommodations’ from SAS before accommodations will be provided. Arrangements for academic accommodations can only be made after the instructor receives the ‘Notice of Approved Accommodations’ from the student. Students with approved accommodations are encouraged to submit the ‘Notice of Approved Accommodations’ to the instructor at the beginning of the semester because a reasonable amount of time may be needed to prepare and arrange for the accommodations.

Additional information about Student Accessibility Services is available at https://www.austincc.edu/offices/student-accessibility-services-and-assistive-technology

Academic Support:  ACC offers academic support services on all of its campuses. These services, which include face-to-face and online tutoring, academic coaching, and supplemental instruction, are free to enrolled ACC students.  Tutors are available in a variety of subjects ranging from accounting to pharmacology. Students may receive these services on both a drop-in and referral basis. Tutoring schedules can be found at:  https://sites.austincc.edu/cs/student-resources/csit-tutoring-schedule/

Library Services:  ACC has a full-service library at each of its campuses to support ACC courses and programs and to provide students with research and assignment assistance from expert faculty librarians, computers, course reserves, laptop and tablet check out, study spaces, and copying, printing, and scanning services.  In addition, ACC students have full rights and privileges to access Library Services online 24/7 via the ACC Library website and students can use their ACCeID logins to access all online materials, including e-books, articles from library databases, and streaming videos. ACC Libraries also provide an “Ask a Librarian” service, which allows students to reach a librarian 24/7 through online chat.  Faculty librarians are also available via email, phone, and in person seven days a week during hours of operation. Visit:

In partnership with ACC’s Student Support Center, ACC Libraries also maintain a limited collection of textbooks for students to borrow. Priority access to the textbook collection is given to students receiving assistance. More information is available on the ACC website by searching “Student Support Center Textbook Collection.”

Student Organizations:  ACC has over seventy student organizations, offering a variety of cultural, academic, vocational, and social opportunities.  They provide a chance to meet with other students who have the same interests, engage in service-learning, participate in intramural sports, gain valuable field experience related to career goals, and much else.  Student Life coordinates many of these activities, and additional information is available at http://sites.austincc.edu/sl/

Personal Support:  Resources to support students are available at every campus. To learn more, ask your professor or visit the campus Support Center. All resources and services are free and confidential. Some examples include, among others:

  • Food resources including community pantries and bank drives can be found here: https://sites.austincc.edu/sl/programs/foodpantry/
  • Assistance with childcare or utility bills is available at any campus Support Center: http://www.austincc.edu/students/support-center.
  • The Student Emergency Fund can help with unexpected expenses that may cause you to withdraw from one or more classes: http://www.austincc.edu/SEF.
  • Help with budgeting for college and family life is available through the Student Money Management Office: http://sites.austincc.edu/money/.  A full listing of services for student parents is available at: https://www.austincc.edu/students/child-care
  • The CARES Act Student Aid will help eligible students pay expenses related to COVID-19:  http://www.austincc.edu/students/child-care/child-watch-drop-in-center

Mental health counseling services are available throughout the ACC Student Services District to address personal and/or mental health concerns: http://www.austincc.edu/students/counseling

If you are struggling with a mental health or personal crisis, call one of the following numbers to connect with resources for help.  However, if you are afraid that you might hurt yourself or someone else, call 911 immediately.

Free Crisis Hotline Numbers:

  • Austin / Travis County 24 hour Crisis & Suicide hotline: 512-472-HELP (4357)
  • The Williamson County 24 hour Crisis hotline: 1-800-841-1255
  • Bastrop County Family Crisis Center hotline: 1-888-311-7755
  • Hays County 24 Hour Crisis Hotline: 1-877-466-0660
  • National Suicide Prevention Lifeline: 1-800-273-TALK (8255)
  • Crisis Text Line: Text “home” to 741741
  • Substance Abuse and Mental Health Services Administration (SAMHSA) National Helpline: 1-800-662-HELP (4357)
  • National Alliance on Mental Illness (NAMI) Helpline: 1-800-950-NAMI (6264)


Illness:

Any ACC student or employee with symptoms or exposure to the COVID-19 virus should inform their professor(s) or supervisor and complete the college’s self-reporting form:
https://cm.maxient.com/reportingform.php?AustinCC&layout_id=124


Course Subjects

Week  Planned Lecture Topic                                 Reading (Big Data Technologies for Business)

1     Course Overview; Lab Overview                         Chapter 1: Big Data and Analytics
      Introduction to Big Data Analytics

2     Introduction to Big Data Analytics                    Chapter 1: Big Data and Analytics
      Cloud Computing and Big Data                          Chapter 2: Cloud Computing and Big Data
      Hadoop Installation                                   N/A

3     Hadoop Distributed File System (HDFS)                 Chapter 3: Distributed File Systems
      Introduction to Shell Commands                        Appendix A: Shell Commands in Linux and HDFS

4     Introduction to MapReduce                             Chapter 4: Anatomy of MapReduce
      Anatomy of a Hadoop MapReduce Program

5     Anatomy of a Hadoop MapReduce Program                 Chapter 4: Anatomy of MapReduce
      MapReduce File Formats

6     MapReduce Java APIs                                   Chapter 4: Anatomy of MapReduce
      Data Processing with Pig Latin                        Chapter 5: Apache Pig and Pig Latin

7     Data Processing with Pig Latin                        Chapter 5: Apache Pig and Pig Latin
      Review for Exam 1                                     Chapters 1 – 5; Appendix A

8     EXAM 1 - MIDTERM                                      Chapters 1 – 5; Appendix A

9     Data Processing with HiveQL                           Chapter 6: Apache Hive and HiveQL
      Importing and Exporting Data to and from the Cluster  Chapter 7: Moving Data with Sqoop

10    NoSQL Data Structures                                 Chapter 8: NoSQL Databases
      Implementing Column-based Databases                   Chapter 9: BigTable and HBase

11    Introduction to Spark                                 Chapter 10: Introduction to Spark

12    Processing Data with RDDs                             Chapter 11: Resilient Distributed Datasets
      Applications with Spark                               Chapter 12: Applications with Spark

13    Applications with Spark                               Chapter 12: Applications with Spark
      Performing Iterative Processing and Data Streaming    Chapter 13: Iterative Processing and Data Streaming with Spark

14    Performing Iterative Processing and Data Streaming    Chapter 13: Iterative Processing and Data Streaming with Spark
      Review for Final Exam                                 Chapters 6 – 13

15    FINAL EXAM                                            Chapters 6 – 13

16    Work on Capstone Project                              Chapters 1 – 13; Appendix A


Office Hours


Published: 01/13/2023 18:27:42