# Introduction to Computing Systems (CS1002A.02)

#### School of Computer Science and Technology

## **Outline**

- 1 Course and Crew
- 2 What's the difference between "Big" and "Small"?
- 3 Why Take This Course?
- 4 Great Ideas in Computing Systems
- 5 What's This Course All About?
- 6 Summary

## First offering: summer short term, 2011

#### ■ Summer short term, 2011

Haibo Zhang, PhD student in Computer Science at Penn

- 160 students enrolled university-wide; required for the 2010 cohort of the Computer Science Talent Class (计算机英才班)
- Lecturers: Yale Patt, visiting professor at USTC; Prof. 安虹 (Hong An)
- TAs: 8 graduate students: 孙荪 (USTC Ph.D., Huawei), 张海博 (Penn State Ph.D.), 汤旭龙 (Penn State Ph.D.), 魏学超 (Peking University Ph.D.), 王涛, 吴石磊, 孙公瑾, 冷鹏

## USTC course "Introduction to Computing Systems (H)"

## Offering history

| Term | Enrollment | Required for (Computer Science Talent Class) | Lecturer | TAs |
|------|------------|-----------------------------------------------|----------|-----|
| 2012, fall semester | 48 students from the School of Computer Science | 2011 cohort | Prof. 安虹 | 张海博, 王涛 |
| 2013, fall semester | 47 students from the School of Computer Science | 2012 cohort | Prof. 安虹 | 张海博, 程亦超, 彭毅 |
| 2014, fall semester | 57 students from the School of Computer Science | 2013 cohort | Prof. 安虹 | 彭毅, 邱晓杰, 迟孟贤 |
| 2015, fall semester | 65 students from the School of Computer Science | 2014 cohort | Prof. 安虹 | 迟孟贤, 金旭 |
| 2016, fall semester | 65 students from the School of Computer Science | 2015 cohort | Prof. 安虹 | 苏志超, 冯诗影 |
| 2017, fall semester | 94 students from the School of Computer Science | 2016 cohort | Prof. 安虹 | 苏志超, 冯诗影, 徐青青 |
| 2018, fall semester | 110 students university-wide | 2017 cohort | Prof. 安虹 | 武铮, 金旭, 邓静恒, 张子豫 |
| 2019, fall semester | 83 students university-wide | 2018 cohort | Prof. 安虹 | 张子豫, 姜庆彩, 许乐 |

## Enrollment

## Teaching evaluation

## The School of Computer Science's first national first-class undergraduate course (2020)

#### **Course crew: Instructors**

| Instructor | Classroom | Course time | E-mail | Office | Phone |
|------------|-----------|-------------|--------|--------|-------|
| Hong An | 3A112, CS1002A.01 | Monday 8,9,10 | han@ustc.edu.cn | Room 1409, Sci. & Lab Building (west) | 13500507406 |
| Hui Zhang | 3A112, CS1002A.02 | Friday 3,4,5 | fzhh@ustc.edu.cn | Room 409, High Performance Computing Center (east) | 13956969596 |
| Fuyou Miao | 3B102, CS1002A.03 | Tuesday 3,4,5 | mfy@ustc.edu.cn | Room 517, Electronic Teaching Building #3 (west) | 13866166896 |

Course homepage: http://acsa.ustc.edu.cn/ics/

## Course group chat

2025/2/24

#### **Course crew: TAs**

| Name | E-mail | Office | Phone |
|------|--------------------------------|--------------------------------------|-------------|
| 金旭 | jinxu@mail.ustc.edu.cn | Room 1406, Sci. & Lab Building (west) | 13511610125 |
| 石军 | shijun18@mail.ustc.edu.cn | Room 1411, Sci. & Lab Building (west) | 17756023365 |
| 王朝晖 | wangzh95@mail.ustc.edu.cn | Room 1411, Sci. & Lab Building (west) | 15273133022 |
| 钟书锐 | zsr1341864378@mail.ustc.edu.cn | Room 1411, Sci. & Lab Building (west) | 17873661361 |

#### ■ Office hours and Discussion

- 140+ students; sessions chaired by 4 TAs and fellow students
- Face-to-face help
- See the web page for times
- http://acsa.ustc.edu.cn/ics/information.html

## **Questions**

- How powerful are today's computers? Why are they so powerful?
- What can computers do? Is an abacus a computer? How should we understand Turing's contribution to computers?
- How do computers do these things?
- What are the serious flaws in today's computers? What can't computers do?

## What is the definition of a computer?

- A computer is a machine that can be programmed to carry out sequences of arithmetic or logical operations automatically.
- Modern computers can perform generic sets of operations known as programs.
- A broad range of industrial and consumer products use computers as control systems.
  - Simple special-purpose devices like microwave ovens and remote controls
  - Factory devices like industrial robots and computer-aided design systems
  - General-purpose devices like personal computers, and mobile devices like smartphones
- Early computers were meant to be used only for calculations.
- Simple manual instruments like the abacus have aided people in doing calculations since ancient times.
- A modern computer consists of at least one processing element along with some type of computer memory.

# Today, computers are in everything!

# Vast infrastructure behind them: from the small to the big

## When you click the Google search button once, how do the results reach you?

#### ■ Time taken

- Less than 1 second, returning 1000+ links to what you are looking for

#### ■ Servers visited

- One search request is sent to thousands of servers, with an average round trip of 2,400 km

#### ■ Energy consumed

- Enough to run a 100-watt light bulb for 1 hour

#### ■ Technologies used

- Information retrieval: how to find the best-matching information
- Networking: how to deliver information using the latest network technology
- Information gathering: how to collect scattered information resources
- Hardware: how to provide computing and storage resources for processing massive amounts of information
- Parallel processing: how to classify, retrieve, and organize information quickly
- And so on

## At any given moment, 200+ related algorithm improvements are being tried in practice

# Google's global data centers (2021)

What do Google's data centers, among the most advanced in the world, look like?

## "Big" vs. "Small" Computer

Personal mobile devices / smart terminal devices

#### Small

- **High-performance** computation
- **High-performance** communications and I/O
- **Power** consumption constraints
- Volume constraints

Supercomputing center (computation) / data center (storage)

#### Big

- High-performance data processing capacity that exceeds demand
- Reliability requirements

# "Big" Computer Inside

■ Data center: a "warehouse" supercomputer

Each data center covers about 45,000 square meters on average and costs about \$600 million to build.

## **Sunway TaihuLight**

## - June 2016, National Supercomputing Center in Wuxi

- Sunway TaihuLight writes a legend of independent innovation: on the order of 10^17 operations per second
- The light of technology: how powerful the Sunway TaihuLight chip is, and how expensive it is to build

#### **Table 1: Sunway TaihuLight System Summary**

| Item | Description |
|-------------------------------|---------------------------------------------------------------------------------------|
| CPU | Shenwei-64 |
| Developer | NRCPC |
| Chip Fab | CPU vendor is the Shanghai High Performance IC Design Center |
| Instruction set | Shenwei-64 Instruction Set (this is NOT related to the DEC Alpha instruction set) |
| Node processor cores | 256 CPEs (computing processing elements) plus 4 MPEs (management processing elements) |
| Node peak performance | 3.06 TFlop/s |
| Clock frequency | 1.45 GHz |
| Process technology | N/A |
| Power | 15.371 MW (average for the HPL run) |
| Peak performance of system | 125.4 PFlop/s system in Wuxi |
| Targeted application | HPC |
| Nodes | 40,960 |
| Total memory | 1.31 PB |
| Cabinets | 40 |
| Nodes per cabinet | 1,024 nodes |
| Cores per node | 260 cores |
| Total system core count | 10,649,600 |

# How are "very large" computer systems built? - Sunway TaihuLight

#### ■ Compute node board

- 1 CPU with 260 compute cores; the basic building block of the system: many-core processor + memory

#### ■ Compute plug-in board

- 4 compute node boards in a high-density assembly, 4 CPUs; compute nodes + network interface board

#### ■ Compute supernode

- 64 compute plug-in boards, 256 CPUs; a tightly coupled elastic interconnect inside the supernode

#### ■ Compute cabinet

- 4 compute supernodes, 256 compute plug-in boards, 1,024 compute nodes (CPUs)

#### ■ Whole system

- 40 compute cabinets, 160 supernodes, 40,960 nodes, 10,649,600 compute cores, 1.31 PB of memory

## "Small" Computer

### ■ Personal mobile devices: Huawei P30

## "Small" Computer Inside

## **Today's Dominant Target Systems**

## ■ Mobile (smartphone/tablet)

- >1 billion sold/year
- Market dominated by ARM-ISA-compatible general-purpose processor in system-on-a-chip (SoC)
- Plus sea of custom accelerators (radio, image, video, graphics, audio, motion, location, security, etc.)

## ■ Warehouse-Scale Computers (WSCs)

- 10,000,000's cores per warehouse
- Market dominated by x86-compatible server chips
- Dedicated apps, plus cloud hosting of virtual machines
- Now seeing increasing use of GPUs, FPGAs, custom hardware to accelerate workloads

## ■ Embedded Computing

- Wired/wireless network infrastructure, printers
- Consumer TV/Music/Games/Automotive/Camera/MP3
- Internet of Things!

## Why Take This Course?

- Master the minimal set of core concepts and ideas of computer systems (transistor devices, digital logic, computer organization, compilation and assembly of high-level languages, hardware implementation of high-level languages, operating systems)
- Understand the roles of hardware and software in a computer system and how they relate to each other
- Understand how to build a complete computer system
- Become a programmer who stands out: one who integrates algorithm design with hardware design

## Flourishing branches (枝繁叶茂)

Figure: a tree whose branches are application areas, including high-performance computing, intelligence, cloud computing, software engineering, databases, the Internet of Things, and graphics and images (the remaining labels are truncated in the source).

## A great tree has deep roots (树大根深)

## Why Take This Course?
#### **■** Foundational Goal

- Deeply understand computing systems
- **Intersects** all aspects of computing systems

## **■ Preparatory/Complementary**

- Algorithms and Data Structures
- Programming Languages
- Compilers and Interpreters
- Operating Systems
- Digital Systems Organization and Design
- Mathematical Foundations of CS

#### ■ Fun!!!

- Who wouldn't want to understand the magic?

## **UCB EECS Bachelors Curriculum Overview**

Math and science:

- Calculus: Math 1A & 1B
- Multi-variable Calculus: Math
- A two-course physics sequence: Physics 7A/7B, or Physics 5A/5B/5BL
- A natural science course with lab from {Physics, Astronomy, Biology, Chemistry, Earth and Planetary Science, Integrative Biology, Molecular and Cell Biology, or Plant and Microbial Biology}
- 4-unit STEM elective from {Astro, Chem, Data Sci, EPS, IB, Math, MCB, Physics, PMB, Stat, or any Engin dept.}

#### CS Bachelor of Arts

**Lower Division CS Requirements**

- Math 1A (Calculus I), Math 1B (Calculus II), Math 54
- CS 70 (Discrete Mathematics and Probability Theory)
- CS 61A (Structure and Interpretation of Computer Programs)
- CS 61B/BL (Data Structures)
- CS 61C (Machine Structures)

**Upper Division CS Requirements**

- 20 Upper Division Units
- 4 units of a Design Course from {CS 152, 160, 161, 162, 164, 169, 182, 184, 186/W186} or {EE C128, 130, 140, 143, 192, EECS C106A/C106B, 149 (formerly EE/CS 149), 151 (formerly CS 150/EE 141)}
- 8 units of upper-division CS courses
- 8 units of upper-division CS/EE/EECS courses
- 7 units of Upper Division Technical Electives: can be upper-division CS/EE/EECS courses

#### EECS Bachelor of Science

**Lower Division EECS Requirements**

- CS 70 (Discrete Mathematics and Probability Theory)
- EE 16A (Designing Information Devices and Systems I)
- EE 16B (Designing Information Devices and Systems II)
- CS 61A (Structure and Interpretation of Computer Programs)
- CS 61B/BL (Data Structures)
- CS 61C (Machine Structures)

**Upper Division EECS Requirements**

- EE C106A, C106B, C128, 130, 140, 143, C149, 192
- CS C149, 160, 162, 164, 169, 182, 184, 186, W186
- EECS 149, 151 and 151LA (must take both), 151 and 151LB (must take both)
- CS 161 will fulfill the design requirement for students who took the class in Spring 2019 or later.

#### Stanford CS Bachelors Curriculum Overview

#### Math. & Sci.

- Mathematics (26 units minimum)
- Science (11 units minimum)
- Technology in Society (3-5 units)
- Engineering Fundamentals (13 units)

#### Intro & Core

- CS 1C: Introduction to Computing at Stanford (1 unit)

#### Theory

- Mathematical Foundations of Computing (CS103) (5 units)
- Introduction to Probability for Computer Scientists (CS109) (5 units)
- Data Structures and Algorithms (CS161) (5 units)

#### Systems

- Programming Abstractions (CS106A, CS106B, or CS106X)
- Principles of Computer Systems (CS110) (5 units)
- Computer Organization and Systems (CS107) (5 units)

#### Track (choose one)

- Artificial Intelligence track
- Biocomputation track
- Computer Engineering track
- Graphics track
- Human-Computer Interaction track
- Information track
- Systems track
- Theory track
- Unspecialized track
- *Individually Designed* track

#### MIT EECS Bachelors Curriculum Overview

#### 6-2. Electrical Engineering and Computer Science

- Intro: 1 of {6.01, 6.02, 6.03, 6.S08}
- Prog. Skills: 1 of {6.0001, 6.S080}
- Math: 1 of {18.03, 2.087}
- Foundation: 3 of {6.002, 6.003, 6.004, 6.006, 6.007, 6.008, 6.009}
- Header: 3 of {6.011, 6.012, 6.013, 6.014, 6.021, 6.031, 6.033, 6.034, 6.036, 6.045, 6.046}
- Other: 1 of {6.UAT, 6.UAR}, AUS2, AUS2, EECS, EECS
- Semester 1: Programming skills, Differential Equations
- Semester 2: Introduction to EECS, Foundation #1
- Semester 3: Foundation #2, Foundation #3
- Semester 4: Header #1, Header #2
- Semester 5: Header #3, AUS #1
- Semester 6: AUS #2, Course 6 Elective #1
- Remaining: 6.UAT or 6.UAR and the Course 6 elective

#### 6-3. Computer Science and Engineering

- Intro: 1 of {6.01, 6.02, 6.03, 6.S08}
- Prog. Skills: 1 of {6.0001, 6.S080}
- Math: 6.042
- Foundation: 3 of {6.004, 6.009, 6.006}
- Header: 6.031, 6.033, 1 of {6.045, 6.046}, 1 of {6.034, 6.036}
- Other: 1 of {6.UAT, 6.UAR}, AUS2, AUS2, EECS
- Semester 1: Programming skills, Discrete math
- Semester 2: Introduction to EECS, Foundation #1
- Semester 3: Foundation #2, Foundation #3
- Semester 4: Header #1, Header #2
- Semester 5: Header #3, Header #4
- Semester 6: AUS #1, AUS #2
- Remaining: 6.UAT or 6.UAR

#### 6-7. Computer Science and Molecular Biology

- Intro: (6.009 and 6.031), 6.042J [Note 4]
- Intro Lab: 1 of {7.02J, 20.109, 20.129 [Note 14]}
- Math Requirement: (6.00 [Note 13] and 6.009)
- Foundation Bio: 7.03, 7.05
- Foundation CS: 6.006
- Foundation Bio: 7.06
- Foundation CS: 6.046J
- Bio and Computational Bio Restricted Electives: 2 of {1.S993, 6.047, 6.049, 6.503, 6.802J, 7.09, 18.418}
- Other: 1 of {6.UAT, 6.UAR}

#### Computer science, economics, and data science program (heading missing in the source)

- Prog. Skills: 1 of {6.0001, 6.0002, 6.009}
- Math: 18.06, 6.041, 6.042, 14.30 or 18.600
- Foundation EDS: 1 of {14.01, 14.03}, 14.32
- Foundation CS: 6.006, 6.036
- Foundation EDS: 1 of {14.05, 14.18, 14.33}, 1 of {6.207, 15.053}
- Foundation CS: 6.046
- Data Science Elective: 1
- Economics Theory Elective: 1
- Data Science or Theory Elective: 1
- Other: 1 of {6.UAT, 6.UAR, 15.276}
- Semester 1: Linear Algebra, Discrete Math, Programming #1 (the rest of this entry is illegible in the source)
- Semester 2: Probability and Statistics, Programming #2 (6.009), Microeconomics
- Semester 3: Algorithms, Econometrics
- Semester 4: Machine Learning, Advanced Algorithms
- Semester 5: Intermediate Economics, [illegible] and Optimization, Elective #1
- Semester 6: Elective #2, Elective #3

#### **MIT EECS Bachelors Curriculum Overview**

Students must choose headers and foundations to span the breadth of EE, EECS, and CS<sup>1</sup>.

<sup>1</sup> Of the headers and foundations, two must be from EE, two from CS, and one from EECS.

<sup>2</sup> 6.045 and 6.046 also require 6.042, either as a direct pre-req or as a pre-req to 6.006.

#### **CMU CS Bachelors Curriculum Overview**

#### USTC CS Bachelors Curriculum Overview

#### **Textbooks in CS 61C at UCB**

- Great Ideas in Computer Architecture (Machine Structures)
- 30+ TAs
- https://cs61c.org/fa21/
- Computer Organization and Design RISC-V Edition, 1st ed., by David Patterson and John Hennessy
- The C Programming Language, 2nd ed., by Brian Kernighan and Dennis Ritchie
- The Datacenter as a Computer, by Luiz André Barroso and Urs Hölzle, freely available online

#### Optimizing the course content

- Introduction to Computing Systems: From Bits & Gates to C/C++ & Beyond, Yale N. Patt and Sanjay J. Patel, June 2003, McGraw-Hill Higher Education
- Computer Organization and Design: The Hardware/Software Interface, David A. Patterson, John L. Hennessy, 5th edition, Morgan Kaufmann Publishers, Inc., 2017

## Course goal: understand the great ideas of computer architecture

- Computational thinking ability (algorithms)
- Programming ability (coding)
- System design ability (architecture)

### Educational philosophy: the evolution of computational thinking (its meaning and scope)

"In the era of heterogeneous computing, programmers must master both algorithms and hardware models in order to write high-quality code. The programmers of the future must therefore also understand hardware!" (David Patterson, Turing Award laureate)

## Computational thinking - Jeannette M. Wing

- Computational thinking will be a fundamental skill used by everyone in the world by the middle of the 21st century.
  - Just like reading, writing, and arithmetic.
  - Incestuous: computing and computers will enable the spread of computational thinking.
  - In research: scientists, engineers, ..., historians, artists
  - In education: K-12 students and teachers, undergrads, ...
- Computational thinking is the thought processes involved in formulating a problem and expressing its solution in a way that a computer, human or machine, can effectively carry out.
- Computational thinking is what comes before any computing technology: thought of by a human, knowing full well the power of automation.

#### CC2020: the ACM and IEEE-CS computing curricula guidelines

- CC2020 adopts a competency model that integrates three aspects of ability: knowledge, skills, and dispositions. CC2020 will have a positive and far-reaching influence on how computing majors are set up in China.
- CC2020 uses "computing" as the unifying term for all computing fields, including computer engineering, computer science, and information technology, and uses "competency" to represent the guiding idea behind all computing education programs. Its goal is to develop knowledge, skills, and dispositions so that students are competent for future computing-related work.
  - Knowledge corresponds to the "know-what" dimension of competency
  - Skills express the application of knowledge: the "know-how" dimension of competency
  - Dispositions form the "know-why" dimension of competency

### What is computational thinking ability?

Computational thinking: a way of thinking that uses computing systems, including networks, to solve problems (natural problems, social problems, technical problems).

#### In different historical periods, humanity focuses on different major problems

- 20th century: war, population growth, hunger, and so on
- 21st century: overly rapid consumption of natural resources, environmental pollution, abnormal climate, health care, population aging, an excessive gap between rich and poor, urban traffic, and non-traditional security problems (outbreaks of atypical infectious diseases, financial crises, terrorism, cyberattacks, and so on)

#### Scientific and engineering computing

Protein and drug design

#### Big data, artificial intelligence, and cloud computing

## How to use the Sunway TaihuLight to solve complex problems?
■ Sunway TaihuLight wins the Gordon Bell Prize, a first for Chinese supercomputing

# **Great Ideas in Computing Systems in this course**

- Great Idea #0: Great Idea from Ancient Chinese Philosophy (Bits and Bytes)
- Great Idea #1: The Computer Is a Universal Computing Device (Turing Machine Model)
- Great Idea #2: Stored-Program Computer (Von Neumann Model)
- Great Idea #3: Abstraction Helps Us Manage Complexity (Layers of Representation/Interpretation)
- Great Idea #4: Software and Hardware Co-design

# **Great Ideas in Computing Systems in other courses**

- Great Idea #5: Computer Family (IBM 360)
- Great Idea #6: Principle of Locality (Memory Hierarchy)
- Great Idea #7: Make the Common Case Fast
- Great Idea #8: RISC vs. CISC
- Great Idea #9: Moore's Law (Designing through trends)
- Great Idea #10: Parallelism & Amdahl's Law (which limits it)
- Great Idea #11: Dependability via Redundancy

## **Great Idea from Ancient Chinese Philosophy**

## Everything comes from being, and being comes from nothing.

#### 天下万物生于有,有生于无 (Laozi, chapter 40)

## I Ching (《易经》)

太极生两仪, 两仪生四象, 四象生八卦, 八卦演万物。 (Taiji gives rise to the two modes, the two modes give rise to the four images, the four images give rise to the eight trigrams, and the eight trigrams unfold into all things.)

## **Great Idea from Ancient Chinese Philosophy**

#### **■** Binary Number

- The modern binary number system was studied in Europe in the 16th and 17th centuries by Thomas Harriot, Juan Caramuel y Lobkowitz, and Gottfried Leibniz.
- Leibniz was specifically inspired by the Chinese "I Ching".

Figure: scan of page 88 of the *Mémoires de l'Académie Royale*, in which Leibniz explains that the figures of Fohy correspond to his binary arithmetic: a whole line stands for 1 and a broken line for 0. He recounts sending his method of counting with 0 and 1 to Father Bouvet, a Jesuit in Peking, who replied on 14 November 1701 with the large figure that runs to 64.

## **Great Idea from Ancient Chinese Philosophy**

Is an abacus a computer?

Computer Organization and Design: The Hardware/Software Interface, David A. Patterson, John L. Hennessy, 5th edition, Morgan Kaufmann Publishers, Inc., 2017

## A computing tool that does not use electricity

#### ■ Abacus (500 BC, China)

- Abacus, China, c. 1970. Loan of Gwen and Gordon Bell, B1643.01
- Table abacus (reproduction) and jetons, Germany, 17th century. Loan of Michael R. Williams, L2003.3.2
- Soroban, Japan, c. 1960. Loan of Gw Bell, B1655.v.
- Counting frame, early 20th century. Gift of Gwen and Gordon Bell, B141.80
- Schoty, Russia, early 20th century. Gift of Warren Yogi, 102

## A computing tool that does not use electricity

#### **■** Sectors

## A computing tool that does not use electricity

#### ■ Slide Rules

- Slide rule, US, c. 1956. Gift of Lynn Yarbrough, X121.82
- Pickett circular slide rule, Japan, c. 1955. Loan of Gwen and Gordon Bell, B1657.01
- Mannheim slide rule, France, c. 1860. Loan of Gwen and Gord B203.82
- Fuller's Rule, US, 1921. Gift of University of Illinois, X250.83A
- Lord's calculator, England, c. 1900. Loan of Gwen and Gordon Bell, B123.80

## **Automatic computing equipment: from mechanical computer to electronic computer**

- Charles Babbage (1791-1871, England): the Babbage Difference Engine (1832; 2002, 2008), 17 years, 25,000 parts, 5 tons, cost: £17,470
- Alan Turing (age 24): Turing machine, 1936
- Eckert (age 24) and Mauchly (age 36): ENIAC, 1946

#### Charles Babbage (1791-1871): A Fallen Hero!

[Copyright expired and in public domain. Image obtained from Wikimedia Commons.]

- Lucasian Professor of Mathematics, Cambridge University, 1828-1839
- A true "polymath" with interests in many areas
- Frustrated by errors in printed tables, wanted to build machines to evaluate and print accurate tables
- Inspired by earlier work organizing human "computers" to methodically calculate tables by hand

# Babbage difference engine: the first mechanical computer (1832)

- 2002, 2008
- The Babbage Difference Engine
- 17 years
- 25,000 parts, 5 tons
- Cost: £17,470

## **Turing Machine**

- Mathematical model of a device that can perform any computation: Alan Turing (1937)
  - ability to read/write symbols on an infinite "tape"
  - state transitions, based on current state and symbol
- Every computation can be performed by some Turing machine. (Turing's thesis)

Figure: a Turing machine that multiplies

## **Universal Turing Machine**

- Turing described a Turing machine that could implement all other Turing machines.
  - inputs: data, plus a description of the computation (a Turing machine)

Figure: Universal Turing Machine

- U is programmable, and so is a computer!
  - instructions are part of the input data
  - a computer can emulate a Universal Turing Machine, and vice versa
- Therefore, a computer is a universal computing device!
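
The two slides above can be made concrete with a small simulator. The sketch below is only an illustration under stated assumptions: the function name `run_turing_machine`, the rule-table format, and the unary-addition machine `ADD_RULES` are hypothetical and are not taken from the course materials. It shows the point of the universal machine: the transition table is just input data, so one simulator can run any machine it is given.

```python
from collections import defaultdict

BLANK = "_"  # symbol found on every tape cell that has never been written


def run_turing_machine(rules, tape, state="start", halt="halt", max_steps=10_000):
    """Simulate a one-tape Turing machine.

    rules maps (state, symbol) -> (new_state, symbol_to_write, move),
    where move is -1 (left), +1 (right) or 0 (stay).
    The tape is unbounded in both directions; unwritten cells read as BLANK.
    """
    cells = defaultdict(lambda: BLANK, enumerate(tape))
    head, steps = 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached before the machine halted")
        # Look up the transition for the current state and scanned symbol,
        # then write the new symbol and move the head.
        state, cells[head], move = rules[(state, cells[head])]
        head += move
        steps += 1
    used = [i for i, symbol in cells.items() if symbol != BLANK]
    if not used:
        return ""
    return "".join(cells[i] for i in range(min(used), max(used) + 1))


# Hypothetical example machine: unary addition, e.g. "111+11" (3 + 2).
# It turns "+" into "1", walks to the end, and erases one trailing "1".
ADD_RULES = {
    ("start", "1"): ("start", "1", +1),
    ("start", "+"): ("seek_end", "1", +1),
    ("seek_end", "1"): ("seek_end", "1", +1),
    ("seek_end", BLANK): ("erase", BLANK, -1),
    ("erase", "1"): ("halt", BLANK, 0),
}

print(run_turing_machine(ADD_RULES, "111+11"))  # prints 11111
```

Swapping in a different rule table changes what is computed without touching the simulator, which is the sense in which "instructions are part of the input data".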
#### Bombe (1939)

The bombe is an electromechanical device used by British cryptologists to help decipher German Enigma-machine-encrypted secret messages during World War II. The initial design of the British bombe was produced in 1939 at the UK Government Code and Cypher School (GC&CS) at Bletchley Park by Alan Turing.

Figures: an Enigma machine; a wartime picture of a **Bletchley Park** Bombe

# Colossus: the world's first programmable, electronic, digital computer (1943-1945)

- Colossus was a set of computers developed by British codebreakers in the years 1943-1945 to help in the cryptanalysis of the Lorenz cipher.
- Colossus used thermionic valves (vacuum tubes) to perform Boolean and counting operations.
- Colossus is regarded as the world's first programmable, electronic, digital computer, although it was programmed by switches and plugs and not by a stored program.
- Colossus was designed by General Post Office (GPO) research telephone engineer Tommy Flowers for the codebreakers at Bletchley Park. Alan Turing's use of probability in cryptanalysis contributed to its design.
- Turing's machine that helped decode Enigma was the electromechanical Bombe, not Colossus.

Figure: a Colossus Mark 2 computer being operated by Wrens.

### **ENIAC**: the first general-purpose electronic computer, 1946

1904: the world's first vacuum tube was invented by the British physicist Fleming

#### ENIAC (Electronic Numerical Integrator and Computer)

- 17,468 vacuum tubes
- Power: 150 kW
- Weighed 30 tons
- Occupied 1,800 sq ft
- 80 feet long
- 8.5 feet high
- Clock: 100 kHz, about 5,000 additions per second
- RAM: ~230 bytes; could store 20 numbers in main memory
- I/O: punched cards
- Cost about \$500,000

#### **ENIAC (1946)**

- First electronic general-purpose computer
- Construction started in secret at UPenn Moore School of Electrical Engineering during WWII to calculate firing tables for US Army, designed by Eckert and Mauchly
- Twelve 10-decimal-digit accumulators
- Had a conditional branch!
- Programmed by plugboard and switches, which was time-consuming!
- Purely electronic instruction fetch and execution, so it was fast
  - 10-digit x 10-digit multiply in 2.8 ms (2000x faster than Mark-1)
- As a result of its speed, it was almost entirely I/O bound
- As a result of the large number of tubes, it was often broken (5 days was the longest time between failures)

#### **Great Idea #1:**

## **The Computer Is a Universal Computing Device**

All computers, given enough time and memory, are capable of computing exactly the same things.

Supercomputers

## How is ubiquitous computing done?

During the four years of undergraduate study, which computer courses are the more fundamental ones that must be mastered?

## From Theory to Practice

- In theory, a computer can *compute* anything that's possible to compute
  - given enough memory and time
- In practice, solving problems involves computing under constraints.
  - time
    - weather forecast, next frame of animation, ...
  - power
    - cell phone, handheld video game, ...
  - cost
    - cell phone, automotive engine controller, ...

#### **ENIAC**

Changing the program could take days or weeks!
## Great Idea #2: Von Neumann Model (Stored-Program Computer)

Figures: Von Neumann; Maurice Vincent Wilkes

## Our ancestors were "quite distinguished"

- USTC's computer science program was founded in 1958, at that time as part of the Department of Applied Mathematics and Computing Technology
- First chair of the Department of Applied Mathematics and Computing Technology: 华罗庚
- First head of the computer teaching and research section: 夏培肃
- China's first three-person computer group
  - In 1952, Prof. 华罗庚 met with three young scientists to discuss a frontier topic: building China's own computer. China's first three-person computer group was established as a result. Its members were 夏培肃, 闵乃大, and 王传英.

## USTC's first computer: the 107 computer

- Developed under the leadership of 夏培肃
- China's first independently designed general-purpose electronic computer
- China's first independently designed von Neumann architecture computer
  - EDVAC, USA, 1945-1952
  - EDSAC, UK, 1945-1949
  - 107 computer, China, 1953-1959
- Put into use at USTC in 1960 (named KD-1)
- Moved to Hefei with USTC in 1970
- Dismantled in 1974

## First faculty: the teachers who took part in building the 107 computer

Photo: Academician 夏培肃 (sixth from left) with early builders of USTC's computer science discipline and developers of the 107 computer, including Prof. 郑世荣 (former head of the computer architecture teaching and research section), 钟津立, 周行仁, 赵鼎文, 杨学良, and 王武良, on the Yuquan Road campus.

## USTC's first set of computer textbooks

- The first set of "Principles of Computers" (《计算机原理》) textbooks at USTC, compiled under the leadership of 夏培肃; also the earliest "Principles of Computers" textbooks in China.

## The USTCers' outstanding contributions to the Chinese computer

- 1959: the 107 computer
  - China's first independently designed general-purpose computer
  - Chief designer: 夏培肃
- 2002: Loongson 1 (龙芯1号)
  - China's first independently designed general-purpose microprocessor chip
  - Chief designer: 胡伟武 (undergraduate class of 1986)

#### The invention of the transistor

## Bell Labs lays the groundwork:

- 1945: Bell sets up a lab in the hopes of developing "solid state" components to replace existing electromechanical systems. William Shockley, John Bardeen, Walter Brattain: all solid-state physicists. Focus on Si and Ge.
- 1951: Shockley develops the junction transistor, which can be manufactured in quantity.
- 1954: The first transistor radio! Also, TI makes the first silicon transistor (price \$2.50)
- 1956: Bardeen, Shockley, Brattain receive the Nobel Prize.

## Two major inventions behind the microprocessor chip

Stored program + transistor technology

- Stored program: change the program, and the same hardware can do all kinds of tasks
- Transistor technology: the device is smaller and faster than a vacuum tube

## First computer vs. first microprocessor chip

After 25 years

# 1946, ENIAC (Electronic Numerical Integrator and Computer)

- 18,000 vacuum tubes
- 1,500 relays
- 174 kW
- 30 tons
- 1,800 sq. ft. footprint
- Clock: 100 kHz
- RAM: ~230 bytes
- I/O: punched cards

#### 1971, Intel 4004

- 10 micron process, PMOS-only logic
- 2,300 transistors
- 3x4 mm die
- 4-bit bus
- Performance < 0.1 MIPS
- 640 bytes of addressable memory
- 750 kHz

## Thirty years after the first microprocessor chip was born

After 30 years

#### 1971, Intel 4004

- 10 micron process
- 2,300 transistors
- 3x4 mm die
- 4-bit bus
- 640 bytes of addressable memory
- 750 kHz

#### 2000, Intel Pentium IV

- Issues up to 5 uOPs per cycle
- MMX, SSE, and SSE2
- 0.18 micron process
- 42 million transistors
- 217 mm² die
- 64-bit bus
- 8 KB D-cache, 12 KB op trace cache (I-cache), 256 KB L2 cache
- 1.4 GHz

Figure: transistor counts by generation: 486 (1,180K), Pentium (3,100K), Pentium II (7,500K), Pentium III (24,000K), Pentium IV (42,000K)

#### **Performance improved 5000x:** smaller, faster, cheaper

## How do we get from physical devices to problem solving?

## **Great Idea #3: Abstraction Helps Us Manage Complexity**

In a broad sense, computer architecture is the design of the abstraction layers that allow us to implement information processing applications efficiently using available manufacturing technologies.

## **Related Courses in UC Berkeley**

## **EECS151 Courses in UC Berkeley**

### Digital design is not a spectator sport! Learn by doing.

#### Deep Digital Design Experience

- Fundamentals of Boolean Logic
- Synchronous Circuits
- Finite State Machines
- Timing & Clocking
- Device Technology & Implications
- Controller Design
- Arithmetic Units
- Memories
- Testing, Debugging
- Hardware Architecture
- Hardware Design Language (HDL)
- Design Flow (CAD)

## **Abstraction to Simplify Hardware Design**

How do we put the devices into a system?

## **Abstraction to Simplify System Design**

## How do we get the electrons to do the work?
## **Great Idea #4: Software and Hardware Co-design**

#### **Old Machine Structures**

■ Mainframe: IBM System/360

#### **New Machine Structures**

#### **■ IBM Blue Gene**

## **New Machine Structures:** From the Gate to the Cloud

Leverage parallelism and achieve high performance, from software down to hardware:

- Parallel Requests: assigned to computer, e.g., Search "Katz"
- Parallel Threads: assigned to core, e.g., Lookup, Ads
- Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
- Parallel Data: >1 data item @ one time, e.g., Add of 4 pairs of words
- Hardware Descriptions: all gates functioning in parallel at the same time
- Programming Languages

## A Computer Architecture in a broad sense

#### **ISA and ABI**

- Interface at the top
  - ABI (Application Binary Interface)
  - ISA (Instruction Set Architecture)
  - Compare: API (Application Program Interface)
- ISA separates hardware from the rest
- ABI separates processes from the rest

## Artificial intelligence is everywhere

#### **Supercomputers**

- Business analytics
- AlphaGo
- Ad prediction
- Drug design

#### **Data Centers**

- Automatic translation

#### **Smartphones**

- Audio recognition
- Image analysis

#### **Embedded Devices**

- Robotics
- Consumer electronics

## How do new applications affect the design of the software and hardware layers?

- Algorithm and Data Structure
- Programming Language/Compiler
- **Operating System/Virtual Machines**
- Instruction Set Architecture (ISA)
- Microarchitecture
- Gates/Register-Transfer Level (RTL)
- Analog/Digital Circuits
- **Electronic Devices**
- **Physics**

But do we really need to charge them? Can they work without a battery?

## How do we design new architectures to cope with the endless stream of new applications?

#### **Text Book**

#### ■ Introduction to computer architecture (ISA)

- How is data represented?
- What are the pieces of a computer?
- How do computers work?

#### **■** Programming

- How do I "talk" directly to the machine? Assembly language
- How do I program in "C"? High-level language (HLL)

#### ■ Computer systems and computation

- How do simple HW/SW elements come together to realize complex computations?

Introduction to Computing Systems: From Bits and Gates to C/C++ and Beyond (3rd edition), Yale N. Patt and Sanjay J. Patel, September 2019, McGraw-Hill Higher Education

## **Text Book Components**

#### ■ Part 1: Hardware (Chapters 1-4)

- Representing data, transistors, gates, digital logic structures
- von Neumann machine model

### ■ Part 2: Software: Assembly language (Chapters 5-10)

- Instructions, (structured) programming, input/output, relationship to hardware

#### ■ Part 3: Software: C programming (Chapters 11-19), selected

- Syntax, operators, control structures, functions, pointers, recursion, data structures, relationship to assembly language
- Assumes you are already familiar with programming (C)

#### This Course Focuses On

- Chapter 2: Bits, Data Types, and Operations
  - How do we represent information using electrical signals?
- Chapter 3: Digital Logic Structures
  - How do we build circuits to process information?
- Chapters 4, 5, 6: Computer Machine Model, Processor and Instruction Set
  - How do we build a processor out of logic elements?
  - What operations (instructions) will we implement?
- Chapter 7: Assembly Language Programming
  - How do we use processor instructions to implement algorithms?
  - How do we write modular, reusable code? (subroutines)
- Chapter 8: Data Structures
- Chapter 9: I/O, Traps, and Interrupts
  - How does a processor communicate with the outside world?
- Chapter 11: C Implementation related to hardware

### **Other Reference Text Books**

- Introduction to Computing Systems: From Bits and Gates to C/C++ and Beyond (3rd Edition), Yale N. Patt and Sanjay J. Patel, June 2019, McGraw-Hill Higher Education
- The Once and Future Turing: Computing the World (1st Edition), S. Barry Cooper, Andrew Hodges, Cambridge University Press, 2016
- Computer Science Illuminated (6th Edition), Nell Dale, John Lewis, Jones & Bartlett Learning, 2016
- Computer Systems: A Programmer's Perspective (3rd Edition), Randal E. Bryant, David R. O'Hallaron, Pearson Education Inc., 2016
- Digital Design and Computer Architecture (2nd Edition), David Harris, Sarah Harris, 712 pp., Morgan Kaufmann, July 24, 2012
- Digital Integrated Circuits: A Design Perspective (2nd Edition), Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, Prentice-Hall, Inc., Jan 3, 2003
- Computer Organization and Design: The Hardware/Software Interface (5th Edition), David A. Patterson, John L. Hennessy, Morgan Kaufmann Publishers, Inc.
- Computer Architecture: A Quantitative Approach, John L. Hennessy and David A. Patterson, Morgan Kaufmann, Dec 7, 2017

## **Background/Prerequisites**

### Requirement

- Background in programming (C)
- We assume you can program and debug in C

# **Architecture continually changing**

Applications suggest how to improve technology and provide revenue to fund development.

### **Abstraction Layers in Modern Systems**

- Application
- Algorithm and Data Structure
- Programming Language/Compiler
- Operating System/Virtual Machines
- Instruction Set Architecture (ISA)
- Microarchitecture
- Gates/Register-Transfer Level (RTL)
- Analog/Digital Circuits
- Electronic Devices
- Physics

# **Transformations Between Layers**

## How do we solve a problem using a computer?
A systematic sequence of transformations between layers of abstraction.

# **Deeper and Deeper...**

# **Descriptions of Each Level**

- Problem Statement
  - stated using "natural language"
  - may be ambiguous, imprecise
- Algorithm
  - step-by-step procedure, guaranteed to finish
  - definiteness, effective computability, finiteness
- Program
  - express the algorithm using a computer language
  - high-level language, low-level language
- Instruction Set Architecture (ISA)
  - specifies the set of instructions the computer can perform
  - data types, addressing modes

# **Descriptions of Each Level (cont.)**

- Microarchitecture
  - detailed organization of a processor implementation
  - different implementations of a single ISA
- Logic Circuits
  - combine basic operations to realize the microarchitecture
  - many different ways to implement a single function (e.g., addition)
- Devices
  - properties of materials, manufacturability

# Many Choices at Each Level

# **Course Objectives**

- Understand the role and relationship of hardware and software
- Exposure to...
  - Machine organization
  - Assembly language programming
  - C programming
- Understand how to build an entire (slow) computing system
  - Hardware and software
  - You'll get a chance in complementary courses
- Be distinguished from mere programmers

The course covers the minimal set of core concepts and ideas of computer systems: transistor devices, digital logic, computer organization, compilation and assembly of high-level languages, hardware implementation of high-level languages, and operating systems.

# **Course Objectives (cont.)**

"Any sufficiently advanced technology is indistinguishable from magic." Arthur C. Clarke, "Profiles of the Future" (Clarke's 3rd law)

- No magic: Computers should not be magic to computer scientists!
- Bottom-up: start with what students "know"
  - Computing systems from transistors on up
  - The transistor as a light switch
  - Not quantum mechanics
- Choose a computer model that is simple
  - Not about "design", but about "insight" into all computers
  - As the genius said: simple, but still rich
- Continually build on what you know
  - Continually raising the level of abstraction
  - Memorizing as little as absolutely necessary
  - Trying very hard not to introduce magic

You take, you enjoy!

#### **Course Contents**

- 1. Overview
- 2. Operations on bits, bytes (arithmetic, logical)
- 3. The transistor
- 4. Digital logic
- 5. The finite state machine
- 6. The Von Neumann model
- 7. The LC-3 instruction set architecture
- 8. Programming and debugging
- 9. Assembly language programming
- 10. Physical I/O
- 11. Subroutines, calls, traps, interrupts
- 12. Transition to C
- 13. Programming in C

# Computational model (Turing machine) vs. Structural model (Von Neumann structure)

# **Instruction Set Architecture (ISA) vs. Finite State Machine**

Figure A.2 Format of the entire LC-3 instruction set. Note: + indicates instructions that modify condition codes.

# **Computer microarchitecture vs. Finite State Machine**

#### **Levels of Abstraction**

- Algorithm and Data Structure
- Programming Language/Compiler
- Operating System/Virtual Machines
- Instruction Set Architecture (ISA)
- Microarchitecture
- Gates/Register-Transfer Level (RTL)
- Analog/Digital Circuits
- Electronic Devices
- Physics

### **N-type MOS Transistor**

- MOS = Metal Oxide Semiconductor
  - two types: N-type and P-type
- N-type
  - when Gate has positive voltage, short circuit between #1 and #2 (switch closed)
  - when Gate has zero voltage, open circuit between #1 and #2 (switch open)

Terminal #2 must be connected to GND (0V).
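The switch behavior just described can be sketched in C. This is only an illustrative model under simplifying assumptions, not a circuit simulator: the names `SwitchState` and `ntype_switch` are made up for this sketch, and it assumes that any positive gate voltage closes the switch.

```c
/* Possible states of the path between terminal #1 and terminal #2. */
typedef enum { SWITCH_OPEN, SWITCH_CLOSED } SwitchState;

/* Hypothetical model (an assumption of this sketch):
 * a positive gate voltage closes an N-type transistor,
 * and a gate voltage of zero opens it. */
SwitchState ntype_switch(double gate_volts)
{
    return (gate_volts > 0.0) ? SWITCH_CLOSED : SWITCH_OPEN;
}
```

For example, `ntype_switch(2.9)` returns `SWITCH_CLOSED` and `ntype_switch(0.0)` returns `SWITCH_OPEN`.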
## P-type MOS Transistor

- P-type is *complementary* to N-type
  - when Gate has positive voltage, open circuit between #1 and #2 (switch open)
  - when Gate has zero voltage, short circuit between #1 and #2 (switch closed)

Terminal #1 must be connected to +2.9V.

# **Inverter (NOT Gate)**

#### **OR Gate**

#### **AND Gate**

## **Gates**

# **Basic Logical Structure**

# LC-3 Data Path (Microarchitecture)

#### LC-3 ISA Overview

#### Control Instructions

| Instruction | Opcode [15:12] | Bits [11:0] |
|-------------|----------------|-------------|
| BR | 0000 | n z p, PCoffset9 |
| JSR | 0100 | 1, PCoffset11 |
| JSRR | 0100 | 0, 00, BaseR, 000000 |
| RTI | 1000 | 000000000000 |
| JMP | 1100 | 000, BaseR, 000000 |
| RET | 1100 | 000, 111, 000000 |
| TRAP | 1111 | 0000, TrapVector8 |

# Data Movement Instructions

#### Load

| Instruction | Opcode [15:12] | Bits [11:9] | Bits [8:0] |
|-------------|----------------|-------------|------------|
| LD | 0010 | DR | PCoffset9 |
| LDR | 0110 | DR | BaseR, offset6 |
| LDI | 1010 | DR | PCoffset9 |
| LEA | 1110 | DR | PCoffset9 |

#### Store

| Instruction | Opcode [15:12] | Bits [11:9] | Bits [8:0] |
|-------------|----------------|-------------|------------|
| ST | 0011 | SR | PCoffset9 |
| STR | 0111 | SR | BaseR, offset6 |
| STI | 1011 | SR | PCoffset9 |

#### Lectures

- Will not simply "cover" the material
- Will focus on the "hard stuff"
- Will not stand alone; instead, they build on the reading

#### Guest Lectures

- Turing Machine
- RISC-V
- Cambricon

#### Discussion sessions

- Okay: discuss meaning of problem, discuss approaches
- Not okay: comparing answers,
solving questions together
- Three hours of lecture and a one-hour discussion section per week
- Problem sets
- Labs: LC-3 simulator/assembler
  - All labs start after two weeks
- Design project (required for the honors class)
  - Simulator/assembler design
- Exams
  - 1 midterm + 1 final
- Location
  - Teaching Building, Room 3B202
- Time
  - Friday (periods 3, 4, 5): 9:45-12:10
  - 60 hours + 40 hours, 3 + 1 credits

# **Homework Assignments**

#### Problem Sets: 6 sets

- Problem solving
- Complete before each due date
- Can work ahead
- Great exam preparation!

# **Lab Assignments**

### Simple Programming Assignments

- Programming Assignment 1: programming in LC-3 machine language
- Programming Assignments 2-5: programming in LC-3 assembly language
- Programming Assignment 6: Assignments 2-5 programmed in C

#### Challenging Course Projects

- LC-3 Simulator/Assembler Design (Encouraged! Required for the honors class, extra 10 points)
- See the schedule for each lab's due date

#### **Exams & Grades**

- Midterm exam: 20%
- Final exam: 20%
- Assignments: 48% + 10%
  - Problem sets for every chapter: 6% (BUT they will affect your participation)
  - 6 programming assignments: 48% in total
  - LC-3 Simulator/Assembler Design (Encouraged! Required for the honors class, extra 10 points)
- EPA: Effort, Participation, and Altruism in the discussion section: 12%
  - If you do not attend discussion, you cannot take the exam; 12% → 0%
  - If you don't do the problems, you cannot participate well; 12% → 0%
- Encouraged! The extra 10 points can secure a score of 100 points

#### **EPA vs. GPA**

#### Effort

- Attending professor and TA office hours, completing all assignments, turning in homework, doing reading quizzes

#### Participation

- Attending lecture
- You have 2 slip-day tokens (NOT hours or minutes)
- Asking great questions in discussion and lecture and making it more interactive

#### Altruism

- Helping others in lab or on Piazza: Be Excellent to Each Other
- EPA
points have the potential to bump students up to the next grade level!

# **Policy on Assignments and Independent Work**

# ALL PROJECTS WILL BE DONE AND SUBMITTED INDIVIDUALLY.

- With the exception of laboratories and assignments that explicitly permit you to work in groups, all homework and projects are to be YOUR work and your work ALONE.
- You are encouraged to discuss your assignments with other students, and extra credit will be assigned to students who help others, particularly by answering questions on Piazza, but we expect that what you hand in is yours.
- It is NOT acceptable to copy (or even "start with") solutions from other students or the Web.

# Tips on How to Get a Good Grade

- The lecture material is **not** the most challenging part of the course. You should be able to understand everything as we go along.
- DO NOT fall behind in lecture and tell yourself you "will figure it out later from the notes or books".
- Notes will be online after the lecture (usually that night). Do the assigned reading before the lecture.
- Ask questions in class and stay involved in the class; that will help you understand.
- Discuss with the TAs to check your understanding or to ask questions.
- Complete all the homework problems, even the difficult ones. The exams will test your depth of knowledge.

# Tips on How to Get a Good Grade (cont.)

- You need to understand the material well enough to apply it in new situations. You need to enroll in both the lab and the course.
- Take the labs very seriously. They are an integral part of the course.
- Choose your partner carefully. Your best friend may not be the best choice!
- Most important: be well organized and neat with homework, labs, and the project. In lab, add complexity a little bit at a time, and always have a working design.

# Academic Integrity

# Do not post your work on public repositories like GitHub (private is o.k.)
Posting it publicly earns negative points.

#### The rule is simple

- Claiming another's work as your own will ruin your life
- See the syllabus for details and examples

#### Who will know?

- We will (inspection, similarity detectors, exams)
- Your friends will... your parents will...
- You will

#### Remember

- If you need to cheat now, you've got much bigger problems
- Cheating is like going 150 MPH over the speed limit while drunk!

#### **Outline**

- 1 Course and Crew
- 2 What's the difference between "Big" and "Small"?
- 3 Why Take This Course?
- 4 Great Ideas in Computing Systems
- 5 What's This Course All About?
- 6 Summary

#### Goals of Computer Science Education at USTC

#### Facing the world, facing the future, cultivating first-class talent

# Course Goal: Understand the Great Ideas of Computer System Architecture

- Computational thinking (algorithms)
- Programming ability (coding)
- System design ability (architecture)

#### **Abstraction Layers in Modern Systems**

- Application
- Algorithm and Data Structure
- Programming Language/Compiler
- Operating System/Virtual Machines
- Instruction Set Architecture (ISA)
- Microarchitecture
- Gates/Register-Transfer Level (RTL)
- Analog/Digital Circuits
- Electronic Devices
- Physics

# **Acknowledgements**

- Special thanks to Prof. An
- This course is partly inspired by previous MIT 6.823 and Berkeley CS252 computer architecture courses created by my collaborators and colleagues:
  - Arvind (MIT)
  - Joel Emer (Intel/MIT)
  - James Hoe (CMU)
  - John Kubiatowicz (UCB)
  - David Patterson (UCB)
  - Krste Asanovic (UCB)

# Introduction to Computing Systems (CS1002A.02)

# Chapter 2-1 Bits, Bytes and Data Types

#### School of Computer Science and Technology

#### **Outline**

- 1 How do we represent data in a computer?
- 2 Integer Data Types
- 3 2's Complement Integers
- 4 Binary-Decimal Conversion

## How do we represent data in a computer?
# **Great Idea from Ancient Chinese Philosophy**

All things come into being, all things come into nothing.

- Laozi, Chapter 40: "All things under heaven are born of being; being is born of non-being."
- I Ching: "The Taiji gives rise to the two modes, the two modes give rise to the four images, the four images give rise to the eight trigrams, and the eight trigrams generate all things."

## How do we represent data in a computer?

- At the lowest level, a computer is an electronic machine.
  - It works by controlling the flow of electrons
- It is easy to recognize two conditions:
  - presence of a voltage: we'll call this state "1"
  - absence of a voltage: we'll call this state "0"
- We could base the state on the *value* of the voltage, but the control and detection circuits would be more complex.
  - compare turning on a light switch to measuring or regulating voltage
- We'll see examples of these circuits in the next chapter.

# Simple Switch Circuit

#### Switch open:

- No current through circuit
- Light is off
- V<sub>out</sub> is +2.9V

#### **Switch closed:**

- Short circuit across switch
- Current flows
- Light is on
- V<sub>out</sub> is 0V

**Switch-based circuits** can easily represent two states: on/off, open/closed, voltage/no voltage.

# A computer is a binary digital system.

- The basic unit of information is the *binary digit*, or *bit*.
- Values with more than two states require multiple bits.
  - A collection of two bits has four possible states: 00, 01, 10, 11
  - A collection of three bits has eight possible states: 000, 001, 010, 011, 100, 101, 110, 111
  - A collection of n bits has $2^n$ possible states.

[Figure: analog voltage ranges mapped to digital values; axis marks at 0, 0.5, 2.4, and 2.9 volts]

# **Data input: Analog → Digital**

- The real world is analog!
- To import analog information, we must do two things
  - Sample
    - E.g., for a CD, 44,100 times per second, we ask a music signal how loud it is.
  - Quantize
    - For every one of these samples, we figure out where it lies on a 16-bit (65,536 tick-mark) scale.

#### N-type MOS Transistor

- MOS = Metal Oxide Semiconductor
  - two types: N-type and P-type
- N-type
  - when Gate has positive voltage, short circuit between #1 and #2 (switch closed)
  - when Gate has zero voltage, open circuit between #1 and #2 (switch open)

Terminal #2 must be connected to GND (0V).

#### P-type MOS Transistor

- P-type is *complementary* to N-type
  - when Gate has positive voltage, open circuit between #1 and #2 (switch open)
  - when Gate has zero voltage, short circuit between #1 and #2 (switch closed)

Terminal #1 must be connected to +2.9V.

## **Logic Gates**

- Use the switch behavior of MOS transistors to implement logical functions: AND, OR, NOT.
- Digital symbols:
  - recall that we assign a range of analog voltages to each digital (logic) symbol
  - the assignment of voltage ranges depends on the electrical properties of the transistors being used
    - typical values for "1": +5V, +3.3V, +2.9V, +1.1V
    - for purposes of illustration, we'll use +2.9V

# Within the Computer: Everything is a Number.

#### Numbers within the Computer

- Base 10 numbers: Dec(imal)
  - Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
- Base 2 numbers: Bin(ary)
  - Digits: 0, 1
- Base 8 numbers: Oct(al)
  - Digits: 0, 1, 2, 3, 4, 5, 6, 7
- Base 16 numbers: Hex(adecimal)
  - Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F

| Dec(imal) | Hex(adecimal) | Oct(al) | Bin(ary) |
|-----------|---------------|---------|----------|
| 00 | 0 | 00 | 0000 |
| 01 | 1 | 01 | 0001 |
| 02 | 2 | 02 | 0010 |
| 03 | 3 | 03 | 0011 |
| 04 | 4 | 04 | 0100 |
| 05 | 5 | 05 | 0101 |
| 06 | 6 | 06 | 0110 |
| 07 | 7 | 07 | 0111 |
| 08 | 8 | 10 | 1000 |
| 09 | 9 | 11 | 1001 |
| 10 | A | 12 | 1010 |
| 11 | B | 13 | 1011 |
| 12 | C | 14 | 1100 |
| 13 | D | 15 | 1101 |
| 14 | E | 16 | 1110 |
| 15 | F | 17 | 1111 |

#### **Hexadecimal Notation**

It is often convenient to write binary (base-2) numbers as hexadecimal (base-16) numbers instead.
- fewer digits: four bits per hex digit
- less error prone: it is easy to corrupt a long string of 1's and 0's

| Binary | Hex | Decimal | Binary | Hex | Decimal |
|--------|-----|---------|--------|-----|---------|
| 0000 | 0 | 0 | 1000 | 8 | 8 |
| 0001 | 1 | 1 | 1001 | 9 | 9 |
| 0010 | 2 | 2 | 1010 | A | 10 |
| 0011 | 3 | 3 | 1011 | B | 11 |
| 0100 | 4 | 4 | 1100 | C | 12 |
| 0101 | 5 | 5 | 1101 | D | 13 |
| 0110 | 6 | 6 | 1110 | E | 14 |
| 0111 | 7 | 7 | 1111 | F | 15 |

# **Converting from Binary to Hexadecimal**

- Every four bits is a hex digit.
  - start grouping from the right-hand side

Example: 011101010001111010011010111 groups as 011 1010 1000 1111 0100 1101 0111, which is 3 A 8 F 4 D 7 in hex.

This is not a new machine representation, just a convenient way to write the number.

# Within the Computer: Everything is a Number.

#### Which base do we use?

```
#include <stdio.h>

int main(void) {
    const int N = 1234;

    /* The same value printed in three bases. */
    printf("Decimal: %d\n", N);
    printf("Hex: %x\n", N);
    printf("Octal: %o\n", N);

    /* Literals written in different bases denote the same number. */
    printf("Literals (not supported by all compilers):\n");
    printf("0x4d2 = %d (hex)\n", 0x4d2);
    /* The 0b prefix is a compiler extension before C23. */
    printf("0b10011010010 = %d (binary)\n", 0b10011010010);
    printf("02322 = %d (octal, prefix 0 - zero)\n", 02322);
    return 0;
}
```

Output:

```
Decimal: 1234
Hex: 4d2
Octal: 2322
Literals (not supported by all compilers):
0x4d2 = 1234 (hex)
0b10011010010 = 1234 (binary)
02322 = 1234 (octal, prefix 0 - zero)
```

# Within the Computer: Everything is a Number.

#### Bit (BInary digiT)

- 1 bit = 2 things
- 2 bits = 4 things
- 4 bits = 16 things
- 8 bits = 256 things
- ...

#### Byte

- 1 byte = 8 bits

#### But numbers are usually stored with a fixed size

- 8-bit bytes
- 16-bit half words
- 32-bit words
- 64-bit double words, ...
- And there are really only two primitive "numbers", 0 and 1: a "bit"

# **BIG IDEA: Bits can represent anything!!!**

#### Characters?
- 26 letters ⇒ 5 bits ($2^5 = 32$)
- upper/lower case + punctuation ⇒ 7 bits (in 8) ("ASCII")
- standard code to cover all the world's languages ⇒ 8, 16, 32 bits ("Unicode"), www.unicode.com

#### Logical values?

- 0 → False, 1 → True

#### Colors?

- Ex: Red (00), Green (01), Blue (11)

#### Locations / addresses?

#### Commands?

**MEMORIZE:** N bits ⇔ at most $2^N$ things

## What kinds of data do we need to represent?

#### Kinds of data

```
Numbers - signed, unsigned, integers, floating point, complex, rational, irrational, ...
Text - characters, strings, ...
Logical - true, false
Images - pixels, colors, shapes, ...
Sound
Video - a series of images
Instructions
...
```

# Data type: representation and operations within the computer

#### We'll start with numbers...

#### **Outline**

- 1 How do we represent data in a computer?
- 2 Integer Data Types
- 3 2's Complement Integers
- 4 Binary-Decimal Conversion

#### Counting with Knotted Cords

- Name: Quipu (knotted cords)
- Origin: Inca tribes of South America
- Period: 16th century

#### Non-positional notation

- Could represent a number ("5") with a string of ones ("11111")
- Problems?

# **Unsigned Integers**

#### Weighted positional notation

- like decimal numbers: "329"
- "3" is worth 300, because of its position, while "9" is only worth 9

# **Unsigned Integers (cont.)**

An *n*-bit unsigned integer represents $2^n$ values: from 0 to $2^n - 1$.

| $2^2$ | $2^1$ | $2^0$ | Value |
|-------|-------|-------|-------|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 2 |
| 0 | 1 | 1 | 3 |
| 1 | 0 | 0 | 4 |
| 1 | 0 | 1 | 5 |
| 1 | 1 | 0 | 6 |
| 1 | 1 | 1 | 7 |

# **Unsigned Binary Arithmetic**

#### Base-2 addition: just like base-10!

- add from right to left, propagating carry

Subtraction, multiplication, division,...
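The right-to-left, carry-propagating addition above can be sketched in C as a bit-by-bit loop. This is a minimal illustration, assuming a width of at most 32 bits; the name `add_unsigned_bits` and its parameters are made up for this example, and C's built-in `+` operator does the same work in one step.

```c
#include <assert.h>
#include <stdint.h>

/* Add two n-bit unsigned numbers one bit position at a time,
 * from the least significant bit to the most significant bit,
 * propagating the carry. The final carry out is stored in *carry_out.
 * (Hypothetical helper for illustration; n must be 1..32.) */
uint32_t add_unsigned_bits(uint32_t a, uint32_t b, unsigned n, unsigned *carry_out)
{
    assert(n >= 1 && n <= 32);
    uint32_t sum = 0;
    unsigned carry = 0;
    for (unsigned i = 0; i < n; i++) {
        unsigned abit = (a >> i) & 1u;
        unsigned bbit = (b >> i) & 1u;
        unsigned total = abit + bbit + carry;   /* 0, 1, 2, or 3 */
        sum |= (uint32_t)(total & 1u) << i;     /* low bit is the sum bit */
        carry = total >> 1;                     /* high bit is the carry */
    }
    *carry_out = carry;
    return sum;
}
```

For example, with n = 4, adding 0101 (5) and 0011 (3) gives 1000 (8) with no carry out, while 1111 (15) plus 0001 (1) gives 0000 with a carry out of 1.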
## **Unsigned Integers (cont.)**

```
0000 0000 0000 0000 0000 0000 0000 0000_two = 0_ten
0000 0000 0000 0000 0000 0000 0000 0001_two = 1_ten
0000 0000 0000 0000 0000 0000 0000 0010_two = 2_ten
...
0111 1111 1111 1111 1111 1111 1111 1101_two = 2,147,483,645_ten
...
1000 0000 0000 0000 0000 0000 0000 0000_two = 2,147,483,648_ten
1000 0000 0000 0000 0000 0000 0000 0001_two = 2,147,483,649_ten
1000 0000 0000 0000 0000 0000 0000 0010_two = 2,147,483,650_ten
```

# Signed Integers

#### With n bits, we have $2^n$ distinct values.

- assign about half to positive integers (1 through $2^{n-1} - 1$) and about half to negative integers ($-2^{n-1} + 1$ through -1)
- that leaves two values: one for 0, and one extra

#### Positive integers

- just like unsigned: zero in the Most Significant (MS) bit
  - 00101 = 5

#### Negative integers

- signed magnitude: set the MS bit to show negative; the other bits are the same as unsigned
  - 10101 = -5
- 1's complement: flip every bit to represent negative
  - 11010 = -5
- in either case, the MS bit indicates the sign: 0 = positive, 1 = negative

# Three representations of signed integers

| Representation | | | | | Signed Magnitude | 1's Complement | | Representation | | | | | Signed Magnitude | 1's Complement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 1 | 0 | 0 | 0 | 0 | -0 | -15 |
| 0 | 0 | 0 | 0 | 1 | 1 | 1 | | 1 | 0 | 0 | 0 | 1 | -1 | -14 |
| 0 | 0 | 0 | 1 | 0 | 2 | 2 | | 1 | 0 | 0 | 1 | 0 | -2 | -13 |
| 0 | 0 | 0 | 1 | 1 | 3 | 3 | | 1 | 0 | 0 | 1 | 1 | -3 | -12 |
| 0 | 0 | 1 | 0 | 0 | 4 | 4 | | 1 | 0 | 1 | 0 | 0 | -4 | -11 |
| 0 | 0 | 1 | 0 | 1 | 5 | 5 | | 1 | 0 | 1 | 0 | 1 | -5 | -10 |
| 0 | 0 | 1 | 1 | 0 | 6 | 6 | | 1 | 0 | 1 | 1 | 0 | -6 | -9 |
| 0 | 0
| 1 | 1 | 1 | 7 | 7 | | 1 | 0 | 1 | 1 | 1 | -7 | -8 |
| 0 | 1 | 0 | 0 | 0 | 8 | 8 | | 1 | 1 | 0 | 0 | 0 | -8 | -7 |
| 0 | 1 | 0 | 0 | 1 | 9 | 9 | | 1 | 1 | 0 | 0 | 1 | -9 | -6 |
| 0 | 1 | 0 | 1 | 0 | 10 | 10 | | 1 | 1 | 0 | 1 | 0 | -10 | -5 |
| 0 | 1 | 0 | 1 | 1 | 11 | 11 | | 1 | 1 | 0 | 1 | 1 | -11 | -4 |
| 0 | 1 | 1 | 0 | 0 | 12 | 12 | | 1 | 1 | 1 | 0 | 0 | -12 | -3 |
| 0 | 1 | 1 | 0 | 1 | 13 | 13 | | 1 | 1 | 1 | 0 | 1 | -13 | -2 |
| 0 | 1 | 1 | 1 | 0 | 14 | 14 | | 1 | 1 | 1 | 1 | 0 | -14 | -1 |
| 0 | 1 | 1 | 1 | 1 | 15 | 15 | | 1 | 1 | 1 | 1 | 1 | -15 | -0 |

Adding 5 and -5 with ordinary binary addition gives the wrong result in both representations:

- Signed magnitude: 00101 (5) + 10101 (-5) = 11010, which reads as -10
- 1's complement: 00101 (5) + 11010 (-5) = 11111, which reads as -0

#### **Outline**

- 1 How do we represent data in a computer?
- 2 Integer Data Types
- 3 2's Complement Integers
- 4 Binary-Decimal Conversion

# 2's Complement Representation

- If the number is positive or zero,
  - normal binary representation, zeroes in upper bit(s)
- If the number is negative,
  - start with the positive number
  - flip every bit (i.e., take the 1's complement)
  - then add one
- This representation makes the hardware simple!

## 2's Complement

#### Problems with signed magnitude and 1's complement

- two representations of zero (+0 and -0)
- arithmetic circuits are complex
  - How to add two signed-magnitude numbers?
    - e.g., try 2 + (-3)
  - How to add two 1's complement numbers?
    - e.g., try 4 + (-3)

#### The 2's complement representation was developed to make arithmetic circuits easy.

- for each positive number (X), assign a value to its negative (-X) such that X + (-X) = 0 with "normal" addition, ignoring the carry out

# 2's Complement Shortcut

#### To take the 2's complement of a number:

- copy bits from right to left until (and including) the first "1"
- flip the remaining bits to the left

# 2's Complement Signed Integers

- The MS bit is the sign bit; it has weight $-2^{n-1}$.
- Range of an n-bit number: $-2^{n-1}$ through $2^{n-1} - 1$.
  - The most negative number ($-2^{n-1}$) has no positive counterpart.

| $-2^3$ | $2^2$ | $2^1$ | $2^0$ | Value | $-2^3$ | $2^2$ | $2^1$ | $2^0$ | Value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -8 |
| 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | -7 |
| 0 | 0 | 1 | 0 | 2 | 1 | 0 | 1 | 0 | -6 |
| 0 | 0 | 1 | 1 | 3 | 1 | 0 | 1 | 1 | -5 |
| 0 | 1 | 0 | 0 | 4 | 1 | 1 | 0 | 0 | -4 |
| 0 | 1 | 0 | 1 | 5 | 1 | 1 | 0 | 1 | -3 |
| 0 | 1 | 1 | 0 | 6 | 1 | 1 | 1 | 0 | -2 |
| 0 | 1 | 1 | 1 | 7 | 1 | 1 | 1 | 1 | -1 |

# Three representations of signed integers

| Representation | | | | | Signed Magnitude | 1's Complement | 2's Complement | Representation | | | | | Signed Magnitude | 1's Complement | 2's Complement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | -0 | -15 | -16 |
| 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | -1 | -14 | -15 |
| 0 | 0 | 0 | 1 | 0 | 2 | 2 | 2 | 1 | 0 | 0 | 1 | 0 | -2 | -13 | -14 |
| 0 | 0 | 0 | 1 | 1 | 3 | 3 | 3 | 1 | 0 | 0 | 1 | 1 | -3 | -12 | -13 |
| 0 | 0 | 1 | 0 | 0 | 4 | 4 | 4 | 1 | 0 | 1 | 0 | 0 | -4 | -11 | -12 |
| 0 | 0 | 1 | 0 | 1 | 5 | 5 | 5 | 1 | 0 | 1 | 0 | 1 | -5 | -10 | -11 |
| 0 | 0 |
1 | 1 | 0 | 6 | 6 | 6 | 1 | 0 | 1 | 1 | 0 | -6 | -9 | -10 |
| 0 | 0 | 1 | 1 | 1 | 7 | 7 | 7 | 1 | 0 | 1 | 1 | 1 | -7 | -8 | -9 |
| 0 | 1 | 0 | 0 | 0 | 8 | 8 | 8 | 1 | 1 | 0 | 0 | 0 | -8 | -7 | -8 |
| 0 | 1 | 0 | 0 | 1 | 9 | 9 | 9 | 1 | 1 | 0 | 0 | 1 | -9 | -6 | -7 |
| 0 | 1 | 0 | 1 | 0 | 10 | 10 | 10 | 1 | 1 | 0 | 1 | 0 | -10 | -5 | -6 |
| 0 | 1 | 0 | 1 | 1 | 11 | 11 | 11 | 1 | 1 | 0 | 1 | 1 | -11 | -4 | -5 |
| 0 | 1 | 1 | 0 | 0 | 12 | 12 | 12 | 1 | 1 | 1 | 0 | 0 | -12 | -3 | -4 |
| 0 | 1 | 1 | 0 | 1 | 13 | 13 | 13 | 1 | 1 | 1 | 0 | 1 | -13 | -2 | -3 |
| 0 | 1 | 1 | 1 | 0 | 14 | 14 | 14 | 1 | 1 | 1 | 1 | 0 | -14 | -1 | -2 |
| 0 | 1 | 1 | 1 | 1 | 15 | 15 | 15 | 1 | 1 | 1 | 1 | 1 | -15 | -0 | -1 |

Adding 5 and -5 with ordinary binary addition in each representation:

- Signed magnitude: 00101 (5) + 10101 (-5) = 11010, which reads as -10
- 1's complement: 00101 (5) + 11010 (-5) = 11111, which reads as -0
- 2's complement: 00101 (5) + 11011 (-5) = 00000 after ignoring the carry out, which is 0

# 2's Complement

The leftmost bit is the sign bit:

```
0000 0000 0000 0000 0000 0000 0000 0001_two = 1_ten
1111 1111 1111 1111 1111 1111 1111 1111_two = -1_ten
```

#### Q&A

- Suppose we had a 5-bit word. What integers can be represented in 2's complement?
  - A. -32~+31
  - B. 0~+31
  - C. -16~+15
  - D. -15~+16
- Suppose we had an 8-bit word. What integers can be represented in 2's complement?
- Suppose we had a 16-bit word. What integers can be represented in 2's complement?
- Suppose we had a 32-bit word. What integers can be represented in 2's complement?

#### **Outline**

- 1 How do we represent data in a computer?
- 2 Integer Data Types
- 3 2's Complement Integers
- 4 Binary-Decimal Conversion

### Converting Binary (2's complement) to Decimal

1. If the leading bit is one, take the 2's complement to get a positive number.
2. Add the powers of 2 that have "1" in the corresponding bit positions.
3. If the original number was negative, add a minus sign.

$$X = 01101000_{two} = 2^6 + 2^5 + 2^3 = 64 + 32 + 8 = 104_{ten}$$

| n | $2^n$ |
|----|------|
| 0 | 1 |
| 1 | 2 |
| 2 | 4 |
| 3 | 8 |
| 4 | 16 |
| 5 | 32 |
| 6 | 64 |
| 7 | 128 |
| 8 | 256 |
| 9 | 512 |
| 10 | 1024 |

Assuming 8-bit 2's complement numbers.

# **More Examples**

$$X = 00100111_{two} = 2^5 + 2^2 + 2^1 + 2^0 = 32 + 4 + 2 + 1 = 39_{ten}$$

$$X = 11100110_{two}$$

$$-X = 00011010_{two} = 2^4 + 2^3 + 2^1 = 16 + 8 + 2 = 26_{ten}$$

$$X = -26_{ten}$$

Assuming 8-bit 2's complement numbers.

# **Converting Decimal to Binary (2's C)**

#### First Method: Division

1. Divide by 2; the remainder is the least significant bit.
2. Keep dividing by 2 until the answer is zero, writing remainders from right to left.
3. Append a zero as the MS bit; if the original number is negative, take the 2's complement.

#### Second Method: Subtract Powers of Two

1. Change to a positive decimal number.
2. Subtract the largest power of two less than or equal to the number.
3. Put a 1 in the corresponding bit position.
4. Keep subtracting until the result is 0.

5.
Append a 0 as the MS bit; if the original was negative, take the 2's complement.

Example, using the powers of two from 1 through 1024:

- $X = 104_{ten}$
- 104 - 64 = 40 (bit 6)
- 40 - 32 = 8 (bit 5)
- 8 - 8 = 0 (bit 3)
- $X = 01101000_{two}$

# **Signed and Unsigned Integers**

- C, C++, and Java have signed integers, e.g., 7, -255
  - `int x, y, z;`
- C and C++ also have unsigned integers, which are used for addresses
- A 32-bit word can represent $2^{32}$ binary numbers
- Unsigned integers in a 32-bit word represent 0 to $2^{32} - 1$ (4,294,967,295)

# Introduction to Computing Systems (CS1002A.02)

# Chapter 2-2 Operations and Other Data Types

#### School of Computer Science and Technology

#### **Outline**

- 1 Operations on Bits: Arithmetic and Logical
- 2 Other Representation
- 3 Summary

# **Operations: Arithmetic and Logical**

- Recall: a data type includes representation and operations.
- We now have a good representation for signed integers, so let's look at some arithmetic operations:
  - Addition
  - Subtraction
  - Sign Extension
- We'll also look at overflow conditions for addition.
- Multiplication, division, etc., can be built from these basic operations.
- Logical operations are also useful:
  - AND
  - OR
  - NOT

# 2's Complement Signed Integers

- The MS bit is the sign bit; it has weight $-2^{n-1}$.
- Range of an n-bit number: $-2^{n-1}$ through $2^{n-1} - 1$.
  - The most negative number ($-2^{n-1}$) has no positive counterpart.
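The weighting rule above can be sketched in C: every bit except the MS bit contributes $2^i$, and the MS bit contributes $-2^{n-1}$. This is a minimal sketch; the function name `twos_complement_value` and the limit of 31 bits are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low n bits of `bits` as an n-bit 2's complement integer.
 * (Hypothetical helper for illustration; n must be 1..31.) */
int32_t twos_complement_value(uint32_t bits, unsigned n)
{
    assert(n >= 1 && n <= 31);
    int32_t value = 0;
    /* Bits 0 .. n-2 carry their usual positive weights 2^i. */
    for (unsigned i = 0; i + 1 < n; i++) {
        if ((bits >> i) & 1u) {
            value += (int32_t)1 << i;
        }
    }
    /* The MS bit (bit n-1) carries the negative weight -2^(n-1). */
    if ((bits >> (n - 1)) & 1u) {
        value -= (int32_t)1 << (n - 1);
    }
    return value;
}
```

With n = 4, the pattern 1000 evaluates to -8 and 0111 evaluates to 7, the two ends of the range, while 1111 evaluates to -1.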
| $-2^3$ | $2^2$ | $2^1$ | $2^0$ | Value | $-2^3$ | $2^2$ | $2^1$ | $2^0$ | Value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -8 |
| 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | -7 |
| 0 | 0 | 1 | 0 | 2 | 1 | 0 | 1 | 0 | -6 |
| 0 | 0 | 1 | 1 | 3 | 1 | 0 | 1 | 1 | -5 |
| 0 | 1 | 0 | 0 | 4 | 1 | 1 | 0 | 0 | -4 |
| 0 | 1 | 0 | 1 | 5 | 1 | 1 | 0 | 1 | -3 |
| 0 | 1 | 1 | 0 | 6 | 1 | 1 | 1 | 0 | -2 |
| 0 | 1 | 1 | 1 | 7 | 1 | 1 | 1 | 1 | -1 |

#### **Addition**

As we've discussed, 2's complement addition is just binary addition.

- assume all integers have the same number of bits
- ignore the carry out
- for now, assume that the sum fits in the n-bit 2's complement representation

Assuming 8-bit 2's complement numbers.

#### **Subtraction**

Negate the subtrahend (2nd number) and add.

- assume all integers have the same number of bits
- ignore the carry out
- for now, assume that the difference fits in the n-bit 2's complement representation

Assuming 8-bit 2's complement numbers.

# **Sign Extension**

To add two numbers, we must represent them with the same number of bits.
If we just pad with zeroes on the left:

| 4-bit | 8-bit | |
|---|---|---|
| 0100 (4) | 00000100 | (still 4) |
| 1100 (-4) | 00001100 | (12, not -4) |

Instead, replicate the most significant bit -- the sign bit:

| 4-bit | 8-bit | |
|---|---|---|
| 0100 (4) | 00000100 | (still 4) |
| 1100 (-4) | 11111100 | (still -4) |

#### **Overflow**

- Recall the representable range of n-bit 2's complement signed integers. For an n-bit number:

$$-2^{n-1} \sim 2^{n-1} - 1$$

- Can we use n-bit 2's complement to represent a value larger than 2<sup>n-1</sup>-1, or a value smaller than -2<sup>n-1</sup>?
- If operands are too big, then the sum cannot be represented as an *n*-bit 2's complement number.
- We have overflow if:
  - signs of both operands are the same, and
  - the sign of the sum is different.
- Another test -- easy for hardware: the carry into the most significant bit is not equal to the carry out.

| | Example 1 | Example 2 |
|---|---|---|
| First operand | 01000 (8) | 11000 (-8) |
| Second operand | 01001 (9) | 10111 (-9) |
| 5-bit sum | 10001 (-15) | 01111 (+15) |
| Carry out of MS bit, carry into MS bit | 0, 1 | 1, 0 |

# **Logical Operations**

- Operations on logical TRUE or FALSE
  - two states -- takes one bit to represent: TRUE=1, FALSE=0
- View an n-bit number as a collection of n logical values
  - operation applied to each bit independently
  - bitwise operation

| A | B | A AND B |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

| A | B | A OR B |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

| A | NOT A |
|---|---|
| 0 | 1 |
| 1 | 0 |

# **Examples of Logical Operations**

#### **AND**

- useful for clearing bits, bitmask
  - AND with zero = 0
  - AND with one = no change
- Example: 11000101 AND 00001111 = 00000101

#### **Inclusive OR**

- useful for setting bits
  - OR with zero = no change
  - OR with one = 1
- Example: 11000101 OR 00001111 = 11001111

#### NOT

- unary operation -- one argument
- flips every bit
- Example: NOT 11000101 = 00111010

#### **Exclusive-OR (XOR)**

The output of XOR is 1 if one (but not both) of the two sources is 1. The output of XOR is 0 if both sources are 1 or if neither source is 1.

| A | B | A XOR B |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

# **DeMorgan's Laws**

- There are two well-known relationships between AND functions and OR functions, known as DeMorgan's Laws:

$$\overline{A \cdot B} = \overline{A} + \overline{B}, \qquad \overline{A + B} = \overline{A} \cdot \overline{B}$$

### **Outline**

- 1 Operations on Bits: Arithmetic and Logical
- 2 Other Representation
- 3 Summary

#### **Fractions: Fixed-Point**

- How can we represent fractions?
  - Use a "binary point" to separate positive from negative powers of two -- just like "decimal point."
  - 2's complement addition and subtraction still work, if binary points are aligned.

$$2^{-1} = 0.5, \quad 2^{-2} = 0.25, \quad 2^{-3} = 0.125$$

| | Binary | Decimal |
|---|---|---|
| | 00101000.101 | 40.625 |
| + | 11111110.110 | -1.25 |
| = | 00100111.011 | 39.375 |

No new operations -- same as integer arithmetic.

# **Very Large and Very Small Data**

The LC-3 uses the 16-bit 2's complement data type: one bit identifies positive or negative, and 15 bits represent the magnitude of the value. We can express values from -2<sup>15</sup> through 2<sup>15</sup>-1 (-32768 through 32767).

How can we represent very large and very small data?

- Large values: 6.023 x 10<sup>23</sup> requires 79 bits
- Small values: 6.626 x 10<sup>-34</sup> requires >110 bits

# **Very Large and Very Small: Floating-Point**

Use the equivalent of "scientific notation": F x 2<sup>E</sup>

Need to represent F (*fraction*), E (*exponent*), and sign.
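The bit counts quoted above for the large and small values can be reproduced with a few lines of Python. This is a rough check rather than part of the slides; the variable names are assumptions chosen for illustration.

```python
import math

# Large value from the slide: bits needed to write it as a plain binary integer.
large_bits = int(6.023e23).bit_length()

# Small value from the slide: position of the first 1 after the binary point.
# math.frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1,
# so the leading 1 bit has weight 2**(e - 1).
_, e = math.frexp(6.626e-34)
first_one_position = 1 - e

print(large_bits)
print(first_one_position)
```

A fixed-point format would need that many bits just to reach the first significant digit, which is the motivation for storing a separate exponent.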
## **IEEE 754 Floating-Point Standard (32-bits):**

| 1 bit | 8 bits | 23 bits |
|---|---|---|
| S | Exponent | Fraction |

### **Normalized Form**

$$N = (-1)^S \times 1.\text{fraction} \times 2^{\text{exponent}-127}, \quad 1 \leq \text{exponent} \leq 254$$

$$\text{exponent}: \ 0\text{b}0000\_0000 < \text{exponent} < 0\text{b}1111\_1111$$

Example: 1 01111110 10000000000000000000000

- Sign is 1: the number is negative.
- Exponent field is an unsigned integer in excess (biased) code: 01111110 = 126 (decimal).
- Fraction is .10000000000... = .5 (decimal).

Value = $-1.5 \times 2^{(126-127)} = -1.5 \times 2^{-1} = -0.75$.

# **Floating Point Example**

- Example 2.13
  - $6\frac{5}{8}$ represented in the floating point data type
- Example 2.14
- Example 2.15

# **Very Small: Floating-Point**

$$N = (-1)^S \times 1.\text{fraction} \times 2^{\text{exponent}-127}, \quad 1 \leq \text{exponent} \leq 254$$

- The smallest (positive) number that can be represented in normalized form is $1.0 \times 2^{-126}$ (exponent field 0000_0001, fraction all 0s).

# **Very Small: subnormal numbers**

$$N = (-1)^{S} \times 0.\text{fraction} \times 2^{-126}, \quad \text{exponent} = 0$$

- The largest subnormal number is $(1 - 2^{-23}) \times 2^{-126}$ (fraction all 1s).
- The smallest subnormal number is $2^{-23} \times 2^{-126} = 2^{-149}$ (only the last fraction bit is 1).

### **Infinities**

Normalized form: $N = (-1)^{S} \times 1.\text{fraction} \times 2^{\text{exponent}-127}$, $1 \leq \text{exponent} \leq 254$

Subnormal numbers: $N = (-1)^S \times 0.\text{fraction} \times 2^{-126}$, exponent = 0

- So, what if the exponent is equal to 1111_1111?
  - If the exponent field contains 1111_1111, we use the floating point data type to represent various things, among them the notion of infinity.
  - Infinity is represented by the exponent field containing all 1s and the fraction field containing all 0s.
  - We represent positive infinity if the sign bit is 0 and negative infinity if the sign bit is 1.

## **Floating-Point Operations**

### Question

- Will the regular 2's complement arithmetic work for floating-point numbers?
- (Hint: In decimal, how do we compute 3.07 x 10<sup>12</sup> + 9.11 x 10<sup>8</sup>?)

#### **Text: ASCII Characters**

ASCII maps 128 characters to a 7-bit code, including both printable and non-printable (ESC, DEL, ...) characters.

```
00 nul  10 dle  20 sp  30 0  40 @  50 P  60 `  70 p
01 soh  11 dc1  21 !   31 1  41 A  51 Q  61 a  71 q
02 stx  12 dc2  22 "   32 2  42 B  52 R  62 b  72 r
03 etx  13 dc3  23 #   33 3  43 C  53 S  63 c  73 s
04 eot  14 dc4  24 $   34 4  44 D  54 T  64 d  74 t
05 enq  15 nak  25 %   35 5  45 E  55 U  65 e  75 u
06 ack  16 syn  26 &   36 6  46 F  56 V  66 f  76 v
07 bel  17 etb  27 '   37 7  47 G  57 W  67 g  77 w
08 bs   18 can  28 (   38 8  48 H  58 X  68 h  78 x
09 ht   19 em   29 )   39 9  49 I  59 Y  69 i  79 y
0a nl   1a sub  2a *   3a :  4a J  5a Z  6a j  7a z
0b vt   1b esc  2b +   3b ;  4b K  5b [  6b k  7b {
0c np   1c fs   2c ,   3c <  4c L  5c \  6c l  7c |
0d cr   1d gs   2d -   3d =  4d M  5d ]  6d m  7d }
0e so   1e rs   2e .   3e >  4e N  5e ^  6e n  7e ~
0f si   1f us   2f /   3f ?  4f O  5f _  6f o  7f del
```

# **ASCII** (American Standard Code for Information Interchange)

| Dec | Hex | Char | Dec | Hex | Char | Dec | Hex | Char | Dec | Hex | Char |
|-----|-----|------|-----|-----|---------|-----|-----|------|-----|-----|------|
| 0 | 00 | NUL | 32 | 20 | (space) | 64 | 40 | @ | 96 | 60 | \` |
| 1 | 01 | SOH | 33 | 21 | ! | 65 | 41 | A | 97 | 61 | a |
| 2 | 02 | STX | 34 | 22 | " | 66 | 42 | B | 98 | 62 | b |
| 3 | 03 | ETX | 35 | 23 | # | 67 | 43 | C | 99 | 63 | c |
| 4 | 04 | EOT | 36 | 24 | \$ | 68 | 44 | D | 100 | 64 | d |
| 5 | 05 | ENQ | 37 | 25 | % | 69 | 45 | E | 101 | 65 | e |
| 6 | 06 | ACK | 38 | 26 | & | 70 | 46 | F | 102 | 66 | f |
| 7 | 07 | BEL | 39 | 27 | ' | 71 | 47 | G | 103 | 67 | g |
| 8 | 08 | BS | 40 | 28 | ( | 72 | 48 | H | 104 | 68 | h |
| 9 | 09 | HT | 41 | 29 | ) | 73 | 49 | I | 105 | 69 | i |
| 10 | 0A | LF | 42 | 2A | * | 74 | 4A | J | 106 | 6A | j |
| 11 | 0B | VT | 43 | 2B | + | 75 | 4B | K | 107 | 6B | k |
| 12 | 0C | FF | 44 | 2C | , | 76 | 4C | L | 108 | 6C | l |
| 13 | 0D | CR | 45 | 2D | - | 77 | 4D | M | 109 | 6D | m |
| 14 | 0E | SO | 46 | 2E | . | 78 | 4E | N | 110 | 6E | n |
| 15 | 0F | SI | 47 | 2F | / | 79 | 4F | O | 111 | 6F | o |
| 16 | 10 | DLE | 48 | 30 | 0 | 80 | 50 | P | 112 | 70 | p |
| 17 | 11 | DC1 | 49 | 31 | 1 | 81 | 51 | Q | 113 | 71 | q |
| 18 | 12 | DC2 | 50 | 32 | 2 | 82 | 52 | R | 114 | 72 | r |
| 19 | 13 | DC3 | 51 | 33 | 3 | 83 | 53 | S | 115 | 73 | s |
| 20 | 14 | DC4 | 52 | 34 | 4 | 84 | 54 | T | 116 | 74 | t |
| 21 | 15 | NAK | 53 | 35 | 5 | 85 | 55 | U | 117 | 75 | u |
| 22 | 16 | SYN | 54 | 36 | 6 | 86 | 56 | V | 118 | 76 | v |
| 23 | 17 | ETB | 55 | 37 | 7 | 87 | 57 | W | 119 | 77 | w |
| 24 | 18 | CAN | 56 | 38 | 8 | 88 | 58 | X | 120 | 78 | x |
| 25 | 19 | EM | 57 | 39 | 9 | 89 | 59 | Y | 121 | 79 | y |
| 26 | 1A | SUB | 58 | 3A | : | 90 | 5A | Z | 122 | 7A | z |
| 27 | 1B | ESC | 59 | 3B | ; | 91 | 5B | [ | 123 | 7B | { |
| 28 | 1C | FS | 60 | 3C | < | 92 | 5C | \\ | 124 | 7C | \| |
| 29 | 1D | GS | 61 | 3D | = | 93 | 5D | ] | 125 | 7D | } |
| 30 | 1E | RS | 62 | 3E | > | 94 | 5E | ^ | 126 | 7E | ~ |
| 31 | 1F | US | 63 | 3F | ? | 95 | 5F | _ | 127 | 7F | DEL |

## ASCII Code Table (American Standard Code for Information Interchange)

The table is indexed by the high four bits and the low four bits of the code. Control characters and their meanings:

| Dec | Char | Meaning | Dec | Char | Meaning |
|---|---|---|---|---|---|
| 0 | NUL | null character | 16 | DLE | data link escape |
| 1 | SOH | start of heading | 17 | DC1 | device control 1 |
| 2 | STX | start of text | 18 | DC2 | device control 2 |
| 3 | ETX | end of text | 19 | DC3 | device control 3 |
| 4 | EOT | end of transmission | 20 | DC4 | device control 4 |
| 5 | ENQ | enquiry | 21 | NAK | negative acknowledge |
| 6 | ACK | acknowledge | 22 | SYN | synchronous idle |
| 7 | BEL | bell | 23 | ETB | end of transmission block |
| 8 | BS | backspace | 24 | CAN | cancel |
| 9 | HT | horizontal tab | 25 | EM | end of medium |
| 10 | LF | line feed | 26 | SUB | substitute |
| 11 | VT | vertical tab | 27 | ESC | escape |
| 12 | FF | form feed | 28 | FS | file separator |
| 13 | CR | carriage return | 29 | GS | group separator |
| 14 | SO | shift out | 30 | RS | record separator |
| 15 | SI | shift in | 31 | US | unit separator |
| 127 | DEL | delete | | | |

Note: the ASCII characters in the table can be entered with Alt + the digit keys on the numeric keypad.

# **Interesting Properties of ASCII Code**

- What is the relationship between a decimal digit ('0', '1', ...) and its ASCII code? The code is the digit's value plus 30h.
- What is the difference between an upper-case letter ('A', 'B', ...) and its lower-case equivalent ('a', 'b', ...)? The codes differ by 20h.
- Given two ASCII characters, how do we tell which comes first in alphabetical order?
- Are 128 characters enough? (http://www.unicode.org/)

No new operations -- integer arithmetic and logic.

## **Other Data Types**

- Text strings
  - sequence of characters, terminated with NULL (0)
  - typically, no hardware support
- Image
  - array of pixels
    - monochrome: one bit (1/0 = black/white)
    - color: red, green, blue (RGB) components (e.g., 8 bits each)
    - other properties: transparency
  - hardware support:
    - typically none, in general-purpose processors
    - MMX -- multiple 8-bit operations on 32-bit word
- Sound
  - sequence of fixed-point numbers

Within the Computer: Everything is a Number.

## **LC-3 Data Types**

- Some data types are supported directly by the instruction set architecture.
- For LC-3, there is only one supported data type:
  - 16-bit 2's complement signed integer
  - Operations: ADD, AND, NOT
- Other data types are supported by <u>interpreting</u> 16-bit values as logical, text, fixed-point, etc., in the software that we write.

## **Outline**

- 1 Operations on Bits: Arithmetic and Logical
- 2 Other Representation
- 3 Summary

# **Summary**

- Everything in a computer is a number, in fact only 0s and 1s.
- Integers are interpreted by adhering to a fixed length.
- Negative numbers are represented with 2's complement.
- Overflow can be detected by comparing the carry into the MS bit with the carry out of it.
- We will get into some more representations later when we talk about floating point.
- Signed-magnitude and biased representations are needed in floating point for specific uses.
- We will not cover 1's complement; it is essentially unused in practice.

# Introduction to Computing Systems (CS1002A.02)

#### School of Computer Science and Technology

# **Outline**

- 1 Review
- 2 The Transistor
- 3 Logic Gates
- 4 Summary

#### **Review**

- Great Idea #0: Great Idea from Ancient Chinese Philosophy (Bits and Bytes)
- How do we represent data in a computer?
  - Bits: 0/1
  - Data type: representation and operations within the computer
    - Integer data types
      - Unsigned integers
      - 2's complement integers
    - Fixed-point data types
    - Floating-point data types
    - Text: characters, strings, ...
    - Images: pixels, colors, shapes, ...
    - Sound
    - Instructions
    - ...
- Arithmetic and Logical Operations
- Binary-Decimal Conversion

# **Today**

- Microprocessors contain millions of transistors
  - Intel Core 2 Duo: 291 million
  - AMD Barcelona: 463 million
  - IBM Power6: 790 million
- Transistor: building block of computers
  - Logically, each transistor acts as a switch
- Combined to implement logic functions
  - AND, OR, NOT

## **Approach: Bottom Up**

# **Great Idea #3: Abstraction Helps Us Manage Complexity**

# **Great Idea #4: Software and Hardware Codesign**

# **Outline**

- 1 Review
- 2 The Transistor
- 3 Logic Gates
- 4 Summary

#### Vacuum tubes used as switches

Switch open:

- No current through circuit
- Light is off
- V<sub>out</sub> is +2.9V

Switch closed:

- Short circuit across switch
- Current flows
- Light is on
- V<sub>out</sub> is 0V

**Switch-based circuits** can easily represent two states: on/off, open/closed, voltage/no voltage.

# Vacuum tubes used as switches made electronic computing possible

- The short mean time to failure of tubes was a concern: the received wisdom was that valves were unreliable and that, in a large installation, too many would fail in too short a time.
- Tommy Flowers discovered that valves could operate reliably for very long periods, particularly on a reduced current. In 1934 he built an installation using over 3,000 tubes; when a tube failed, it was possible to keep the others going.
- Later work confirmed that tube unreliability was not as serious as generally believed; in the 1946 ENIAC, a tube failure took 15 minutes to locate.
- After the Second World War, development continued with tube-based computers such as ENIAC and Whirlwind, and with the first commercially available electronic computers.
- Later machines included the Jaincomp series from the Jacobs Instrument Company of Bethesda; the Jaincomp-B employed just 300 tubes.

#### The invention of the transistor

#### Bell Labs lays the groundwork

- 1945: Bell sets up a lab in the hopes of developing "solid state" components to replace existing electromechanical systems. William Shockley, John Bardeen, Walter Brattain: all solid-state physicists. Focus on Si and Ge.
- 1947: Invention of the first transistor, the point-contact transistor.
- 1951: Shockley develops the junction transistor, which can be manufactured in quantity.
- 1954: The first transistor radio! Also, TI makes the first silicon transistor (price \$2.50).
- 1956: Bardeen, Shockley, Brattain receive the Nobel Prize.

## A transistor under a microscope

- http://www.zhihu.com/question/26998618
- How is it possible to have tens of millions of transistors in a chip?

# Microprocessors contain millions of transistors

| CPU | Manufacturing Process | Cores | GPU | Transistor Count (Schematic) | Die Size |
|---|---|---|---|---|---|
| Haswell GT3 4C | 22nm | 4 | GT3 | ? | 264mm<sup>2</sup> (est) |
| Haswell GT2 4C | 22nm | 4 | GT2 | 1.4B | 177mm<sup>2</sup> |
| AMD Vishera 8C | 32nm | 8 | N/A | 1.2B | 315mm<sup>2</sup> |
| Intel Sandy Bridge 4C | 32nm | 4 | GT2 | 995M | 216mm<sup>2</sup> |
| Intel Lynnfield 4C | 45nm | 4 | N/A | 774M | 296mm<sup>2</sup> |

#### Moore's Law – The number of transistors on integrated circuit chips (1971-2018)

Moore's law describes the empirical regularity that the number of transistors on integrated circuits doubles approximately every two years. This advancement is important as other aspects of technological progress – such as processing speed or the price of electronic products – are linked to Moore's law.
Data source: Wikipedia (https://en.wikipedia.org/wiki/Transistor\_count). The data visualization is available at OurWorldinData.org, where you can find more visualizations and research on this topic. Licensed under CC-BY-SA by the author Max Roser.

#### **Transistor** as a switch

Switch open:

- No current through circuit
- Light is off
- V<sub>out</sub> is +2.9V

Switch closed:

- Short circuit across switch
- Current flows
- Light is on
- V<sub>out</sub> is 0V

**Switch-based circuits** can easily represent two states: on/off, open/closed, voltage/no voltage.

#### **How does MOSFET work?**

- <u>[Explore] How a MOSFET works (popular-science video, Haokan Video)</u> (baidu.com)

# **N-type MOS Transistor**

- When the gate is supplied with 2.9 volts, the transistor acts like a piece of wire, completing the circuit and causing the bulb to glow.
- When the gate is supplied with 0 volts, the transistor acts like an open circuit, breaking the circuit and causing the bulb to not glow.
- MOS = Metal Oxide Semiconductor
  - two types: N-type and P-type
- N-type metal-oxide-semiconductor (NMOS)
  - when Gate has <u>positive</u> voltage, short circuit between #1 and #2 (switch <u>closed</u>)
  - when Gate has <u>zero</u> voltage, open circuit between #1 and #2 (switch <u>open</u>)

Terminal #2 must be connected to GND (0V).

#### P-type MOS Transistor

- P-type metal-oxide-semiconductor (PMOS)
- P-type is *complementary* to N-type
  - when Gate has <u>positive</u> voltage, open circuit between #1 and #2 (switch <u>open</u>)
  - when Gate has <u>zero</u> voltage, short circuit between #1 and #2 (switch <u>closed</u>)

Terminal #1 must be connected to +2.9V.

#### **Outline**

- 1 Review
- 2 The Transistor
- 3 Logic Gates
- 4 Summary

#### **Logic Gates**

Use switch behavior of MOS transistors to implement logical functions: AND, OR, NOT.
- Digital symbols:
  - recall that we assign a range of analog voltages to each digital (logic) symbol
  - assignment of voltage ranges depends on electrical properties of transistors being used

#### **CMOS Circuit**

- CMOS: Complementary MOS
- Uses both N-type and P-type MOS transistors
  - P-type
    - Attached to + voltage
    - Pulls output voltage UP when input is zero
  - N-type
    - Attached to GND
    - Pulls output voltage DOWN when input is one
- For all inputs, make sure that output is either connected to GND or to +, but not both!

#### **Inverter (NOT Gate)**

#### The process of fabrication of a CMOS inverter

Simplified process of fabrication of a CMOS inverter on p-type substrate in semiconductor microfabrication.

#### **NOR Gate**

Note: Serial structure on top, parallel on bottom.

| A (input) | B (input) | C (output) |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 0 |

#### **OR Gate**

| A (input) | B (input) | C (output) |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

Add inverter to NOR.

#### NAND Gate (AND-NOT)

| A (input) | B (input) | C (output) |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

Note: Parallel structure on top, serial on bottom.

#### **Example: NAND gate in physical layout**

#### **NAND** gate in CMOS logic

The <u>physical layout</u> of a NAND circuit. The larger regions of N-type diffusion and P-type diffusion are part of the transistors. The two smaller regions on the left are taps to prevent latchup.

#### **AND Gate**

| A (input) | B (input) | C (output) |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

Add inverter to NAND.

#### **Practice 1**

Implement a 3-input NOR gate with CMOS.
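The pull-up and pull-down structure of the two-input gates above can be modeled at switch level. The Python below is a minimal sketch under stated assumptions: each transistor is treated as an ideal switch, and every function name is hypothetical. It deliberately stops at two inputs and leaves the 3-input NOR to the practice exercise.

```python
def nmos_conducts(gate: int) -> bool:
    """N-type: the switch is closed when the gate is 1."""
    return gate == 1


def pmos_conducts(gate: int) -> bool:
    """P-type: the switch is closed when the gate is 0."""
    return gate == 0


def cmos_output(pull_up: bool, pull_down: bool) -> int:
    """The output must connect to exactly one of + and GND."""
    if pull_up == pull_down:
        raise ValueError("output is floating or shorted")
    return 1 if pull_up else 0


def inverter(a: int) -> int:
    return cmos_output(pmos_conducts(a), nmos_conducts(a))


def nor(a: int, b: int) -> int:
    # P-types in series on top, N-types in parallel on the bottom.
    return cmos_output(pmos_conducts(a) and pmos_conducts(b),
                       nmos_conducts(a) or nmos_conducts(b))


def nand(a: int, b: int) -> int:
    # P-types in parallel on top, N-types in series on the bottom.
    return cmos_output(pmos_conducts(a) or pmos_conducts(b),
                       nmos_conducts(a) and nmos_conducts(b))


def or_gate(a: int, b: int) -> int:
    return inverter(nor(a, b))   # add inverter to NOR


def and_gate(a: int, b: int) -> int:
    return inverter(nand(a, b))  # add inverter to NAND
```

The `ValueError` branch never fires for these gates because the two networks are complementary, which is the "connected to GND or to +, but not both" rule.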
#### **Basic Gates**

- From now on: gates
  - Covered transistors mostly so that you know they exist
  - Note: "Logic Gate" is not related to the "Gate" of MOSFET transistors
- Will study implementation in terms of gates
  - Circuits that implement Boolean functions
- More complicated gates from transistors are possible
  - XOR, multiple-input AND-OR-Invert (AOI) gates

#### More than 2 Inputs?

- AND/OR can take any number of inputs.
  - AND = 1 if all inputs are 1.
  - OR = 1 if any input is 1.
  - Similar for NAND/NOR.
- Can implement with multiple two-input gates, or with a single CMOS circuit.

#### **Visual Shorthand for Multi-bit Gates**

- Use a cross-hatch mark to group wires
  - Example: calculate the AND of a pair of 4-bit numbers
  - $A_3$ is the "high-order" or "most-significant" bit
  - If "A" is 1000, then $A_3=1$, $A_2=0$, $A_1=0$, $A_0=0$

#### Shorthand for Inverting Signals

- Invert a signal by adding either
  - A circle or an "inversion bubble" before/after a gate
  - A "bar" over the letter

#### **Logical Completeness**

- AND, OR, NOT can implement ANY truth table

| A | B | C | D |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 |

1. AND combinations that yield a "1" in the truth table.
2. OR the results of the AND gates.

#### **DeMorgan's Law**

Converting AND to OR (with some help from NOT)

Consider the following gate:

| A | B | $\overline{A}$ | $\overline{B}$ | $\overline{A} \cdot \overline{B}$ | $\overline{\overline{A} \cdot \overline{B}}$ |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 1 | 0 |
| 0 | 1 | 1 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 0 | 1 |

Same as A OR B!

To convert AND to OR (or vice versa), invert inputs and output. Why might this be useful?
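Both claims above, the two-step construction for any truth table and DeMorgan's Law, are small enough to check exhaustively. The following Python is a sketch for illustration; the helper names `NOT`, `AND`, `OR`, and `D` are assumptions, with `D` standing for the output column of the Logical Completeness table.

```python
from itertools import product


def NOT(a):
    return 1 - a


def AND(*xs):
    return int(all(xs))


def OR(*xs):
    return int(any(xs))


# DeMorgan: inverting both inputs and the output of AND gives OR.
demorgan_holds = all(
    NOT(AND(NOT(a), NOT(b))) == OR(a, b)
    for a, b in product((0, 1), repeat=2)
)

# Rows of the truth table where D is 1: ABC = 010 and ABC = 101.
ROWS_WITH_ONE = [(0, 1, 0), (1, 0, 1)]


def D(a, b, c):
    inputs = (a, b, c)
    # Step 1: one AND term per row that yields 1
    # (an input is inverted when that row has a 0 in its position).
    terms = [
        AND(*(x if bit else NOT(x) for x, bit in zip(inputs, row)))
        for row in ROWS_WITH_ONE
    ]
    # Step 2: OR the results of the AND terms.
    return OR(*terms)


print(demorgan_holds)
print([D(*row) for row in product((0, 1), repeat=3)])
```

Changing `ROWS_WITH_ONE` is enough to realize a different truth table, which is the sense in which AND, OR, and NOT are complete.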
#### **Practice 2**

| A | B | C |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

#### **Outline**

- 1 Review
- 2 The Transistor
- 3 Logic Gates
- 4 Summary

#### **N-type MOS Transistor**

- MOS = Metal Oxide Semiconductor
  - two types: N-type and P-type
- N-type
  - when Gate has <u>positive</u> voltage, short circuit between #1 and #2 (switch <u>closed</u>)
  - when Gate has <u>zero</u> voltage, open circuit between #1 and #2 (switch <u>open</u>)

Terminal #2 must be connected to GND (0V).

#### P-type MOS Transistor

- P-type is *complementary* to N-type
  - when Gate has <u>positive</u> voltage, open circuit between #1 and #2 (switch <u>open</u>)
  - when Gate has <u>zero</u> voltage, short circuit between #1 and #2 (switch <u>closed</u>)

Terminal #1 must be connected to +2.9V.

#### **Gates**

### Introduction to Computing Systems (CS1002A.02)

# Chapter 3-2 Combinational Logic Circuits & Basic Storage Elements

School of Computer Science and Technology

#### **Outline**

- 1 Review
- 2 Combinational Logic Circuits
- 3 Basic Storage Elements
- 4 Summary

#### **Review**

- Transistor: building block of computers
  - Logically, each transistor acts as a switch
- Combined to implement logic functions
  - AND, OR, NOT

#### **N-type MOS Transistor**

- MOS = Metal Oxide Semiconductor
  - two types: N-type and P-type
- N-type
  - when Gate has <u>positive</u> voltage, short circuit between #1 and #2 (switch <u>closed</u>)
  - when Gate has <u>zero</u> voltage, open circuit between #1 and #2 (switch <u>open</u>)

Terminal #2 must be connected to GND (0V).

#### P-type MOS Transistor

- P-type is *complementary* to N-type
  - when Gate has <u>positive</u> voltage, open circuit between #1 and #2 (switch <u>open</u>)
  - when Gate has <u>zero</u> voltage, short circuit between #1 and #2 (switch <u>closed</u>)

Terminal #1 must be connected to +2.9V.
#### **Gates**

#### **Today: Building Functions from Logic Gates**

- We've already seen how to implement truth tables using AND, OR, and NOT -- an example of combinational logic.
- Combinational Logic Circuit
  - output depends only on the current inputs
  - stateless
- Sequential Logic Circuit
  - output depends on the sequence of inputs (past and present)
  - stores information (state) from past inputs
- We'll first look at some useful combinational circuits, then show how to use sequential circuits to store information.

#### **Approach: Bottom Up**

Now, you are here.

# **Great Idea #3: Abstraction Helps Us Manage Complexity**

# **Great Idea #4: Software and Hardware Codesign**

### **Outline**

- 1 Review
- 2 Combinational Logic Circuits
- 3 Basic Storage Elements
- 4 Summary

### **How many transistors?**

### Decoder

- *n* inputs, $2^n$ outputs
  - exactly one output is 1 for each possible input pattern

### Multiplexer (MUX)

- *n*-bit selector and $2^n$ inputs, one output
  - output equals one of the inputs, depending on selector

4-to-1 MUX

#### **Full Adder**

Add two bits and carry-in, produce one-bit sum and carry-out.

| A | B | C<sub>in</sub> | S | C<sub>out</sub> |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 |

#### **Four-bit Adder**

### Adder/Subtractor - Approach #1

### Adder/Subtractor - Approach #2

### The Programmable Logic Array (PLA)

The truth table for a one-bit adder.
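The full-adder truth table and the four-bit adder built from it can be sketched in a few lines. This Python is illustrative only: the function names are assumptions, and `subtract` uses the common invert-and-carry-in technique, which may or may not correspond to either adder/subtractor approach shown in the slide figures.

```python
def full_adder(a, b, c_in):
    """One column of binary addition: returns (sum bit, carry out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out


def ripple_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists (index 0 is the least significant bit)."""
    out = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next column
        out.append(s)
    return out, carry


def to_bits(value, n=4):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def subtract(a_bits, b_bits):
    # A - B = A + NOT(B) + 1: invert B and feed 1 into the first carry-in.
    return ripple_add(a_bits, [1 - b for b in b_bits], c_in=1)


total, carry = ripple_add(to_bits(5), to_bits(6))
print(from_bits(total), carry)
```

Replicating one small cell per bit, rather than writing a truth table for all eight input bits at once, is the same design choice the incrementer slides make.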
#### Incrementer

- Let's create an incrementer
  - Input: A (as a 16-bit 2's complement integer)
  - Output: A+1 (also as a 16-bit 2's complement integer)
- Approach #1 (impractical):
  - Use PLA-like techniques to implement the circuit
  - Problem: 2<sup>16</sup> or 65536 rows, 16 output columns
  - In theory, possible; in practice, intractable
- Approach #2 (pragmatic):
  - Create a 1-bit incrementer circuit
  - Replicate it 16 times

#### **One-bit Incrementer**

- Implement a single column of the incrementer

| A | C<sub>in</sub> | S | C<sub>out</sub> |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |

### **Outline**

- 1 Review
- 2 Combinational Logic Circuits
- 3 Basic Storage Elements
- 4 Summary

### **Combinational vs. Sequential**

- Combinational Circuit
  - always gives the same output for a given set of inputs
    - ex: adder always generates sum and carry, regardless of previous inputs
- Sequential Circuit
  - stores information
  - output depends on stored information (state) plus input
    - so a given input might produce different outputs, depending on the stored information
  - example: ticket counter
    - advances when you push the button
    - output depends on previous state
  - useful for building "memory" elements and "state machines"

### **R-S Latch: Simple Storage Element**

- R is used to "reset" or "clear" the element: set it to zero.
- S is used to "set" the element: set it to one.
- If both R and S are one, out could be <u>either</u> zero or one.
  - "quiescent" state -- holds its previous value
  - note: if a is 1, b is 0, and vice versa

### **Clearing the R-S latch**

- Suppose we start with output = 1, then change R to zero.

Then set R=1 to "store" value in quiescent state.

## **Setting the R-S Latch**

- Suppose we start with output = 0, then change S to zero.

Then set S=1 to "store" value in quiescent state.
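The clearing and setting sequences above can be traced with a small simulation. This is a sketch under assumptions: the latch is taken to be two cross-coupled NAND gates with `a = NAND(S, b)` and `b = NAND(R, a)`, gate delays are ignored, and the class name `RSLatch` is hypothetical.

```python
def nand(x, y):
    return 1 - (x & y)


class RSLatch:
    """Two cross-coupled NAND gates; inputs are active when 0."""

    def __init__(self, a=0):
        self.a, self.b = a, 1 - a  # a is the stored value, b its complement

    def apply(self, s, r):
        # Re-evaluate both gates until the outputs stop changing.
        for _ in range(4):
            a = nand(s, self.b)
            b = nand(r, self.a)
            if (a, b) == (self.a, self.b):
                break
            self.a, self.b = a, b
        return self.a


latch = RSLatch(a=1)
print(latch.apply(s=1, r=0))  # clear: change R to zero
print(latch.apply(s=1, r=1))  # quiescent: hold the stored value
print(latch.apply(s=0, r=1))  # set: change S to zero
print(latch.apply(s=1, r=1))  # quiescent: hold the stored value
```

The sketch does not model the case where R and S are both zero and then released together; in a real circuit that outcome depends on electrical properties of the gates.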
## **R-S Latch Summary**

- R = S = 1
  - hold current value in latch
- S = 1 and R => 0
  - set value to 0
- R = 1 and S => 0
  - set value to 1
- R = S = 0
  - both outputs equal one
  - final state determined by electrical properties of gates
  - Don't do it!

#### **Gated D-Latch**

- Two inputs: D (data) and WE (write enable)
  - when WE = 1, latch is set to value of D
    - S = NOT(D), R = D
  - when WE = 0, latch holds previous value
    - S = R = 1

### Register

- A register stores a multi-bit value.
  - We use a collection of D-latches, all controlled by a common WE.
  - When WE=1, n-bit value D is written to register.

## **Representing Multi-bit Values**

- Number bits from right (0) to left (n-1)
  - just a convention -- could be left to right, but must be consistent
- Use brackets to denote range: D[l:r] denotes bit l to bit r, from left to right
- May also see A<14:9>, especially in hardware block diagrams.

### **Memory**

- Now that we know how to store bits, we can build a memory: a logical $k \times m$ array of stored bits.

2<sup>2</sup> x 3 Memory

# Reading location 3 in our 2<sup>2</sup>-by-3-bit memory.

### **More Memory Details**

- This is not the way actual memory is implemented.
  - fewer transistors, much more dense, relies on electrical properties
- But the logical structure is very similar.
  - address decoder
  - word select line
  - word write enable
- Two basic kinds of RAM (Random Access Memory)
  - Static RAM (SRAM)
    - fast, not very dense (bitcell is a latch)
  - Dynamic RAM (DRAM)
    - slower but denser, bit storage must be periodically refreshed
    - each bitcell is a capacitor (like a leaky bucket) that decays

### **SRAM Memory**

## **Outline**

- 1 Review
- 2 Combinational Logic Circuits
- 3 Basic Storage Elements
- 4 Summary

## **Basic Logical Structure**

# Introduction to Computing Systems (CS1002A.02)

#### School of Computer Science and Technology

### **Outline**

- 1 Review
- 2 Sequential Logic Circuits
- 3 From Logic to Data Path
- 4 Summary

#### **Review**

- We've touched on basic digital logic
  - Transistors
  - Gates
  - Storage (latches, flip-flops, register, memory)
- Built some simple logical circuits
  - adder, subtracter, adder/subtracter, incrementer
  - Counter (consisting of register and incrementer)

## **Basic Logical Structure**

### **Today**

- A computer as a (simple?) state machine
  - State machines
  - Hard-coded traffic sign state machine
  - Programmable traffic sign state machine

### **Approach: Bottom Up**

Now, you are here.

# **Great Idea #3: Abstraction Helps Us Manage Complexity**

# **Great Idea #4: Software and Hardware Codesign**

## **Outline**

- 1 Review
- 2 Sequential Logic Circuits
- 3 From Logic to Data Path
- 4 Summary

#### **State Machine**

#### Another type of sequential circuit

- Combines combinational logic with storage
- "Remembers" state, and changes output (and state) based on inputs and current state

## **Combinational vs. Sequential**

#### Two types of "combination" locks

#### **Combinational**

Success depends only on the values, not the order in which they are set.

#### **Sequential**

Success depends on the sequence of values (e.g., R-13, L-22, R-3).
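The contrast between the two locks can be made concrete in code. The Python below is a hedged sketch: the dial values reuse the R-13, L-22, R-3 example, while the names `combinational_lock_opens` and `SequentialLock`, and the rule that a wrong operation sends the lock back to its initial state, are assumptions made for illustration.

```python
def combinational_lock_opens(dials, code=(13, 22, 3)):
    """Opens whenever the dials show the code; the order of setting is irrelevant."""
    return tuple(dials) == code


class SequentialLock:
    SEQUENCE = ("R-13", "L-22", "R-3")

    def __init__(self):
        self.progress = 0  # stored state: how many steps have been completed

    def turn(self, operation):
        """Apply one operation and report whether the lock is now open."""
        if self.progress == len(self.SEQUENCE):
            return True  # already open
        if operation == self.SEQUENCE[self.progress]:
            self.progress += 1
        else:
            # Assumption: any wrong operation resets the lock.
            self.progress = 0
        return self.progress == len(self.SEQUENCE)


lock = SequentialLock()
print([lock.turn(op) for op in ("R-13", "L-22", "R-3")])
```

The only thing the sequential lock needs that the combinational one lacks is the `progress` field, which is exactly the stored information a sequential circuit adds.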
#### **State**

The state of a system is a snapshot of all the relevant elements of the system at the moment the snapshot is taken.

#### **Examples:**

- The state of a basketball game can be represented by the scoreboard.
  - Number of points, time remaining, possession, etc.
- The state of a tic-tac-toe game can be represented by the placement of X's and O's on the board.

## **State of Sequential Lock**

Our lock example has four different states, labelled A-D:

- A: The lock is not open, and no relevant operations have been performed.
- B: The lock is not open, and the user has completed the R-13 operation.
- C: The lock is not open, and the user has completed R-13, followed by L-22.
- D: The lock is open.

## **State Diagram**

Shows states and actions that cause a transition between states.

#### **Finite State Machine**

#### A description of a system with the following components:

1. A finite number of states
2. A finite number of external inputs
3. A finite number of external outputs
4. An explicit specification of all state transitions
5. An explicit specification of what causes each external output value

#### Often described by a state diagram.

- Inputs may cause state transitions.
- Outputs are associated with each state (or with each transition).

## **Implementing a Finite State Machine**

#### **Combinational logic**

- Determine outputs and next state.

#### Storage elements

- Maintain state representation.

#### **The Clock**

Frequently, a clock circuit triggers the transition from one state to the next.

At the beginning of each clock cycle, the state machine makes a transition, based on the current state and the external inputs.

Not always required. In the lock example, the input itself triggers a transition.

## **Storage: Master-Slave Flipflop**

A pair of gated D-latches, to isolate *next* state from *current* state.

During 1<sup>st</sup> phase (clock=1), previously-computed state becomes *current* state and is sent to the logic circuit.
During 2<sup>nd</sup> phase (clock=0), *next* state, computed by logic circuit, is stored in Latch A.

## **Storage**

Each master-slave flipflop stores one state bit.

The number of storage elements (flipflops) needed is determined by the number of states (and the representation of each state).

#### **Examples:**

- Sequential lock
  - Four states: two bits
- Basketball scoreboard
  - 7 bits for each score, 5 bits for minutes, 6 bits for seconds, 1 bit for possession arrow, 1 bit for half, ...

## **Complete Example**

#### A blinking traffic sign

- No lights on
- 1 & 2 on
- 1, 2, 3, & 4 on
- 1, 2, 3, 4, & 5 on
- (repeat as long as switch is turned on)

## **Traffic Sign State Diagram**

Transition on each clock cycle.

## **Traffic Sign Truth Tables**

Outputs (depend only on state: $S_1S_0$)

| $S_1$ | $S_0$ | Z (Lights 1 and 2) | Y (Lights 3 and 4) | X (Light 5) |
|-------|-------|--------------------|--------------------|-------------|
| 0     | 0     | 0                  | 0                  | 0           |
| 0     | 1     | 1                  | 0                  | 0           |
| 1     | 0     | 1                  | 1                  | 0           |
| 1     | 1     | 1                  | 1                  | 1           |

Next State: $S_1'S_0'$ (depend on state and input)

| In (Switch) | $S_1$ | $S_0$ | $S_1'$ | $S_0'$ |
|-------------|-------|-------|--------|--------|
| 0           | X     | X     | 0      | 0      |
| 1           | 0     | 0     | 0      | 1      |
| 1           | 0     | 1     | 1      | 0      |
| 1           | 1     | 0     | 1      | 1      |
| 1           | 1     | 1     | 0      | 0      |

Whenever In=0, next state is 00.
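The truth tables above translate directly into two small functions, one for the outputs and one for the next state. The sketch below is a software model, not the gate-level circuit; the function names are hypothetical, and the name X for the light-5 output is assumed here (Z drives lights 1 and 2, Y drives lights 3 and 4).

```python
def light_outputs(s1, s0):
    """Outputs depend only on the current state S1 S0."""
    z = s1 | s0   # lights 1 and 2: on in every state except 00
    y = s1        # lights 3 and 4: on in states 10 and 11
    x = s1 & s0   # light 5: on only in state 11
    return z, y, x


def next_state(switch, s1, s0):
    """Next state depends on the current state and the switch input."""
    if switch == 0:
        return 0, 0            # whenever In=0, next state is 00
    return s1 ^ s0, s0 ^ 1     # otherwise count 00 -> 01 -> 10 -> 11 -> 00


def simulate(cycles, switch=1):
    """Advance one state per clock cycle and record the light outputs."""
    s1, s0 = 0, 0
    trace = []
    for _ in range(cycles):
        trace.append(light_outputs(s1, s0))
        s1, s0 = next_state(switch, s1, s0)
    return trace


for z, y, x in simulate(5):
    print(f"Z={z} Y={y} X={x}")
```

The two state variables `s1` and `s0` play the role of the two flipflops; the functions play the role of the combinational logic.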
## **Traffic Sign Logic**

## **Outline**

- 1 Review
- **2** Sequential Logic Circuits
- 3 From Logic to Data Path
- 4 Summary

### From Logic to Data Path

## The data path of a computer is all the logic used to process information.

- See the data path of the LC-3 on next slide.

#### **Combinational Logic**

- Decoders -- convert instructions into control signals
- Multiplexers -- select inputs and outputs
- ALU (Arithmetic and Logic Unit) -- operations on data

#### **Sequential Logic**

- State machine -- coordinate control signals and data movement
- Registers and latches -- storage elements

#### LC-3 Data Path Overview (Microarchitecture)

## **Outline**

- 1 Review
- **2** Sequential Logic Circuits
- 3 From Logic to Data Path
- 4 Summary

## **Summary**

■ MOS transistors are used as switches to implement logic functions.

- N-type: connect to GND, turn on (with 1) to pull down to 0
- P-type: connect to +2.9V, turn on (with 0) to pull up to 1

■ Basic gates: NOT, NOR, NAND

- Logic functions are usually expressed with AND, OR, and NOT

■ Properties of logic gates

- Completeness
  - can implement any truth table with AND, OR, NOT
- DeMorgan's Law
  - convert AND to OR by inverting inputs and output

### **Summary**

■ We've touched on basic digital logic

- Transistors
- Gates
- Storage (latches, flip-flops, memory)
- State machines

■ Built some simple circuits

- adder, subtracter, adder/subtracter, incrementer
- Counter (consisting of register and incrementer)
- Hard-coded traffic sign state machine
- Programmable traffic sign state machine
- Up next: a computer as a (simple?)
state machine

#### **LC-3 Data Path**

#### **Next Time**

- ■ Topic
  - The von Neumann Model
- ■ Readings
  - Chapters 4.0-4.2

#### Introduction to Computing Systems (CS1002A.02)

#### School of Computer Science and Technology

## **Outline**

- 1 Review
- **2** From ENIAC to the Stored Program Computer
- 3 A Machine Structure: von Neumann Model
- 4 Summary

#### **Review**

■ MOS transistors are used as switches to implement logic functions.

- N-type: connect to GND, turn on (with 1) to pull down to 0
- P-type: connect to +2.9V, turn on (with 0) to pull up to 1

■ Basic gates: NOT, NOR, NAND

- Logic functions are usually expressed with AND, OR, and NOT

■ Properties of logic gates

- Completeness
  - can implement any truth table with AND, OR, NOT
- DeMorgan's Law
  - convert AND to OR by inverting inputs and output

#### **Review**

#### ■ We've touched on basic digital logic

- Transistors
- Gates
- Storage (latches, flip-flops, memory)
- State machines

#### ■ Built some simple circuits

- adder, subtracter, adder/subtracter, incrementer
- Counter (consisting of register and incrementer)
- Hard-coded traffic sign state machine
- Programmable traffic sign state machine
- Up next: a computer as a state machine

#### **Bottom up approach**

#### **Today**

- Great Idea #2: Stored program computer (von Neumann model, a machine structure)
- Basic components of a machine
- The LC-3: an example of a von Neumann machine
- Instruction processing

# **Bottom up approach**

# **Great Idea #3: Abstraction Helps Us Manage Complexity**

# **Great Idea #4: Software and Hardware Codesign**

# **Outline**

- 1 Review
- **2** From ENIAC to the Stored Program Computer
- 3 A Machine Structure: von Neumann Model
- 4 Summary

# **The First Electronic Computers**

# ENIAC - The first electronic computer, 1946

Programmed
by plugboard and switches, time consuming!

## ENIAC - The first electronic computer, 1946

Changing the program could take days!

### The Origin of the Stored Program Computer

[Photo: John von Neumann, c. 1955. **Credit: Computer History Museum**]

#### 1946: ENIAC

- Presper Eckert and John Mauchly -- first general-purpose electronic computer.
- Hard-wired program -- settings of dials and switches.

#### 1944: Beginnings of EDVAC (Electronic Discrete Variable Automatic Computer)

- John von Neumann joined the ENIAC team and proposed a stored program computer called EDVAC.

#### 1945: John von Neumann

- John von Neumann wrote "First Draft of a Report on the EDVAC", in which he outlined the architecture of a stored-program computer.

#### The basic structure proposed in the draft became known as the "von Neumann machine" (or model).

- a memory, containing instructions and data
- a processing unit, for performing arithmetic and logical operations
- a control unit, for interpreting instructions

# The Stored Program Computer Architecture (von Neumann Machine Architecture or Model)

# **The Stored Program Computer**

EDSAC, University of Cambridge, UK, 1949 (Maurice Vincent Wilkes)

Electronic storage of programming information and data eliminated the need for the more clumsy methods of programming, such as punched paper tape, a concept that has characterized mainstream computer development since 1945.
# Two major inventions behind the microprocessor chip

Stored program + transistor technology

- Stored program: change the program so that you can do all kinds of tasks on the same hardware.
- Transistor: the device is smaller and faster than a vacuum tube.

# **Outline**

- 1 Review
- **2** From ENIAC to the Stored Program Computer
- 3 A Machine Structure: von Neumann Model
- 4 Summary

#### von Neumann Model

# LC-3 Data Path

[Figure: LC-3 data path with the von Neumann components marked: Memory, Processing Unit, Control Unit, Input, Output]

## **Memory**

### $k \times m$ array of stored bits ($k$ is usually $2^n$)

#### **Address**

- unique ($n$-bit) identifier of location

#### **Contents**

- $m$-bit value stored in location

| Address | Contents |
|---------|----------|
| 0000    |          |
| 0001    |          |
| 0010    |          |
| 0011    | 00101101 |
| 0100    |          |
| 0101    |          |
| 0110    |          |
| ⋮       | ⋮        |
| 1101    | 10100010 |
| 1110    |          |
| 1111    |          |

#### **Basic Operations:**

#### **LOAD**

- read a value from a memory location

#### **STORE**

- write a value to a memory location

## **Interface to Memory**

### How does processing unit get data to/from memory?

- MAR: Memory Address Register
- MDR: Memory Data Register

#### To read a location (A):

1. Write the address (A) into the MAR.
2. Send a "read" signal to the memory.
3. Read the data from MDR.

#### To write a value (X) to a location (A):

1. Write the data (X) to the MDR.
2. Write the address (A) into the MAR.
3. Send a "write" signal to the memory.

## **Processing Unit**

#### **Functional Units**

- ALU = Arithmetic and Logic Unit
- could have many functional units, some of them special-purpose (multiply, square root, ...)
- LC-3 performs ADD, AND, NOT

#### Registers

- Small, temporary storage
- Operands and results of functional units
- LC-3 has eight registers (R0, ..., R7)

#### **Word Size**

- number of bits normally processed by ALU in one instruction
- also width of registers
- LC-3 is 16 bits

## **Input and Output**

- Devices for getting data into and out of computer memory
- Each device has its own interface.
  usually a set of registers like the memory's MAR and MDR

| INPUT    | OUTPUT  |
|----------|---------|
| Keyboard | Monitor |
| Mouse    | Printer |
| Scanner  | LED     |
| Disk     | Disk    |

- LC-3 supports keyboard (input) and console (output)
  - keyboard: data register (KBDR) and status register (KBSR)
  - console: data register (CRTDR) and status register (CRTSR)
  - frame buffer: memory-mapped pixels
- Some devices provide both input and output
  - disk, network
- Program that controls access to a device is usually called a driver.

#### **Control Unit**

Orchestrates execution of the program

- Instruction Register (IR) contains the <u>current</u> instruction.
- Program Counter (PC) contains the <u>address</u> of the next instruction to be executed.
- Control unit:
  - reads an instruction from memory
    - the instruction's address is in the PC
  - interprets the instruction, generating signals that tell the other components what to do
    - an instruction may take many machine cycles to complete

## **Instruction Processing (State Transition)**

#### Instruction

- The instruction is the fundamental unit of work.
- Specifies two things:
  - opcode: operation to be performed
  - operands: data/locations to be used for operation
- An instruction is encoded as a <u>sequence of bits</u>. (Just like data!)
  - Often, but not always, instructions have a fixed length, such as 16 or 32 bits.
  - Control unit interprets instruction: generates sequence of control signals to carry out operation.
  - Operation is either executed completely, or not at all.
- A computer's instructions and their formats are known as its Instruction Set Architecture (ISA).
  - <u>Persistent</u> ISA invented by UW grad Gene Amdahl (IBM 360)

## **Example: LC-3 ADD Instruction**

#### LC-3 has 16-bit instructions.

- Each instruction has a four-bit opcode, bits [15:12].

### LC-3 has eight *registers* (R0-R7) for temporary storage.

- Sources and destination of ADD are registers.
| Bits    | [15:12] | [11:9] | [8:6] | [5:3] | [2:0] |
|---------|---------|--------|-------|-------|-------|
| Field   | ADD     | Dst    | Src1  | 0 0 0 | Src2  |
| Example | 0 0 0 1 | 1 1 0  | 0 1 0 | 0 0 0 | 1 1 0 |

"Add the contents of R2 to the contents of R6, and store the result in R6."

## **Example: LC-3 LDR Instruction**

### Load instruction -- reads data from memory

#### Base + offset mode:

- add offset to base register -- result is memory address
- load from memory address into destination register

"Add the value 6 to the contents of R3 to form a memory address. Load the contents stored in that address to R2."

# **Instruction Processing: FETCH**

- Load next instruction (at address stored in PC) from memory into Instruction Register (IR).
  - Load contents of PC into MAR.
  - Send "read" signal to memory.
  - Read contents of MDR, store in IR.
- Then increment PC, so that it points to the next instruction in sequence.
  - PC becomes PC+1.

| Bits  | [15:12] | [11:9] | [8:6] | [5:3] | [2:0] |
|-------|---------|--------|-------|-------|-------|
| Field | ADD     | Dst    | Src1  | 0 0 0 | Src2  |

# **Instruction Processing: DECODE**

- First identify the opcode.
  - In LC-3, this is always the first four bits of instruction.
  - A 4-to-16 decoder asserts a control line corresponding to the desired opcode.
- Depending on opcode, identify other operands from the remaining bits.
  - Example:
    - for ADD, the last three bits are source operand #2
    - for LDR, the last six bits are the offset

| Bits    | [15:12] | [11:9] | [8:6] | [5:3] | [2:0] |
|---------|---------|--------|-------|-------|-------|
| Field   | ADD     | Dst    | Src1  | 0 0 0 | Src2  |
| Example | 0 0 0 1 | 1 1 0  | 0 1 0 | 0 0 0 | 1 1 0 |

# **Instruction Processing: EVALUATE ADDRESS**

■ For instructions that require memory access, compute address used for access.

#### ■ Examples:

- add offset to base register (as in LDR)
- add offset to PC (or to part of PC)
- add offset to zero

# **Instruction Processing: FETCH OPERANDS**

- Obtain source operands needed to perform operation.
- ■ Examples:
  - read data from register file (ADD)
  - load data from memory (LDR)

# **Instruction Processing: EXECUTE**

Perform the operation, using the source operands.

#### **Examples:**

- send operands to ALU and assert ADD signal
- do nothing (e.g., for loads and stores)

## **Instruction Processing: STORE**

### Write results to destination (register or memory).

#### ■ Examples:

- result of ADD is placed in destination register
- result of memory load is placed in destination register
- for store instruction, data is stored to memory
  - write address to MAR, data to MDR
  - assert WRITE signal to memory

| Bits    | [15:12] | [11:9] | [8:6] | [5:3] | [2:0] |
|---------|---------|--------|-------|-------|-------|
| Field   | ADD     | Dst    | Src1  | 0 0 0 | Src2  |
| Example | 0 0 0 1 | 1 1 0  | 0 1 0 | 0 0 0 | 1 1 0 |

# **Changing the Sequence of Instructions**

- In the FETCH phase, we incremented the Program Counter by 1.
- What if we don't want to always execute the instruction that follows this one?
  - examples: loop, if-then-else, function call
- Need special instructions that change the contents of the PC.
- These are called *jumps* and *branches*.
  - jumps are unconditional -- they always change the PC
  - branches are conditional -- they change the PC only if some condition is true (e.g., the contents of a register is zero)

## **Example: LC-3 JMP Instruction**

■ Set the PC to the value contained in a register. This becomes the address of the next instruction to fetch.

"Load the contents of R3 into the PC."

## **Driving Force: The Clock**

#### The clock is a signal that keeps the control unit moving.

- At each clock "tick," control unit moves to the next machine cycle -- may be next instruction or next phase of current instruction.

#### Clock generator circuit:

- Based on crystal oscillator
- Generates regular sequence of "0" and "1" logic levels
- Clock cycle (or machine cycle) -- rising edge to rising edge

## **Control Unit State Diagram**

■ The control unit is a state machine. Here is part of a simplified state diagram for the LC-3:

## **Stopping the Clock**

# Control unit will repeat instruction processing sequence as long as clock is running.

- If not processing instructions from your application, then it is processing instructions from the Operating System (OS).
- The OS is a special program that manages processor and other resources.

#### To stop the computer:

- AND the clock generator signal with ZERO
- when control unit stops seeing the CLOCK signal, it stops processing

# **Outline**

- 1 Review
- **2** From ENIAC to the Stored Program Computer
- 3 A Machine Structure: von Neumann Model
- 4 Summary

#### **Von Neumann Model**

## Instruction Processing Summary

- Instructions look just like data -- it's all interpretation.
- Three basic kinds of instructions:
  - computational instructions (ADD, AND, ...)
  - data movement instructions (LD, ST, ...)
  - control instructions (JMP, BRnz, ...)
- Six basic phases of instruction processing (FETCH, DECODE, EVALUATE ADDRESS, FETCH OPERANDS, EXECUTE, STORE):
  - not all phases are needed by every instruction
  - phases may take variable number of machine cycles

#### LC-3 Data Path

#### Introduction to Computing Systems (CS1002A.02)

#### School of Computer Science and Technology

# **Outline**

- 1 Review
- 2 LC-3 ISA Overview
- 3 LC-3 Operate Instructions and Data Path
- 4 Summary

# **Approach: Bottom Up**

Now, you are here.

# **Great Idea #2: Von Neumann Structure** (**Architecture Model**)

# **Great Idea #1: Turing Machine** (Computational Model)

### **Instruction Processing: State Transition**

# **Instruction Processing: Finite State Automata**

### **Today**

#### ■ We are going to learn how to:

- compute with values in registers
- load data from memory to registers
- store data from registers to memory

## How do we get the electrons to do the work?

# **Great Idea #4: Software and Hardware Codesign**

# **Great Idea #3: Abstraction Helps Us Manage Complexity**

# **Outline**

- 1 Review
- 2 LC-3 ISA Overview
- 3 LC-3 Operate Instructions and Data Path
- 4 Summary

#### Instruction Set Architecture

#### ISA = All of the *programmer-visible* components and operations of the computer

- memory organization
  - address space -- how many locations can be addressed?
  - addressability -- how many bits per location?
- register set
  - how many? what size? how are they used?
- instruction set
  - opcodes
  - data types
  - addressing modes

ISA provides all information needed for someone that wants to write a program in machine language (or translate from a high-level language to machine language).
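To make the instruction cycle and the "programmer-visible" state concrete, here is a minimal sketch of a machine that fetches, decodes, and executes three of the instructions used as examples on the earlier slides (ADD, LDR, JMP). The class name `TinyLC3` and its layout are hypothetical; condition codes, the remaining opcodes, and per-phase clocking are deliberately left out, so this is a teaching model rather than a full LC-3 simulator.

```python
MASK16 = 0xFFFF


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


class TinyLC3:
    def __init__(self):
        self.mem = {}          # sparse memory: address -> 16-bit word
        self.reg = [0] * 8     # R0..R7
        self.pc = 0
        self.ir = 0
        self.mar = 0
        self.mdr = 0

    def read(self, addr):
        # Memory interface: address goes into MAR, data comes back in MDR.
        self.mar = addr & MASK16
        self.mdr = self.mem.get(self.mar, 0)
        return self.mdr

    def step(self):
        # FETCH: IR <- M[PC], then PC <- PC + 1
        self.ir = self.read(self.pc)
        self.pc = (self.pc + 1) & MASK16
        # DECODE: opcode is bits [15:12]
        opcode = self.ir >> 12
        dr = (self.ir >> 9) & 0x7
        r1 = (self.ir >> 6) & 0x7      # SR1 or BaseR, depending on opcode
        if opcode == 0b0001:           # ADD
            if self.ir & 0x20:         # bit [5] = 1: immediate mode
                operand = sext(self.ir, 5)
            else:                      # bit [5] = 0: register mode
                operand = self.reg[self.ir & 0x7]
            self.reg[dr] = (self.reg[r1] + operand) & MASK16
        elif opcode == 0b0110:         # LDR: base + offset
            address = self.reg[r1] + sext(self.ir, 6)
            self.reg[dr] = self.read(address)
        elif opcode == 0b1100:         # JMP: PC <- BaseR
            self.pc = self.reg[r1]
        else:
            raise NotImplementedError(f"opcode {opcode:04b} not modeled")


m = TinyLC3()
m.pc = 0x3000
m.mem[0x3000] = 0b0001110010000110   # ADD R6, R2, R6
m.mem[0x3001] = 0b0110010011000110   # LDR R2, R3, #6
m.mem[0x3002] = 0b1100000011000000   # JMP R3
m.mem[0x4006] = 0x1234
m.reg[2], m.reg[3], m.reg[6] = 5, 0x4000, 7
for _ in range(3):
    m.step()
print(hex(m.reg[6]), hex(m.reg[2]), hex(m.pc))
```

Everything a machine-language programmer can observe here (memory, R0-R7, PC) is part of the ISA; MAR, MDR, and IR belong to the implementation.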
# LC-3 Overview: Memory and Registers

#### Memory

- address space: 2<sup>16</sup> locations (16-bit addresses)
- addressability: 16 bits

#### Registers

- temporary storage, accessed in a single machine cycle
  - accessing memory generally takes longer than a single cycle
- eight general-purpose registers: R0-R7
  - each 16 bits wide
  - how many bits to uniquely identify a register?
- other registers
  - not directly addressable, but used by (and affected by) instructions
  - PC (program counter), condition codes

# LC-3 Overview: Memory Map

#### LC-3 Overview: Instruction Set

#### **Opcodes**

- 15 opcodes
- Operate instructions: ADD, AND, NOT
- Data movement instructions: LD, LDI, LDR, LEA, ST, STR, STI
- Control instructions: BR, JSR/JSRR, JMP (RET), RTI, TRAP
- some opcodes (ADD, AND, NOT; LD, LDI, LDR, LEA) set/clear condition codes, based on result:
  - N = negative, Z = zero, P = positive (> 0)

#### **Data Types**

- 16-bit 2's complement integer

#### **Addressing Modes**

- How is the location of an operand specified?
  - non-memory addresses: immediate, register
  - memory addresses: PC-relative, indirect, base+offset

#### LC-3 ISA Overview

Data Movement Instructions: Load

| Instruction | [15:12] | [11:9] | [8:0]          |
|-------------|---------|--------|----------------|
| LD          | 0 0 1 0 | DR     | PCoffset9      |
| LDR         | 0 1 1 0 | DR     | BaseR, offset6 |
| LDI         | 1 0 1 0 | DR     | PCoffset9      |
| LEA         | 1 1 1 0 | DR     | PCoffset9      |

Data Movement Instructions: Store

| Instruction | [15:12] | [11:9] | [8:0]          |
|-------------|---------|--------|----------------|
| ST          | 0 0 1 1 | SR     | PCoffset9      |
| STR         | 0 1 1 1 | SR     | BaseR, offset6 |
| STI         | 1 0 1 1 | SR     | PCoffset9      |

# **Outline**

- 1 Review
- 2 LC-3 ISA Overview
- 3 LC-3 Operate Instructions and Data Path
- 4 Summary

# **Operate Instructions**

#### Only three operations: ADD, AND, NOT

#### Source and destination operands are registers

- These instructions <u>do not</u> reference memory.
- ADD and AND can use "immediate" mode, where one operand is hard-wired into the instruction.

#### Will show dataflow diagram with each instruction.
- illustrates <u>when</u> and <u>where</u> data moves to accomplish the desired operation

## **LC-3 ISA Operate Instructions**

# **Operate Instructions Overview**

| Instruction | [15:12] | [11:9] | [8:6] | [5] | [4:0]     |
|-------------|---------|--------|-------|-----|-----------|
| ADD         | 0 0 0 1 | DR     | SR1   | 0   | 0 0 SR2   |
| ADD         | 0 0 0 1 | DR     | SR1   | 1   | imm5      |
| AND         | 0 1 0 1 | DR     | SR1   | 0   | 0 0 SR2   |
| AND         | 0 1 0 1 | DR     | SR1   | 1   | imm5      |
| NOT         | 1 0 0 1 | DR     | SR1   | 1   | 1 1 1 1 1 |
| Reserved    | 1 1 0 1 |        |       |     |           |

# LC-3 Data Path

## **NOT** (Register)

Note: Src and Dst could be the <u>same</u> register.

[Figures: LC-3 data path for NOT (Register), one slide per instruction phase, showing PC, MAR, MDR, MEMORY, IR, register file, ALU, and the finite state machine]
## NOT (Register): NOT R3, R5

| Instruction | [15:12] | [11:9] | [8:6] | [5:0]       |
|-------------|---------|--------|-------|-------------|
| NOT         | 1 0 0 1 | Dst    | Src   | 1 1 1 1 1 1 |

# ADD/AND (Register)

Bit [5] = 0 means "register mode".

| Instruction | [15:12] | [11:9] | [8:6] | [5] | [4:3] | [2:0] |
|-------------|---------|--------|-------|-----|-------|-------|
| ADD         | 0 0 0 1 | Dst    | Src1  | 0   | 0 0   | Src2  |
| AND         | 0 1 0 1 | Dst    | Src1  | 0   | 0 0   | Src2  |

[Figures: LC-3 data path for ADD/AND (Register), one slide per instruction phase]

# AND (Register): AND R3, R5, R1

[Figure: register file]

# **ADD/AND (Immediate)**

Bit [5] = 1 means "immediate mode".

# ADD (Immediate): ADD R1, R5, #-2

# **Using Operate Instructions**

#### With only ADD, AND, NOT...

- How do we subtract?
- How do we OR?
- How do we copy from one register to another?
- How do we initialize a register to zero?

# **Outline**

- 1 Review
- 2 LC-3 ISA Overview
- 3 LC-3 Operate Instructions and Data Path
- 4 Summary

#### **Great Idea #4: Software and Hardware Co-design**

# How do we get the electrons to do the work?
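One possible set of answers to the "Using Operate Instructions" questions can be checked with a small model of the three operations on 16-bit words. The helper names below are hypothetical, and the instruction sequences in the comments are only one way to do each task.

```python
MASK16 = 0xFFFF


# The three LC-3 operate instructions, modeled on 16-bit words.
def ADD(a, b):
    return (a + b) & MASK16


def AND(a, b):
    return a & b & MASK16


def NOT(a):
    return ~a & MASK16


def subtract(a, b):
    # a - b = a + (NOT b + 1): negate b in 2's complement, then add.
    # e.g. NOT R2,R2 ; ADD R2,R2,#1 ; ADD R3,R1,R2
    return ADD(a, ADD(NOT(b), 1))


def bitwise_or(a, b):
    # DeMorgan's Law: a OR b = NOT(NOT a AND NOT b)
    return NOT(AND(NOT(a), NOT(b)))


def copy(a):
    # ADD Rdst, Rsrc, #0 copies a register.
    return ADD(a, 0)


def clear(a):
    # AND R, R, #0 initializes a register to zero.
    return AND(a, 0)


print(subtract(5, 3))                      # 2
print(hex(bitwise_or(0x00F0, 0x0F00)))     # 0xff0
print(hex(copy(0x1234)), clear(0xABCD))    # 0x1234 0
```

A negative result such as `subtract(3, 5)` comes back as `0xFFFE`, which is -2 in 16-bit 2's complement.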
# **Operate Instructions**

| Instruction | [15:12] | [11:9] | [8:6] | [5] | [4:0]     |
|-------------|---------|--------|-------|-----|-----------|
| ADD         | 0 0 0 1 | DR     | SR1   | 0   | 0 0 SR2   |
| ADD         | 0 0 0 1 | DR     | SR1   | 1   | imm5      |
| AND         | 0 1 0 1 | DR     | SR1   | 0   | 0 0 SR2   |
| AND         | 0 1 0 1 | DR     | SR1   | 1   | imm5      |
| NOT         | 1 0 0 1 | DR     | SR1   | 1   | 1 1 1 1 1 |
| Reserved    | 1 1 0 1 |        |       |     |           |

# LC-3 Data Path

[Figure: LC-3 data path with the Processing Unit, Control Unit, and Memory Unit marked]

# LC-3 Data Path After Operate Instruction

#### **Next Lecture: Data Movement Instructions**

# Reading

■ 5.3 Data Movement Instructions

# Introduction to Computing Systems (CS1002A.02)

School of Computer Science and Technology

# **Outline**

- 1 Review
- **2** LC-3 PC-Relative Load/Store
- 3 LC-3 Indirect, Base+offset Load/Store
- 4 Summary

#### **Great Idea #3: Abstraction**

# **LC-3 ISA Operate Instructions**

# **LC-3 Data Path After Operate Instruction**

# **Today**

- We are going to learn how to:
  - load data from memory to registers
  - store data from registers to memory

### **Outline**

- 1 Review
- **2** LC-3 PC-Relative Load/Store
- 3 LC-3 Indirect, Base+offset Load/Store
- 4 Summary

# **Today**

#### ■ We are going to learn how to:

- compute with values in registers
- load data from memory to registers
- store data from registers to memory

# LC-3 Overview: Memory Map

#### **Data Movement Instructions**

#### Store instructions

| Instruction | [15:12] | [11:9] | [8:0]          |
|-------------|---------|--------|----------------|
| ST          | 0 0 1 1 | SR     | PCoffset9      |
| STR         | 0 1 1 1 | SR     | BaseR, offset6 |
| STI         | 1 0 1 1 | SR     | PCoffset9      |

#### Data Movement
Instructions

#### **Load -- read data from memory to register**

- LD: PC-relative mode
- LDR: base+offset mode
- LDI: indirect mode

#### **Store -- write data from register to memory**

- ST: PC-relative mode
- STR: base+offset mode
- STI: indirect mode

#### Load effective address -- compute address, save in register

- LEA: immediate mode
- does not access memory

# **PC-Relative Addressing Mode**

#### Want to specify address directly in the instruction

- But an address is 16 bits, and so is an instruction!
- After subtracting 4 bits for opcode and 3 bits for register, we have 9 bits available for address.

#### **Solution:**

Use the 9 bits as a signed offset from the current PC.

**9 bits:** $-256 \le \text{offset} \le +255$

Can form any address X, such that: $PC - 256 \le X \le PC + 255$

Remember that PC is incremented as part of the FETCH phase; this is done before the EVALUATE ADDRESS stage.

# LD (PC-Relative)

LD DR, PCoffset9

# LD (PC-Relative): LD R1, x1AF

[Figures: LC-3 data path for LD (PC-Relative), one slide per instruction phase: PC + SEXT(IR[8:0]) through ADDR1MUX/ADDR2MUX and MARMUX into MAR, then memory read into MDR]

#### ST (PC-Relative)

ST SR, PCoffset9

[Figures: LC-3 data path for ST (PC-Relative), one slide per instruction phase]
#### **Outline**

- 1 Review
- **2** LC-3 PC-Relative Load/Store
- 3 LC-3 Indirect, Base+offset Load/Store
- 4 Summary

### **Indirect Addressing Mode**

### With PC-relative mode, can only address data within 256 words of the instruction.

- What about the rest of memory?

#### Solution #1:

Read address from memory location, then load/store to that address.

First address is generated from PC and IR (just like PC-relative addressing), then content of that address is used as target for load/store.

### LDI (Indirect)

LDI DR, PCoffset9

#### LDI (Indirect): LDI R1, x1AF

[Figures: LC-3 data path for LDI (Indirect), one slide per instruction phase; memory is read twice, first for the address and then for the data]

### STI (Indirect)

STI SR, PCoffset9

[Figures: LC-3 data path for STI (Indirect), one slide per instruction phase]
### **Base + Offset Addressing Mode**

### With PC-relative mode, can only address data within 256 words of the instruction.

- What about the rest of memory?

#### Solution #2:

- Use a register to generate a full 16-bit address.

4 bits for opcode, 3 for src/dest register, 3 bits for *base* register -- remaining 6 bits are used as a *signed offset*.

- Offset is sign-extended before adding to base register.

#### LDR (Base+Offset)

LDR DR, BaseR, offset6

#### LDR (Base+Offset): LDR R1, R3, x1D

[Figures: LC-3 data path for LDR (Base+Offset), one slide per instruction phase]

#### STR (Base+Offset)

STR SR, BaseR, offset6

[Figures: LC-3 data path for STR (Base+Offset), one slide per instruction phase]

#### **Load Effective Address**

Computes address like PC-relative (PC plus signed offset) and stores the result into a register.

Note: The *address* is stored in the register, not the contents of the memory location.
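The address computations behind these modes differ only in where the base comes from and how wide the offset is. The sketch below models them with hypothetical function names; `pc` means the already-incremented PC (the FETCH phase runs before EVALUATE ADDRESS), and memory is assumed to be a plain dictionary for illustration.

```python
MASK16 = 0xFFFF


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def ea_pc_relative(pc, offset9):
    # LD / ST: address = incremented PC + SEXT(offset9)
    return (pc + sext(offset9, 9)) & MASK16


def ea_base_offset(base, offset6):
    # LDR / STR: address = BaseR + SEXT(offset6)
    return (base + sext(offset6, 6)) & MASK16


def ea_indirect(mem, pc, offset9):
    # LDI / STI: the PC-relative location holds the address of the data.
    return mem[ea_pc_relative(pc, offset9)] & MASK16


def lea(pc, offset9):
    # LEA: same computation as PC-relative, but the address itself is the
    # result; memory is never accessed.
    return ea_pc_relative(pc, offset9)


print(hex(lea(0x30F7, 0x1FD)))                              # offset9 = -3
print(hex(ea_base_offset(0x30F4, 14)))                      # offset6 = +14
print(hex(ea_indirect({0x30F4: 0x3102}, 0x30FD, 0x1F7)))    # offset9 = -9
```

The 9-bit offset reaches from PC - 256 to PC + 255, which is why the indirect and base+offset modes are needed for the rest of memory.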
#### LEA (Immediate)

LEA DR, PCoffset9

#### LEA (Immediate): LEA R1, x1AF

[Figures: LC-3 data path for LEA (Immediate), one slide per instruction phase]

#### **Example**

| Address | Instruction | Comments |
|---------|-------------|----------|
| x30F6 | 1 1 1 0 0 0 1 1 1 1 1 1 1 1 0 1 | $R1 \leftarrow PC - 3 = x30F4$ |
| x30F7 | 0 0 0 1 0 1 0 0 0 1 1 0 1 1 1 0 | $R2 \leftarrow R1 + 14 = x3102$ |
| x30F8 | 0 0 1 1 0 1 0 1 1 1 1 1 1 0 1 1 | $M[PC - 5] \leftarrow R2$; i.e. $M[x30F4] \leftarrow x3102$ |
| x30F9 | 0 1 0 1 0 1 0 0 1 0 1 0 0 0 0 0 | $R2 \leftarrow 0$ |
| x30FA | 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 | $R2 \leftarrow R2 + 5 = 5$ |
| x30FB | 0 1 1 1 0 1 0 0 0 1 0 0 1 1 1 0 | $M[R1+14] \leftarrow R2$; i.e.
$M[x3102] \leftarrow 5$ |
| x30FC | 1 0 1 0 0 1 1 1 1 1 1 1 0 1 1 1 | $R3 \leftarrow M[M[PC-9]] = M[M[x30F4]] = M[x3102] = 5$ |

#### **Outline**

- 1 Review
- **2** LC-3 PC-Relative Load/Store
- 3 LC-3 Indirect, Base+offset Load/Store
- 4 Summary

#### **Today: Data Movement Instructions**

#### Load instructions

| Instruction | [15:12] | [11:9] | [8:0]          |
|-------------|---------|--------|----------------|
| LD          | 0 0 1 0 | DR     | PCoffset9      |
| LDR         | 0 1 1 0 | DR     | BaseR, offset6 |
| LDI         | 1 0 1 0 | DR     | PCoffset9      |
| LEA         | 1 1 1 0 | DR     | PCoffset9      |

#### Store instructions

| Instruction | [15:12] | [11:9] | [8:0]          |
|-------------|---------|--------|----------------|
| ST          | 0 0 1 1 | SR     | PCoffset9      |
| STR         | 0 1 1 1 | SR     | BaseR, offset6 |
| STI         | 1 0 1 1 | SR     | PCoffset9      |

#### LC-3 Data Path After Operate Instruction

#### LC-3 Data Path After Load/Store Instruction

#### **Next Lecture: Control Instructions**

| Instruction | [15:12] | [11:0]                    |
|-------------|---------|---------------------------|
| BR          | 0 0 0 0 | n z p, PCoffset9          |
| JSR         | 0 1 0 0 | 1, PCoffset11             |
| JSRR        | 0 1 0 0 | 0 0 0, BaseR, 0 0 0 0 0 0 |
| RTI         | 1 0 0 0 | 0 0 0 0 0 0 0 0 0 0 0 0   |
| JMP         | 1 1 0 0 | 0 0 0, BaseR, 0 0 0 0 0 0 |
| RET         | 1 1 0 0 | 0 0 0, 1 1 1, 0 0 0 0 0 0 |
| TRAP        | 1 1 1 1 | 0 0 0 0, TrapVector8      |

#### Introduction to Computing Systems (CS1002A.02)

#### School of Computer Science and Technology

#### **Outline**

- 1 Review
- **2** LC-3 Control Instructions Overview
- Conditional
Branch Instruction and Loop Control Example - 4 Jump & TRAP Instruction - 5 Summary ## **Great Idea #3: Abstraction Helps Us Manage Complexity** #### **LC-3 ISA Overview** | | 移动数据指令 | | | | | | | | | | | | | | |------|---------------------------------------|-----------------------------------|----|----|---|----------|-----------|-------------|--|--|--|--|--|--| | 取数指令 | | | | | | | | | | | | | | | | | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | | | | | | | | | | | | | | | | LD | 0 0 1 0 DR PCoffset9 | | | | | | | | | | | | | | | LDR | 0 | 1 | 1 | 0 | DR | BaseR | PCoffset6 | | | | | | | | | LDI | 1 | 0 | | 0 | Coffset9 | | | | | | | | | | | LEA | 1 | 1 | 1 | 0 | DR | PCoffset9 | | | | | | | | | | 存数指令 | | | | | | | | | | | | | | | | | 15 | 14 | 13 | | | 8 7 6 | 5 4 3 2 1 0 | | | | | | | | | ST | 15 14 13 12 11 10 9<br>0 0 1 1 SR | | | | | PCoffset9 | | | | | | | | | Ī | STR | 0 | 1 | 1 | 1 | SR | BaseR | PCoffset6 | | | | | | | | Ī | STI | STI 1 0 1 1 SR PCoffset9 | | | | | | PCoffset9 | | | | | | | #### LC-3 Data Path After Load/Store Instruction #### **Today** - We are going to learn how to: - Used to alter the sequence of instructions (by changing the Program Counter) - 1 Review - **2** LC-3 Control Instructions Overview - Conditional Branch Instruction and Loop Control Example - 4 Jump & TRAP Instruction - 5 Summary #### **Control Instructions** #### **Conditional Branch** - branch is taken if a specified condition is true - signed offset is added to PC to yield new PC - else, the branch is not taken - PC is not changed, points to the next sequential instruction #### **Unconditional Branch (or Jump)** always changes the PC #### **TRAP** - changes PC to the address of an OS "service routine" - routine will return control to the next instruction (after TRAP) #### **LC-3 ISA Overview** | 控制指令 | | | | | | | | | | | | | | | | | |------|----|----|----|----|----|------------|---|----|-----|---|-----|------|-----|----|---|---| | | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 
6 | 5 | 4 | 3 | 2 | 1 | 0 | | BR | 0 | 0 | 0 | 0 | n | z | р | | | F | C | offs | set | 9 | | | | JSR | 0 | | 0 | 0 | | PCoffset11 | | | | | | | | | | | | JSRR | 0 | | 0 | 0 | 0 | 0 | 0 | Ва | ase | R | 0 | 0 | 0 | 0 | 0 | 0 | | RTI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | JMP | 1 | 1 | 0 | 0 | 0 | 0 | 0 | Ва | ase | R | 0 | 0 | 0 | 0 | 0 | 0 | | RET | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | | TRAP | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | | 1 | Гrа | pV | ect | or | 8 | | #### **Control Instructions** 2025/2/24 #### **Condition Codes** #### LC-3 has three condition code registers: N -- negative 7 -- zero P -- positive (greater than zero) Set by any instruction that writes a value to a register (ADD, AND, NOT, LD, LDR, LDI, LEA) #### Exactly one will be set at all times Based on the last instruction that altered a register #### **Outline** - 1 Review - **2** LC-3 Control Instructions Overview - Conditional Branch Instruction and Loop Control Example - 4 Jump & TRAP Instruction - 5 Summary #### **Conditional Branch Instruction** #### Branch specifies one or more condition codes. If the specified bit is set, the branch is taken. - PC-relative addressing: target address is made by adding signed offset (IR[8:0]) to current PC. - Note: PC has already been incremented by FETCH stage. - Note: Target must be within 256 words of BR instruction. If the branch is not taken, the next sequential instruction is executed. #### **BR (PC-Relative)** What happens if bits [11:9] are all zero? What happens if bits [11:9] are all one? #### BR (PC-Relative): BR<sub>z</sub> x4101 What happens if bits [11:9] are all zero? What happens if bits [11:9] are all one? 
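The two questions above can be explored with a small model of the BR decision. This is a minimal sketch, not the course's simulator: the function names `sext` and `br_next_pc` are hypothetical, and the condition codes are passed in as plain 0/1 values. It assumes only what the slide states: the PC has already been incremented, and the target is the incremented PC plus the sign-extended IR[8:0].

```python
def sext(value, bits):
    """Sign-extend the low `bits` bits of `value` (hypothetical helper)."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def br_next_pc(instr, pc, n, z, p):
    """Return the PC after executing the BR word `instr` stored at address `pc`.

    n, z, p are the current condition codes (0 or 1).
    """
    incremented_pc = (pc + 1) & 0xFFFF      # FETCH has already advanced the PC
    want_n = (instr >> 11) & 1              # IR[11]
    want_z = (instr >> 10) & 1              # IR[10]
    want_p = (instr >> 9) & 1               # IR[9]
    taken = (want_n and n) or (want_z and z) or (want_p and p)
    if taken:
        return (incremented_pc + sext(instr, 9)) & 0xFFFF
    return incremented_pc


# BRz with offset +5 at x3004, Z set: branch taken
print(hex(br_next_pc(0x0405, 0x3004, n=0, z=1, p=0)))  # 0x300a
# Bits [11:9] all zero: no condition can match, so the branch is never taken
print(hex(br_next_pc(0x0005, 0x3004, n=0, z=1, p=0)))  # 0x3005
# Bits [11:9] all one: exactly one of N, Z, P is always set, so it is always taken
print(hex(br_next_pc(0x0FFA, 0x3009, n=1, z=0, p=0)))  # 0x3004
```

In this model, all-zero condition bits make BR behave like a no-op, and all-one condition bits make it an unconditional branch.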
#### **BR (PC-Relative)**

[Figure: LC-3 data path for BR (PC-Relative), phase by phase; labeled signals include LD.PC, PCMUX, ADDR1MUX, ADDR2MUX, SEXT of IR[8:0], N Z P, LD.CC, LD.IR, and the finite state machine]

#### **BR (PC-Relative)**

#### ■ Check

```
BRnzp x4101 ; if (n=1 or z=1 or p=1), JMP x4101
BRn   x4101 ; if (n=1)
BRz   x4101 ; if (z=1)
BRp   x4101 ; if (p=1)
BRnz  x4101 ; if (n=1 or z=1)
BRnp  x4101 ; if (n=1 or p=1)
BRzp  x4101 ; if (z=1 or p=1)
BR    x4101 ; (no condition bits set) PC=PC+1
```

#### ■ Set

- If DR < 0, set N=1 and Z=0 and P=0
- If DR = 0, set N=0 and Z=1 and P=0
- If DR > 0, set N=0 and Z=0 and P=1

#### **Using Branch Instructions**

#### Sample Program (The use of a counter)

| Address | Instruction | Comments |
|---------|-------------|----------|
| x3000 | 1110 001 011111111 | R1 ← x3100 (PC+0xFF); LEA |
| x3001 | 0101 011 011 1 00000 | R3 ← 0; AND |
| x3002 | 0101 010 010 1 00000 | R2 ← 0; AND |
| x3003 | 0001 010 010 1 01100 | R2 ← 12; ADD |
| x3004 | 0000 010 000000101 | If Z, goto (PC+5) = x300A; BR |
| x3005 | 0110 100 001 000000 | Load next value to R4; LDR |
| x3006 | 0001 011 011 0 00 100 | R3 ← R3 + R4; ADD |
| x3007 | 0001 001 001 1 00001 | Increment R1 (pointer); ADD |
| x3008 | 0001 010 010 1 11111 | Decrement R2 (counter); ADD |
| x3009 | 0000 111 111111010 | Goto (PC−6) = x3004; BR |

#### **Using Branch Instructions**

#### Compute sum of 12 integers. Numbers start at location x3100. Program starts at location x3000.

#### The use of a sentinel

A special character used to indicate the end of a sequence is often called a sentinel.

Useful when you don't know ahead of time how many times to execute a loop.

#### **Using Branch Instructions**

#### Sample Program (The use of a sentinel)

| Address | Instruction | Comments |
|---------|-------------|----------|
| x3000 | 1110 001 011111111 | R1 ← (PC+0xFF) = x3100; LEA |
| x3001 | 0101 011 011 1 00000 | R3 ← 0; AND |
| x3002 | 0110 100 001 000000 | R4 ← M[R1]; LDR |
| x3003 | 0000 100 000000100 | If N, goto (PC+4) = x3008; BR |
| x3004 | 0001 011 011 0 00 100 | R3 ← R3 + R4; ADD |
| x3005 | 0001 001 001 1 00001 | R1 ← R1 + 1; ADD |
| x3006 | 0110 100 001 000000 | R4 ← M[R1]; LDR |
| x3007 | 0000 111 111111011 | Goto (PC−5) = x3003; BR |

#### **Using Branch Instructions (Code Optimization)**

#### Compute sum of 12 integers. Numbers start at location x3100. Program starts at location x3000.
#### **Sample Program**

| Address | Instruction | Comments |
|---------|-------------|----------|
| x3000 | 1110 001 011111111 | R1 ← (PC+0xFF) = x3100 |
| x3001 | 0101 011 011 1 00000 | R3 ← 0 |
| x3002 | 0110 100 001 000000 | R4 ← M[R1] |
| x3003 | 0000 100 000000011 | If N, goto (PC+3) = x3007 |
| x3004 | 0001 011 011 0 00 100 | R3 ← R3 + R4 |
| x3005 | 0001 001 001 1 00001 | R1 ← R1 + 1 |
| x3006 | 0000 111 111111011 | Goto (PC−5) = x3002 |

#### JMP (Register)

#### Jump is an unconditional branch -- always taken.

- Target address is the contents of a register.
- Allows any target address.

[Figure: LC-3 data path for JMP (Register), phase by phase, using JMP R7 as the example; labeled signals include LD.PC, PCMUX, the register file SR1 output, ADDR1MUX, LD.IR, and the finite state machine]

#### **TRAP**

| Instruction | Opcode [15:12] | [11:8] | [7:0] |
|-------------|----------------|--------|-------|
| TRAP | 1111 | 0000 | trapvect8 |

## Calls a service routine, identified by 8-bit "trap vector."
| Vector | Routine |
|--------|---------|
| x23 | input a character from the keyboard |
| x21 | output a character to the monitor |
| x25 | halt the program |

#### **Example:**

`TRAP x23`

- Directs the operating system to execute the IN system call.
- The starting address of this system call is contained in memory location x0023.

When the routine is done, PC is set to the instruction following TRAP. (We'll talk about how this works later.)

#### **TRAP**

[Figure: LC-3 data path for TRAP, phase by phase; labeled signals include GatePC, LD.MAR, MARMUX with zero-extended IR[7:0], GateMARMUX, memory read into MDR, LD.PC, and the register file write of the old PC]

### LC-3 ISA

Control Instructions

| Instruction | Opcode [15:12] | Bits [11:0] |
|-------------|----------------|-------------|
| BR | 0000 | n [11], z [10], p [9], PCoffset9 [8:0] |
| JSR | 0100 | 1 [11], PCoffset11 [10:0] |
| JSRR | 0100 | 0 [11], 00 [10:9], BaseR [8:6], 000000 [5:0] |
| RTI | 1000 | 000000000000 |
| JMP | 1100 | 000 [11:9], BaseR [8:6], 000000 [5:0] |
| RET | 1100 | 000 [11:9], 111 [8:6], 000000 [5:0] |
| TRAP | 1111 | 0000 [11:8], trapvect8 [7:0] |

Data Movement Instructions

| Group | Instruction | Opcode [15:12] | [11:9] | [8:0] |
|-------|-------------|----------------|--------|-------|
| Load | LD | 0010 | DR | PCoffset9 |
| Load | LDR | 0110 | DR | BaseR [8:6], offset6 [5:0] |
| Load | LDI | 1010 | DR | PCoffset9 |
| Load | LEA | 1110 | DR | PCoffset9 |
| Store | ST | 0011 | SR | PCoffset9 |
| Store | STR | 0111 | SR | BaseR [8:6], offset6 [5:0] |
| Store | STI | 1011 | SR | PCoffset9 |

## LC-3 Data Path After Operate Instruction

## LC-3 Data Path After Load/Store Instruction

## LC-3 Data Path After Control Instruction

#### LC-3 Data Path

# Introduction to Computing Systems (CS1002A.02)

#### School of Computer Science and Technology

# **Outline**

- 1 Review
- 2 An Example: Counting Occurrences of a Character
- 3 ISA & Data Path Revisited
- 4 Summary

## **Condition Codes**

## LC-3 has three condition code registers:

- N -- negative
- Z -- zero
- P -- positive (greater than zero)

Set by any instruction that writes a value to a register (ADD, AND, NOT, LD, LDR, LDI, LEA)

## Exactly one will be set at all times

- Based on the last instruction that altered a register

# Counting the occurrences of a character in a file

- Program begins at location x3000
- Read character from keyboard
- Load each character from a "file"
  - File is a sequence of memory locations
  - Starting address of file is stored in the memory location immediately after the program
- If file character equals input character, increment counter
- End of file is indicated by a special ASCII value: **EOT** (x04)
- At the end, print the number of characters and halt (assume there will be fewer than 10 occurrences of the character)

A special character used to indicate the end of a sequence is often called a sentinel. It is useful when you don't know ahead of time how many times to execute a loop.

# **Register and Memory**

## Register

- R0: holds the character that is being counted (typed from keyboard)
- R1: holds, in turn, each character that we get from the file being examined
- R2: keeps track of the number of occurrences
- R3: pointer into the file; at first, R3 ← M[x3012] = x9000
- R4: temporary, used for the end-of-file check R4 = R1 − ASCII(EOT)

### **Flow Chart**

## Counting the occurrences of a character in a file

```
        .ORIG x3000
        AND   R2, R2, #0
        LD    R3, PTR
        TRAP  x23
        LDR   R1, R3, #0
TEST    ADD   R4, R1, #-4
        BRz   OUTPUT
        NOT   R1, R1
        ADD   R1, R1, #1
        ADD   R1, R1, R0
        BRnp  GETCHAR
        ADD   R2, R2, #1
GETCHAR ADD   R3, R3, #1
        LDR   R1, R3, #0
        BRnzp TEST
OUTPUT  LD    R0, ASCII
        ADD   R0, R0, R2
        TRAP  x21
        HALT
PTR     .FILL x9000
ASCII   .FILL x30
        .END

        .ORIG x9000
        .FILL x0031
        .FILL x0032
        .FILL x0031
        .FILL x0033
        .FILL x0043
        .FILL x04
        .END
```

# Program (1 of 2)

| Address | Instruction | Comments |
|---------|-------------|----------|
| x3000 | 0101 010 010 1 00000 | R2 ← 0 (counter)<br>AND R2, R2, #0 |
| x3001 | 0010 011 000010000 | R3 ← M[x3012] (ptr)<br>LD R3, x3012 (LD R3, PTR) |
| x3002 | 1111 0000 00100011 | Input to R0<br>TRAP x23 (IN) |
| x3003 | 0110 001 011 000000 | R1 ← M[R3]<br>LDR R1, R3, #0 |
| x3004 | 0001 100 001 1 11100 | R4 ← R1 − 4 (EOT)<br>ADD R4, R1, #-4 |
| x3005 | 0000 010 000001000 | If Z, goto x300E<br>BRz x300E (BRz OUTPUT) |
| x3006 | 1001 001 001 111111 | R1 ← NOT R1<br>NOT R1, R1 |
| x3007 | 0001 001 001 1 00001 | R1 ← R1 + 1<br>ADD R1, R1, #1 |
| x3008 | 0001 001 001 0 00 000 | R1 ← R1 + R0<br>ADD R1, R1, R0 |
| x3009 | 0000 101 000000001 | If N or P, goto x300B<br>BRnp x300B (BRnp GETCHAR) |

# Program (2 of 2)

| Address | Instruction | Comments |
|---------|-------------|----------|
| x300A | 0001 010 010 1 00001 | R2 ← R2 + 1<br>ADD R2, R2, #1 |
| x300B | 0001 011 011 1 00001 | R3 ← R3 + 1<br>ADD R3, R3, #1 |
| x300C | 0110 001 011 000000 | R1 ← M[R3]<br>LDR R1, R3, #0 |
| x300D | 0000 111 111110110 | Goto x3004<br>BRnzp x3004 (BRnzp TEST) |
| x300E | 0010 000 000000100 | R0 ← M[x3013]<br>LD R0, x3013 (LD R0, ASCII) |
| x300F | 0001 000 000 0 00 010 | R0 ← R0 + R2<br>ADD R0, R0, R2 |
| x3010 | 1111 0000 00100001 | Print R0<br>TRAP x21 (OUT) |
| x3011 | 1111 0000 00100101 | HALT<br>TRAP x25 (HALT) |
| x3012 | 1001 0000 0000 0000 | Starting address of file (x9000) |
| x3013 | 0000 0000 0011 0000 | ASCII x30 ('0') |

## **Great Idea #4: Software and Hardware Co-design**

## How do we get the electrons to do the work?

## Instruction Set Architecture (ISA)

- A computer's native operations are called instructions.
- Job of a CPU (Central Processing Unit, aka Core): execute instructions
- Instructions: the CPU's primitive operations
  - Instructions are performed one after another in sequence
  - Each instruction does a small amount of work (a tiny part of a larger program).
  - Each instruction applies an operation to operands, and may change the sequence of instructions.
- Instruction set architecture (ISA) specifies the set of commands (instructions) a computer can execute
- Hardware registers provide a few very fast variables for instructions to operate on

## Instruction Set Architecture (ISA)

- The instruction set defines all the valid instructions.
- CPUs belong to "families," each implementing its own set of instructions
- A CPU's particular set of instructions implements an **Instruction Set Architecture (ISA)**
- Examples:
  - ARM
  - Intel x86
  - MIPS
  - RISC-V
  - IBM/Motorola PowerPC (old Mac)
  - Intel IA64

#### Instruction set architecture evolution

### LC-3 ISA

Control Instructions

| Instruction | Opcode [15:12] | Bits [11:0] |
|-------------|----------------|-------------|
| BR | 0000 | n [11], z [10], p [9], PCoffset9 [8:0] |
| JSR | 0100 | 1 [11], PCoffset11 [10:0] |
| JSRR | 0100 | 0 [11], 00 [10:9], BaseR [8:6], 000000 [5:0] |
| RTI | 1000 | 000000000000 |
| JMP | 1100 | 000 [11:9], BaseR [8:6], 000000 [5:0] |
| RET | 1100 | 000 [11:9], 111 [8:6], 000000 [5:0] |
| TRAP | 1111 | 0000 [11:8], trapvect8 [7:0] |

Data Movement Instructions

| Group | Instruction | Opcode [15:12] | [11:9] | [8:0] |
|-------|-------------|----------------|--------|-------|
| Load | LD | 0010 | DR | PCoffset9 |
| Load | LDR | 0110 | DR | BaseR [8:6], offset6 [5:0] |
| Load | LDI | 1010 | DR | PCoffset9 |
| Load | LEA | 1110 | DR | PCoffset9 |
| Store | ST | 0011 | SR | PCoffset9 |
| Store | STR | 0111 | SR | BaseR [8:6], offset6 [5:0] |
| Store | STI | 1011 | SR | PCoffset9 |

# **Instruction Processing (state transition)**

# Instruction Set Architecture (ISA) vs. Finite State Automata

[Figure: LC-3 finite state machine diagram]

Notation used in the diagram:

- B+off6: Base + SEXT[offset6]
- PC+off9: PC + SEXT[offset9]
- PC+off11: PC + SEXT[offset11]
- \*OP2 may be SR2 or SEXT[imm5]

#### LC-3 Data Path

#### Global bus

- special set of wires that carry a 16-bit signal to many components
- inputs to the bus are "tri-state devices" that only place a signal on the bus when they are enabled
- only one (16-bit) signal should be enabled at any time
  - control unit decides which signal "drives" the bus
- any number of components can read the bus
  - register only captures bus data if it is write-enabled by the control unit

#### Memory

- Control and data registers for memory and I/O devices
- memory: MAR, MDR (also control signal for read/write)

#### **ALU**

- Accepts inputs from register file and from sign-extended bits from IR (immediate field).
- Output goes to bus.
  - used by condition code logic, register file, memory

#### **Register File**

- Two read addresses (SR1, SR2), one write address (DR)
- Input from bus
  - result of ALU operation or memory read
- Two 16-bit outputs
  - used by ALU, PC, memory address
  - data for store instructions passes through ALU

#### PC and PCMUX

- Three inputs to PC, controlled by PCMUX
  1. PC+1 -- FETCH stage
  2. Address adder -- BR, JMP
  3. bus -- TRAP (discussed later)

#### MAR and MARMUX

- Two inputs to MAR, controlled by MARMUX
  1. Address adder -- LD/ST, LDR/STR
  2. Zero-extended IR[7:0] -- TRAP (discussed later)

#### **Condition Code Logic**

- Looks at value on bus and generates N, Z, P signals
- Registers set only when control unit enables them (LD.CC)
  - only certain instructions set the codes (ADD, AND, NOT, LD, LDI, LDR, LEA)

#### **Control Unit -- Finite State Machine**

- On each machine cycle, changes control signals for next phase of instruction processing
  - who drives the bus? (GatePC, GateALU, ...)
  - which registers are write enabled? (LD.IR, LD.REG, ...)
  - which operation should ALU perform? (ALUK)
  - ...
- Logic includes decoder for opcode, etc.

### **LC-3 Data Path**

Filled arrow = info to be processed. Unfilled arrow = control signal.
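The condition code logic described above can be sketched in a few lines. This is an illustrative model under stated assumptions, not the actual gate-level design: the names `condition_codes` and `ConditionCodeRegisters` are hypothetical, the bus is modeled as a 16-bit integer, and the initial register state (Z set) is an assumption made only so that exactly one code is set from the start.

```python
def condition_codes(bus_value):
    """Derive (N, Z, P) from a 16-bit two's-complement value on the bus."""
    word = bus_value & 0xFFFF
    n = 1 if word & 0x8000 else 0          # sign bit set: negative
    z = 1 if word == 0 else 0              # all bits clear: zero
    p = 1 if (n == 0 and z == 0) else 0    # otherwise: positive
    return n, z, p


class ConditionCodeRegisters:
    """N, Z, P registers that capture new values only when LD.CC is asserted."""

    def __init__(self):
        self.n, self.z, self.p = 0, 1, 0   # assumed initial state

    def clock(self, bus_value, ld_cc):
        """Model one machine cycle; return the (N, Z, P) held afterwards."""
        if ld_cc:
            self.n, self.z, self.p = condition_codes(bus_value)
        return self.n, self.z, self.p


cc = ConditionCodeRegisters()
print(cc.clock(0xFFFF, ld_cc=True))    # (1, 0, 0): xFFFF is -1
print(cc.clock(0x0005, ld_cc=False))   # (1, 0, 0): LD.CC low, codes unchanged
print(cc.clock(0x0005, ld_cc=True))    # (0, 0, 1)
```

Gating the update on `ld_cc` mirrors the point in the slide that only certain instructions set the codes: for any other instruction the control unit simply never asserts LD.CC.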
# Definition of computer architecture: classical definition

... the attributes of a [computing] system as seen by the programmer, *i.e.* the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.

- Amdahl, Blaauw, and Brooks, 1964

### **Instruction Set Architecture: What does each cycle do?**

- Stage 1: Fetch the instruction from the memory system
- Stage 2: Determine what action to perform
- Stage 3: Obtain the operands
- Stage 4: Produce the result or status
- Stage 5: Store the result to the memory system
- Stage 6: Determine the next instruction to execute

# **Roadmap To Future Classes...**

#### ■ Compilers

- All the processes in going from source code to assembly

#### ■ OS

- OS often needs a small amount of assembly for doing things the "high level" language doesn't support
  - Such as accessing special resources

#### ■ Computer Architecture

- How to build the computer that supports the assembly

#### ■ Computer Security

- Exploit code ("shell code") is often in assembly, and exploitation often requires understanding the assembly language of the target.
# Introduction to Computing Systems (CS1002A.02)

# Chapter 7 Assembly Language Program

School of Computer Science and Technology

# **Outline**

- 1 Review
- 2 Assembly Language Overview
- 3 Assembly Process
- 4 Summary

# **Review:** The Transistor & Basic Logical Structure

#### Review: Von Neumann Model

#### ■ So far, we've learned how to:

- compute with values in registers
- load data from memory to registers
- store data from registers to memory

#### Review: The ISA

| Instruction | [15:12] | [11:9] | [8:6] | [5:0] |
|-------------|---------|--------|-------|-------|
| ADD<sup>+</sup> | 0001 | DR | SR1 | 0 00 SR2 |
| ADD<sup>+</sup> | 0001 | DR | SR1 | 1 imm5 |
| AND<sup>+</sup> | 0101 | DR | SR1 | 0 00 SR2 |
| AND<sup>+</sup> | 0101 | DR | SR1 | 1 imm5 |
| BR | 0000 | n z p | PCoffset9 [8:0] | |
| JMP | 1100 | 000 | BaseR | 000000 |
| JSR | 0100 | 1, PCoffset11 [10:0] | | |
| JSRR | 0100 | 0 00 | BaseR | 000000 |
| LD<sup>+</sup> | 0010 | DR | PCoffset9 [8:0] | |
| LDI<sup>+</sup> | 1010 | DR | PCoffset9 [8:0] | |
| LDR<sup>+</sup> | 0110 | DR | BaseR | offset6 |
| LEA<sup>+</sup> | 1110 | DR | PCoffset9 [8:0] | |
| NOT<sup>+</sup> | 1001 | DR | SR | 111111 |
| RET | 1100 | 000 | 111 | 000000 |
| RTI | 1000 | 000000000000 [11:0] | | |
| ST | 0011 | SR | PCoffset9 [8:0] | |
| STI | 1011 | SR | PCoffset9 [8:0] | |
| STR | 0111 | SR | BaseR | offset6 |
| TRAP | 1111 | 0000 [11:8], trapvect8 [7:0] | | |
| reserved | 1101 | | | |

Figure A.2 Format of the entire LC-3 instruction set.
Note: + indicates instructions that modify condition codes

# Review: The State Machine (Turing Machine equivalent)

## Review: The Data Path (von Neumann Model)

## **A LC-3 Program**

```
X4101  0101010010100000
X4102  0010011000010000
X4103  1111000000100011
X4104
X4105  0001100001111100
X4106  0000010000001000
X4107  1001001001111111
X4108  0001001001100001
X4109
X410A
X410B
X410C  0001011011100001
X410D  0110001011000000
X410E
X410F
X4110
X4111  1111000000100001
X4112  1111000000100101
X4113
X4114
```

| Address | Contents |
|---------|----------|
| X8001 | 0000 0000 0100 1100 |
| X8002 | 0000 0000 0100 1010 |
| X8003 | 0000 0000 0101 0001 |
| X8004 | 0000 0000 0100 1100 |
| X8005 | 0000 0000 0101 0110 |
| X8006 | 0000 0000 0100 1001 |
| X8007 | 0000 0000 0100 0001 |
| X8008 | 0000 0000 0101 0011 |
| X8009 | 0000 0000 0101 0110 |
| X800A | 0000 0000 0100 0000 |
| X800B | 0000 0000 0100 0000 |
| X800C | 0000 0000 0100 0000 |
| X800D | 0000 0000 0101 0110 |
| X800E | 0000 0000 0100 1001 |
| X800F | 0000 0000 0101 1000 |
| X8010 | 0000 0000 0100 1100 |
| X8011 | 0000 0000 0100 0000 |
| X8012 | 0000 0000 0100 1000 |
| X8013 | 0000 0000 0100 1110 |
| X8014 | 0000 0000 0100 0000 |
| X8015 | 0000 0000 0000 0100 |
| X8016 | 0000 0000 0100 0000 |
| X8017 | 0000 0000 0101 1010 |
| X8018 | 0000 0000 0000 0100 |

# **Human-Readable Machine Language**

■ Computers like ones and zeros...

0001110010000110

■ Humans like symbols...

```
ADD R6, R2, R6 ; increment index reg.
or
C = a + b;
```

- An assembler is a program that turns symbols into machine instructions.
  - ISA-specific: close correspondence between symbols and instruction set
    - mnemonics for opcodes
    - labels for memory locations
  - additional operations for allocating storage and initializing data

# **Great Idea #4: Software and Hardware Codesign**

# **Great Idea #3: Abstraction Helps Us Manage Complexity**

# **An Assembly Language Program**

```
;
; Program to multiply a number by the constant 6
;
        .ORIG x3050
        LD    R1, SIX
        LD    R2, NUMBER
        AND   R3, R3, #0    ; Clear R3. It will
                            ; contain the product.
; The inner loop
;
AGAIN   ADD   R3, R3, R2
        ADD   R1, R1, #-1   ; R1 keeps track of
        BRp   AGAIN         ; the iteration.
;
        HALT
;
NUMBER  .BLKW 1
SIX     .FILL x0006
;
        .END
```

# LC-3 Assembly Language Syntax

- Each line of a program is one of the following:
  - an instruction
  - an assembler directive (or pseudo-op)
  - a comment
- Whitespace (between symbols) and case are ignored.
- Comments (beginning with ";") are also ignored.
- An instruction has the following format: `LABEL OPCODE OPERANDS ; COMMENTS` (LABEL and COMMENTS are optional)

# **Opcodes and Operands**

#### Opcodes

- reserved symbols that correspond to LC-3 instructions
- listed in Appendix A
  - ex: ADD, AND, LD, LDR, ...
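To make the link between a symbolic opcode and its machine word concrete, here is a minimal sketch of how one instruction form could be encoded. It covers only the register-register form of ADD; the names `OPCODES`, `register_number`, and `encode_add_registers` are hypothetical and are not part of any LC-3 tool. The opcode values are the ones listed in the ISA tables.

```python
# Mnemonic -> 4-bit opcode, for a few of the reserved symbols named above
OPCODES = {"ADD": 0b0001, "AND": 0b0101, "LD": 0b0010, "LDR": 0b0110}


def register_number(name):
    """Convert a register operand such as 'R6' into its 3-bit number."""
    if len(name) != 2 or name[0].upper() != "R" or name[1] not in "01234567":
        raise ValueError("not an LC-3 register: " + repr(name))
    return int(name[1])


def encode_add_registers(dr, sr1, sr2):
    """Encode 'ADD DR, SR1, SR2': opcode | DR | SR1 | 0 00 | SR2."""
    return (
        (OPCODES["ADD"] << 12)
        | (register_number(dr) << 9)
        | (register_number(sr1) << 6)
        | register_number(sr2)          # bits [5:3] stay 0 in register mode
    )


word = encode_add_registers("R6", "R2", "R6")
print(format(word, "016b"))  # 0001110010000110
```

The printed word matches the bit string shown on the Human-Readable Machine Language slide for `ADD R6, R2, R6`.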
#### Operands

- registers -- specified by Rn, where n is the register number
- numbers -- indicated by # (decimal) or x (hex)
- label -- symbolic name of memory location
- separated by comma
- number, order, and type correspond to instruction format
  - ex:

```
ADD R1,R1,R3
ADD R1,R1,#3
LD  R6,NUMBER
BRz LOOP
```

#### **Labels and Comments**

#### ■ Label

- placed at the beginning of the line
- assigns a symbolic name to the address corresponding to the line
  - ex:

```
LOOP ADD R1,R1,#-1
     BRp LOOP
```

#### ■ Comment

- anything after a semicolon is a comment
- ignored by assembler
- used by humans to document/understand programs
- tips for useful comments:
  - avoid restating the obvious, such as "decrement R1"
  - provide additional insight, as in "accumulate product in R6"
  - use comments to separate pieces of program

#### **Assembler Directives**

#### ■ Pseudo-operations

- do not refer to operations executed by program
- used by assembler
- look like instructions, but "opcode" starts with a dot

| Opcode | Operand | Meaning |
|----------|--------------------|----------------------------------------------------------------------|
| .ORIG | address | starting address of program |
| .END | | end of program |
| .BLKW | n | allocate n words of storage |
| .FILL | n | allocate one word, initialize with value n |
| .STRINGZ | n-character string | allocate n+1 locations, initialize w/characters and null terminator |

#### **Example**

```
      .ORIG x3010
HELLO .STRINGZ "Hello, World!"
```

```
x3010: x0048
x3011: x0065
x3012: x006C
x3013: x006C
x3014: x006F
x3015: x002C
x3016: x0020
x3017: x0057
x3018: x006F
x3019: x0072
x301A: x006C
x301B: x0064
x301C: x0021
x301D: x0000
```

# **Trap Codes**

■ LC-3 assembler provides "pseudo-instructions" for each trap code, so you don't have to remember them.
| Code | Equivalent | Description |
|------|------------|-------------|
| HALT | TRAP x25 | Halt execution and print message to console. |
| IN | TRAP x23 | Print prompt on console, read (and echo) one character from keyboard. Character stored in R0[7:0]. |
| OUT | TRAP x21 | Write one character (in R0[7:0]) to console. |
| GETC | TRAP x20 | Read one character from keyboard. Character stored in R0[7:0]. |
| PUTS | TRAP x22 | Write null-terminated string to console. Address of string is in R0. |

# Style Guidelines

#### ■ Use the following style guidelines to improve the readability and understandability of your programs:

1. Provide a program header, with author's name, date, etc., and purpose of program.
2. Start labels, opcode, operands, and comments in same column for each line. (Unless entire line is a comment.)
3. Use comments to explain what each register does.
4. Give explanatory comment for most instructions.
5. Use meaningful symbolic names.
   - Mixed upper and lower case for readability.
   - ASCIItoBinary, InputRoutine, SaveR1
6. Provide comments between program sections.
7. Each line must fit on the page -- no wraparound or truncations.
   - Long statements split in aesthetically pleasing manner.

# **Sample Program**

■ Count the occurrences of a character in a file. Remember this?
## Program (1 of 2)

| Address | Instruction | Comments |
|---------|---------------------|----------------------------------------------------|
| x3000 | 0101 0100 1010 0000 | R2 ← 0 (counter)<br>AND R2,R2,#0 |
| x3001 | 0010 0110 0001 0000 | R3 ← M[x3012] (ptr)<br>LD R3, x3012 (LD R3, PTR) |
| x3002 | 1111 0000 0010 0011 | Input to R0<br>TRAP x23 (GETC) |
| x3003 | 0110 0010 1100 0000 | R1 ← M[R3]<br>LDR R1,R3,#0 |
| x3004 | 0001 1000 0111 1100 | R4 ← R1 − 4 (EOT)<br>ADD R4,R1,#-4 |
| x3005 | 0000 0100 0000 1000 | If Z, goto x300E<br>BRz x300E (BRz OUTPUT) |
| x3006 | 1001 0010 0111 1111 | R1 ← NOT R1<br>NOT R1,R1 |
| x3007 | 0001 0010 0110 0001 | R1 ← R1 + 1<br>ADD R1,R1,#1 |
| x3008 | 0001 0010 0100 0000 | R1 ← R1 + R0<br>ADD R1,R1,R0 |
| x3009 | 0000 1010 0000 0001 | If N or P, goto x300B<br>BRnp x300B (BRnp GETCHAR) |

# Program (2 of 2)

| Address | Instruction | Comments |
|---------|---------------------|----------------------------------------------------|
| x300A | 0001 0100 1010 0001 | R2 ← R2 + 1<br>ADD R2,R2,#1 |
| x300B | 0001 0110 1110 0001 | R3 ← R3 + 1<br>ADD R3,R3,#1 |
| x300C | 0110 0010 1100 0000 | R1 ← M[R3]<br>LDR R1,R3,#0 |
| x300D | 0000 1111 1111 0110 | Goto x3004<br>BRnzp x3004 (BRnzp TEST) |
| x300E | 0010 0000 0000 0100 | R0 ← M[x3013]<br>LD R0, x3013 (LD R0, ASCII) |
| x300F | 0001 0000 0000 0010 | R0 ← R0 + R2<br>ADD R0,R0,R2 |
| x3010 | 1111 0000 0010 0001 | Print R0<br>TRAP x21 (OUT) |
| x3011 | 1111 0000 0010 0101 | HALT<br>TRAP x25 (HALT) |
| x3012 | 1001 0000 0000 0000 | Starting Address of File (x9000) |
| x3013 | 0000 0000 0011 0000 | ASCII x30 ('0') |

# **Char Count in Assembly Language (1 of 3)**

```
;
; Program to count occurrences of a character in a file.
; Character to be input from the keyboard.
; Result to be displayed on the monitor.
; Program only works if no more than 9 occurrences are found.
;
; Initialization
;
        .ORIG   x3000
        AND     R2, R2, #0      ; R2 is counter, initially 0
        LD      R3, PTR         ; R3 is pointer to characters
        GETC                    ; TRAP x23: R0 gets character input
        LDR     R1, R3, #0      ; R1 gets first character
;
; Test character for end of file
;
TEST    ADD     R4, R1, #-4     ; Test for EOT (ASCII x04)
        BRz     OUTPUT          ; If done, prepare the output
```

# **Char Count in Assembly Language (2 of 3)**

```
;
; Test character for match. If a match, increment count.
;
        NOT     R1, R1
        ADD     R1, R1, R0      ; If match, R1 = xFFFF
        NOT     R1, R1          ; If match, R1 = x0000
        BRnp    GETCHAR         ; If no match, do not increment
        ADD     R2, R2, #1
;
; Get next character from file.
;
GETCHAR ADD     R3, R3, #1      ; Point to next character.
        LDR     R1, R3, #0      ; R1 gets next char to test
        BRnzp   TEST
;
; Output the count.
;
OUTPUT  LD      R0, ASCII       ; Load the ASCII template
        ADD     R0, R0, R2      ; Convert binary count to ASCII
        OUT                     ; TRAP x21: ASCII code in R0 is displayed.
        HALT                    ; TRAP x25: Halt machine
```

# **Char Count in Assembly Language (3 of 3)**

# **Outline**

- 1 Review
- 2 Assembly Language Overview
- 3 Assembly Process
- 4 Summary

# **Assembly Process**

■ Convert an assembly language file (.asm) into an executable file (.obj) for the LC-3 simulator.

#### ■ First Pass:

- scan the program file
- find all labels and calculate the corresponding addresses; the resulting table is called the <u>symbol table</u>

#### ■ Second Pass:

- convert instructions to machine language, using information from the symbol table

# **First Pass: Constructing the Symbol Table**

1.
   Find the .ORIG statement, which tells us the address of the first instruction.
   - Initialize the location counter (LC), which keeps track of the address of the current instruction.
2. For each non-empty line in the program:
   - a) If the line contains a label, add the label and LC to the symbol table.
   - b) Increment LC.
   - NOTE: If the statement is .BLKW or .STRINGZ, increment LC by the number of words allocated.
3. Stop when the .END statement is reached. Right?

NOTE: A line that contains only a comment is considered an empty line.

#### **Practice**

■ Construct the symbol table for the program in Figure 7.2

| Symbol | Address |
|--------|---------|
| | |
| | |
| | |
| | |
| | |

#### **Practice**

■ Construct the symbol table for the program in Figure 7.2

| Symbol | Address |
|---------|---------|
| TEST | x3004 |
| GETCHAR | x300B |
| OUTPUT | x300E |
| ASCII | x3012 |
| PTR | x3013 |

### **Second Pass: Generating Machine Language**

- For each executable assembly language statement, generate the corresponding machine language instruction.
  - If an operand is a label, look up its address in the symbol table.

### **■** Potential problems:

- Improper number or type of arguments
  - ex: NOT R1,#7
  - ex: ADD R1,R2
  - ex: ADD R3,R3,NUMBER
- Immediate argument too large
  - ex: ADD R1,R2,#1023
- Address (associated with label) too far from the instruction
  - can't use PC-relative addressing mode (a 9-bit offset reaches only −256 to +255 words from the incremented PC)

### **Practice**

- Using the symbol table constructed earlier, translate these statements into LC-3 machine language.
  - (Assume all addresses are within PC-relative range.)

| Statement | Machine Language |
|----------------|------------------|
| LD R3,PTR | |
| ADD R4,R1,#-4 | |
| LDR R1,R3,#0 | |
| BRnp GETCHAR | |

### LC-3 Assembler

■ Running "assemble" (Unix) or LC3Edit (Windows) generates several different output files.

### **Object File Format**

### **■** LC-3 object file contains

- Starting address (location where program must be loaded), followed by...
- Machine instructions

### Example

Beginning of "count character" object file looks like this:

### **Multiple Object Files**

- An object file is not necessarily a complete program.
  - system-provided library routines
  - code blocks written by multiple developers
- For LC-3, can load multiple object files into memory, then start executing at a desired address.
  - system routines, such as keyboard input, are loaded automatically
    - loaded into "system memory," below x1000
    - by convention, user code should be loaded between x3000 and xCFFF
  - each object file includes a starting address
  - be careful not to load overlapping object files

### **Linking and Loading**

- *Loading* is the process of copying an executable image into memory.
  - more sophisticated loaders are able to <u>relocate</u> images to fit into available memory
  - must readjust branch targets, load/store addresses
- *Linking* is the process of resolving symbols between independent object files.
  - suppose we define a symbol in one module, and want to use it in another
  - some notation, such as .EXTERNAL, is used to tell the assembler that a symbol is defined in another module
  - the linker will search the symbol tables of other modules to resolve symbols and complete code generation before loading

### **Outline**

- 1 Review
- 2 Assembly Language Overview
- 3 Assembly Process
- 4 Summary

### **Summary: Assembly Language**

### Memory map of the LC-3

### Introduction to Computing Systems A (CS1002A.02)

### School of Computer Science and Technology

### **Outline**

- 1 Review
- 2 Subroutines
- 3 Control Instructions for Subroutines
- 4 Summary

### **Abstract Data Types: Data Structures**

- Up to now, we have processed a single value
  - an integer
  - an ASCII character
- Information in the real world is far more complex than simple, single numbers. We call these complex items of information abstract data types, or more colloquially data structures, e.g.
  - a company's organization chart
  - a list of items arranged in alphabetical order
- In this chapter, we will study three abstract data types:
  - stacks
  - queues
  - and character strings

### **Abstract Data Types: Data Structures**

- We will write programs to solve problems that require expressing information according to its structure.
- Before we get to stacks, queues, and character strings, however, we introduce a new concept that will prove very useful in manipulating data structures: subroutines, also called functions.

### **Subroutines**

### ■ A subroutine is a program fragment that...

- Resides in user space (i.e., not in the OS)
- Performs a well-defined task
- Is invoked (called) multiple times by a user program
- Returns control to the calling program when finished

#### ■ Virtues

- Reuse code without re-typing it (and debugging it!)
- Divide task into parts (or among multiple programmers)
- Use a vendor-supplied library of useful routines: one software engineer writes a program that requires such fragments, and another software engineer writes the fragments.
  - math library
  - square root, sine, arctangent, etc.

### A simple illustration of a part of a program

```
START   ST    R1,SaveR1     ; Save registers needed
        ST    R2,SaveR2     ; by this routine
        ST    R3,SaveR3
;
        LD    R2,Newline
L1      LDI   R3,DSR
        BRzp  L1            ; Loop until monitor is ready
        STI   R2,DDR        ; Move cursor to new clean line
;
        LEA   R1,Prompt     ; Starting address of prompt string
Loop    LDR   R0,R1,#0      ; Write the input prompt
        BRz   Input         ; End of prompt string
L2      LDI   R3,DSR
        BRzp  L2            ; Loop until monitor is ready
        STI   R0,DDR        ; Write next prompt character
        ADD   R1,R1,#1      ; Increment prompt pointer
        BRnzp Loop          ; Get next prompt character
;
Input   LDI   R3,KBSR
        BRzp  Input         ; Poll until a character is typed
        LDI   R0,KBDR       ; Load input character into R0
L3      LDI   R3,DSR
        BRzp  L3            ; Loop until monitor is ready
        STI   R0,DDR        ; Echo input character
;
L4      LDI   R3,DSR
        BRzp  L4            ; Loop until monitor is ready
        STI   R2,DDR        ; Move cursor to new clean line
        LD    R1,SaveR1     ; Restore registers
        LD    R2,SaveR2     ; to original values
        LD    R3,SaveR3
        JMP   R7            ; Do the program's next task
```

### A simple illustration of a part of a program

```
;
SaveR1  .BLKW 1             ; Memory for registers saved
SaveR2  .BLKW 1
SaveR3  .BLKW 1
DSR     .FILL xFE04
DDR     .FILL xFE06
KBSR    .FILL xFE00
KBDR    .FILL xFE02
Newline .FILL x000A         ; ASCII code for newline
Prompt  .STRINGZ "Input a character>"
```

### The Call/Return Mechanism

(a) Without subroutines (b) With subroutines

### **Control Instructions for Subroutines**

| Instruction | Bits [15:12] | Bits [11:0] |
|-------------|--------------|-------------|
| BR | 0000 | n, z, p in bits [11:9]; PCoffset9 in bits [8:0] |
| JSR | 0100 | 1 in bit [11]; PCoffset11 in bits [10:0] |
| JSRR | 0100 | 0 in bit [11]; 00 in bits [10:9]; BaseR in bits [8:6]; 000000 in bits [5:0] |
| RTI | 1000 | 000000000000 |
| JMP | 1100 | 000 in bits [11:9]; BaseR in bits [8:6]; 000000 in bits [5:0] |
| RET | 1100 | 000 in bits [11:9]; 111 in bits [8:6]; 000000 in bits [5:0] |
| TRAP | 1111 | 0000 in bits [11:8]; TrapVector8 in bits [7:0] |

### **JSR**

# JSR (PC-Relative)

[Figure: LC-3 data path during the JSR instruction fetch. Components shown: PC (LD.PC), PCMUX, MAR (LD.MAR), MDR (LD.MDR), MEMORY (MIO.EN).]
## JSR (PC-Relative)

[Figures: remaining data-path frames for JSR. Components shown: IR (LD.IR), MDR (GateMDR, LD.MDR); REG FILE (LD.REG, SR1 OUT, SR2 OUT); PC (LD.PC), PCMUX, ADDR1MUX, ADDR2MUX, SEXT of IR[10:0].]

### **JSRR**

## **JSRR** (Register)

[Figures: data-path frames for JSRR. Components shown: PC (LD.PC), PCMUX, MAR (LD.MAR), MDR (LD.MDR), MEMORY (MIO.EN, MEM.EN, R.W); IR (LD.IR, GateMDR); REG FILE (LD.REG, SR1, SR1 OUT, SR2 OUT); ADDR1MUX.]

### **RET** instruction

### ■ RET – return instruction

- How to return
  - Copy the address in R7 into the PC, returning execution to the instruction after the most recent call.
  - PC ← (R7)

```
RET (JMP R7)
15 14 13 12 | 11 10  9 |  8  7  6 |  5  4  3  2  1  0
 1  1  0  0 |  0  0  0 |  1  1  1 |  0  0  0  0  0  0
```

### **Example: Negate the value in R0**

```
TwosComp  NOT  R0,R0      ; flip bits
          ADD  R0,R0,#1   ; add one
          RET             ; return to caller
```

### To call from a program

```
          ; need to compute R4 = R1-R3
          ADD  R0,R3,#0   ; copy R3 to R0
          JSR  TwosComp   ; negate
          ADD  R4,R1,R0   ; add to R1
          ...
```

### **Using Subroutines**

### ■ Programmer must know

- Address: or at least a label that will be bound to its address
- Function: what it does
  - NOTE: The programmer does not need to know how the subroutine works, only what changes are visible in the machine's state after the routine has run
- Arguments: what they are and where they are placed
- Return values: what they are and where they are placed

### Passing Information To Subroutines

### ■ Argument(s)

- A value passed in to a subroutine is called an argument
- This is a value needed by the subroutine to do its job
- Examples
  - TwosComp: R0 is the number to be negated
  - OUT: R0 is the character to be printed
  - PUTS: R0 is the address of the string to be printed

#### ■ How?
- In registers (simple, fast, but limited number)
- In memory (many, but awkward, expensive)
- Both

### **Getting Values From Subroutines**

### **■** Return Values

- A value passed out of a subroutine is called a return value
- This is the value that you called the subroutine to compute
- Examples
  - TwosComp: negated value is returned in R0
  - GETC: character read from the keyboard is returned in R0

#### ■ How?

- Registers, memory, or both
- Single return value in a register is most common

### Saving and Restoring Registers

- Like service routines, subroutines must save and restore registers
  - Who saves what is part of the calling convention
- Generally use the "callee-save" strategy, except for return values
  - Same as trap service routines
  - Save anything that the subroutine alters internally that shouldn't be visible when the subroutine returns
  - Restore incoming arguments to original values (unless overwritten by return value)

#### ■ Remember

- You MUST save R7 if you call any other subroutine or trap
  - Otherwise, you won't be able to return!
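
The R7 rule above can be made concrete with a small model. The following C sketch is not LC-3 code: it is a toy model written for illustration, in which one struct field plays the role of R7 and the hypothetical helper `jsr` overwrites it the way JSR does. The helper names and the addresses `0x3001` and `0x4001` are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (an assumption for illustration, not the LC-3 itself):
 * the machine has exactly one link register, R7. */
typedef struct {
    unsigned r7;   /* return address written by the most recent JSR */
} Cpu;

/* JSR: overwrite R7 with the address of the instruction after the call. */
static void jsr(Cpu *cpu, unsigned return_addr) {
    assert(cpu != NULL);
    cpu->r7 = return_addr;
}

/* RET: the next PC is whatever R7 currently holds. */
static unsigned ret(const Cpu *cpu) {
    assert(cpu != NULL);
    return cpu->r7;
}

/* A leaf subroutine: it calls nothing, so R7 is still intact at RET. */
static unsigned leaf(Cpu *cpu) {
    return ret(cpu);
}

/* Calls leaf without saving R7: the nested JSR destroys our own linkage. */
static unsigned outer_no_save(Cpu *cpu) {
    jsr(cpu, 0x4001u);            /* JSR leaf; 0x4001 is an address inside outer */
    (void)leaf(cpu);              /* leaf returns to 0x4001, as intended */
    return ret(cpu);              /* RET now also goes to 0x4001, not to our caller */
}

/* Same routine with callee-save of R7 (ST R7,SaveR7 ... LD R7,SaveR7). */
static unsigned outer_with_save(Cpu *cpu) {
    unsigned save_r7 = cpu->r7;   /* ST R7, SaveR7 */
    jsr(cpu, 0x4001u);
    (void)leaf(cpu);
    cpu->r7 = save_r7;            /* LD R7, SaveR7 */
    return ret(cpu);              /* RET goes back to the original caller */
}
```

If a caller first executes `jsr(&cpu, 0x3001)`, then `outer_no_save(&cpu)` reports a return to `0x4001` (inside the subroutine itself), while `outer_with_save(&cpu)` reports a return to `0x3001`, the caller's address.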
## **Subroutine Template**

```
SUB_NAME
        ; Register saving
        ST    R0, SUB_R0
        ST    R1, SUB_R1
        ; ...
        ST    R6, SUB_R6
        ST    R7, SUB_R7    ; Return address

        ;***Code***

        ; Register restoring
        LD    R0, SUB_R0
        LD    R1, SUB_R1
        ; ...
        LD    R6, SUB_R6
        LD    R7, SUB_R7    ; Return address
        RET
```

# Introduction to Computing Systems A (CS1002A.02)

Chapter 8-2 Memory Model for Program Execution & the Stack

# School of Computer Science and Technology

# **Outline**

- 1 Review
- 2 Memory Model for Program Execution
- 3 The Stack
- 4 Implementing Functions in C Using a Stack

## **Review: The Call/Return Mechanism**

(a) Without subroutines (b) With subroutines

### **Review: Control Instructions for Subroutines**

### **Review: Memory in Von Neumann Model**

# **Review: Using Memory**

- Memory
  - Just a big "array"
  - "Indexed" by address
  - Accessed with loads and stores
- LD/LDR/LDI
  - Read a word out of memory
  - Use different addressing modes
- ST/STR/STI
  - Place a word in memory
  - Use different addressing modes

| Address | Value |
|---------|-------|
| x0000 | x00A0 |
| x0001 | x5007 |
| x0002 | x0201 |
| x0003 | x0203 |
| x0004 | x3002 |
| ... | ... |
| xFFFC | x5007 |
| xFFFD | x0201 |
| xFFFE | x0203 |
| xFFFF | x3002 |

## **Review: Using Memory**

#### ■ Problem

- What if the memory you want to access is far away?
  - LD/ST won't work (PC-relative)
  - LDR/STR won't work alone (need to get address in register)

#### ■ Solution: LDI/STI

- Place the address of the far-away value nearby
- Load the address, then load/store from that

### **Today: Memory Model for Function Calls**

# **Outline**

- 1 Review
- 2 Memory Model for Program Execution
- 3 The Stack
- 4 Implementing Functions in C Using a Stack

### **Problem**

# ■ How do we allocate memory during the execution of a program written in C?

- Programs need memory for code and data such as instructions, global and local variables, etc.
- Modern programming practices encourage many (reusable) functions, callable from anywhere.
- Some memory can be statically allocated, since the size and type are known at compile time.
- Some memory must be allocated dynamically, since the size and type are unknown at compile time.

### **Motivation**

# ■ Why is memory allocation important? Why not just use a memory manager?

- Allocation affects the performance and memory usage of every C, C++, Java program.
- Current systems do not have enough registers to store everything that is required.
- Memory management is too slow and cumbersome to solve the problem.
- Static allocation of memory resources is too inflexible and inefficient, as we will see.

### Goals

#### ■ What do we care about?

- Fast program execution
- Efficient memory usage
- Avoid memory fragmentation
- Maintain data locality
- Allow recursive calls
- Support parallel execution
- Minimize resource allocation
- Memory should never be allocated for functions that are not executed.

## Memory Model in the LC-3

# **Outline**

- 1 Review
- 2 Memory Model for Program Execution
- 3 The Stack
- 4 Implementing Functions in C Using a Stack

# Stack: An Abstract Data Type

- An important abstraction that you will encounter in many applications.
- The fundamental model for execution of C, Java, Fortran, and many other languages.
- We will describe three uses of the stack:
  - Evaluating arithmetic expressions
    - Store intermediate results on the stack instead of in registers
  - Function calls
    - Store parameters, return values, return address, dynamic link
  - Interrupt-Driven I/O
    - Store processor state for the currently executing program

### Stack Data Structure

### ■ A LIFO (last-in first-out) storage structure

- The first thing you put in is the last thing you take out
- The last thing you put in is the first thing you take out
- This means of access is what defines a stack, not the specific implementation.

### **■** Two main operations

- PUSH: add an item to the stack
- POP: remove an item from the stack

#### **■** Error conditions:

- Underflow (try to pop from an empty stack)
- Overflow (try to push onto a full stack)
- A register (e.g., R6) holds the address of the top of the stack (TOS)

# **A Physical Stack**

#### **■** Coin holder

Last quarter in is the first quarter out (LIFO)

## **A Hardware Stack Implementation**

### ■ Data items move between registers

# A Software Stack Implementation

# ■ Data items don't move in memory, just our idea about where the TOP of the stack is

By convention, R6 holds the Top of Stack (TOS) pointer (SP)

# **Basic Push and Pop Code**

Note: Stacks can grow in either direction (toward higher addresses or toward lower addresses)

### **Pop with Underflow Detection**

# ■ If we try to pop too many items off the stack, an underflow condition occurs.

- Check for underflow by checking TOS before removing data.
- Return status code in R5 (0 for success, 1 for underflow)

```
POP     LD    R1, EMPTY
        ADD   R2, R6, R1     ; Compare stack pointer with x4000
        BRz   UNDER
        LDR   R0, R6, #0     ; The actual 'pop'
        ADD   R6, R6, #1     ; Adjust stack pointer
        AND   R5, R5, #0     ; Success: return R5 = 0
        RET
UNDER   AND   R5, R5, #0     ; Underflow: return R5 = 1
        ADD   R5, R5, #1
        RET
EMPTY   .FILL xC000          ; EMPTY = -x4000
```

#### **Push with Overflow Detection**

- If we try to push too many items onto the stack, an overflow condition occurs.
  - Check for overflow by checking TOS before adding data.
  - Return status code in R5 (0 for success, 1 for overflow)

```
PUSH    LD    R1, MAX
        ADD   R2, R6, R1     ; Compare stack pointer with MAX
        BRz   OVER
        ADD   R6, R6, #-1    ; Adjust stack pointer
        STR   R0, R6, #0     ; The actual 'push'
        AND   R5, R5, #0     ; Success: return R5 = 0
        RET
OVER    AND   R5, R5, #0
        ADD   R5, R5, #1     ; Overflow: return R5 = 1
        RET
MAX     .FILL xC005          ; MAX = -x3FFB
```

### PUSH & POP in LC-3 - 1

```
;
; Subroutines for carrying out the PUSH and POP functions. This
; program works with a stack consisting of memory locations x3FFF
; through x3FFB. R6 is the stack pointer.
;
POP     AND   R5,R5,#0       ; R5 <-- success
        ST    R1,Save1       ; Save registers that
        ST    R2,Save2       ; are needed by POP
        LD    R1,EMPTY       ; EMPTY contains -x4000
        ADD   R2,R6,R1       ; Compare stack pointer to x4000
        BRz   fail_exit      ; Branch if stack is empty
;
        LDR   R0,R6,#0       ; The actual "pop"
        ADD   R6,R6,#1       ; Adjust stack pointer
        BRnzp success_exit
;
PUSH    AND   R5,R5,#0
        ST    R1,Save1       ; Save registers that
        ST    R2,Save2       ; are needed by PUSH
        LD    R1,FULL        ; FULL contains -x3FFB
        ADD   R2,R6,R1       ; Compare stack pointer to x3FFB
        BRz   fail_exit      ; Branch if stack is full
;
```

### PUSH & POP in LC-3 - 2

```
        ADD   R6,R6,#-1      ; Adjust stack pointer
        STR   R0,R6,#0       ; The actual "push"
success_exit
        LD    R2,Save2       ; Restore original
        LD    R1,Save1       ; register values
        RET
;
fail_exit
        LD    R2,Save2       ; Restore original
        LD    R1,Save1       ; register values
        ADD   R5,R5,#1       ; R5 <-- failure
        RET
;
EMPTY   .FILL xC000          ; EMPTY contains -x4000
FULL    .FILL xC005          ; FULL contains -x3FFB
Save1   .FILL x0000
Save2   .FILL x0000
```

# Arithmetic Using a Stack (p387, chapter 10.2)

- ■ Instead of registers, some ISAs use a stack for source and destination operands. The computer always pops and pushes operands from the stack, and hence no addresses need to be specified explicitly. Therefore, stack machines are sometimes referred to as zero-address machines.
- Example: the ADD instruction pops two numbers from the stack, adds them, and pushes the result onto the stack.

```
ADD        vs.
ADD R0,R1,R2
```

#### ■ Evaluating (A+B)•(C+D) using a stack:

- push A
- push B
- ADD
- push C
- push D
- ADD
- MULTIPLY
- pop result

| Address |
|---------|
| x3FFB |
| x3FFC |
| x3FFD |
| x3FFE |
| x3FFF |

SP → x4000

## $(25+17) \times (3+2)$

(a) Before (b) After first push (c) After second push

### $(25+17) \times (3+2)$

## $(25+17) \times (3+2)$

# **Outline**

- 1 Review
- 2 Memory Model for Program Execution
- 3 The Stack
- 4 Implementing Functions in C Using a Stack

### **Function in C**

- Smaller, simpler subcomponent of a program
- Provides abstraction
  - hides low-level details
  - gives high-level structure to the program, making overall program flow easier to understand
  - enables separable, independent development

#### C functions

- zero or multiple arguments passed in
- single result returned (optional)
- return value is always a particular type
- In other languages, called procedures, subroutines, ...

# **Example of High-Level Structure**

```
main()
{
  SetupBoard();      /* place pieces on board */
  DetermineSides();  /* choose black/white */

  /* Play game */
  do {
    WhitesTurn();
    BlacksTurn();
  } while (NoOutcomeYet());
}
```

Structure of the program is evident, even without knowing the implementation.

### **Functions in C**

■ Declaration (also called prototype)

**■ Function call** -- used in expression

### **Function Definition**

### ■ State type, name, types of arguments

- must match function declaration
- give a name to each argument (doesn't have to match declaration)

```
int Factorial(int n)
{
  int i;
  int result = 1;
  for (i = 1; i <= n; i++)
    result *= i;
  return result;   /* gives control back to calling function and returns value */
}
```

## Why Declaration?

- Since the function definition also includes return and argument types, why is a declaration needed?
- Use might be seen before definition.
  - Compiler needs to know return and arg types and number of arguments.
- Definition might be in a different file, written by a different programmer.
  - include a "header" file with function declarations only
  - compile separately, link together to make executable

# **Example**

```
double ValueInDollars(double amount, double rate);   /* declaration */

main()
{
  /* function call (invocation) */
  dollars = ValueInDollars(francs, DOLLARS_PER_FRANC);
  printf("%f francs equals %f dollars.\n", francs, dollars);
}

/* definition */
double ValueInDollars(double amount, double rate)
{
  return amount * rate;
}
```

# **Storage Requirements**

- Code must be stored in memory so that we can execute the function.
- Parameters must be sent from the caller to the callee so that the function receives them.
- Local variables for the function must be stored somewhere. Is one copy enough?
- Return address must be stored so that control can be returned to the caller.
- Return values must be sent from the callee to the caller; that is how results are returned.

# **Function Call in C**

# **■** Consider the following code:

```
// main program
int a = 10;
int b = 20;
int c = foo(a, b);

int foo(int x, int y) {
  int z;
  z = x + y;
  return z;
}
```

#### ■ What needs to be stored?

Code, parameters, local/global variables, return address/values

# **Possible Solution: Mixed Code and Data**

# **■** Function implementation:

```
foo         BR    foo_begin     ; skip over data
foo_rv      .BLKW 1             ; return value
foo_ra      .BLKW 1             ; return address
foo_paramx  .BLKW 1             ; 'x' parameter
foo_paramy  .BLKW 1             ; 'y' parameter
foo_localz  .BLKW 1             ; 'z' local
foo_begin   ST    R7, foo_ra    ; save return
            LD    R7, foo_ra    ; restore return
            RET
```

■ Can construct the data section by prefixing each name with foo_

# Corresponds to option 1 in the textbook, p.497

# **Possible Solution: Mixed Code and Data**

# **■** Calling sequence

```
ST   R1, foo_paramx   ; R1 has 'x'
ST   R2, foo_paramy   ; R2 has 'y'
JSR  foo              ; Function call
LD   R3, foo_rv       ; R3 = return value
```

- **■** Code generation is relatively simple.
- Few instructions are spent on moving data.
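
Because `foo_paramx`, `foo_ra`, and the other locations above are fixed, every invocation of the function shares the same single copy of its data. The C sketch below imitates that scheme with a file-scope variable and contrasts it with an ordinary C parameter, which lives in a per-call stack frame. The names `fact_param_n`, `fact_static`, `fact_static_call`, and `fact_stack` are invented for this illustration.

```c
#include <assert.h>

/* Statically allocated "parameter slot", in the spirit of foo_paramx:
 * one location shared by every invocation (hypothetical name). */
static int fact_param_n;

/* Factorial whose argument lives in the static slot. */
static int fact_static(void) {
    assert(fact_param_n >= 0);
    if (fact_param_n <= 1)
        return 1;
    fact_param_n = fact_param_n - 1;   /* pass n-1: overwrites our own n */
    int sub = fact_static();           /* inner calls count the slot down further */
    return fact_param_n * sub;         /* the original n is gone by now */
}

/* Wrapper that stores the argument first, like ST R1, foo_paramx before JSR foo. */
static int fact_static_call(int n) {
    fact_param_n = n;
    return fact_static();
}

/* Same algorithm with n held in a per-invocation stack frame. */
static int fact_stack(int n) {
    assert(n >= 0);
    if (n <= 1)
        return 1;
    return n * fact_stack(n - 1);      /* each call has its own copy of n */
}
```

With these definitions `fact_stack(4)` evaluates to 24, while `fact_static_call(4)` evaluates to 1: by the time any multiplication runs, the shared slot has already been counted down to 1.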
# Possible Solution: Mixed Code and Data

# **■** Advantages:

- Code and data are close together
- Conceptually easy to understand
- Minimizes register usage for variables
- Data persists through the life of the program

# **■** Disadvantages:

- Cannot handle recursion or parallel execution
- Code is vulnerable to self-modification
- Consumes resources for inactive functions

# **Possible Solution: Separate Code and Data**

# **■** Memory allocation

```
foo_rv      .BLKW 1   ; foo return value
foo_ra      .BLKW 1   ; foo return address
foo_paramx  .BLKW 1   ; foo 'x' parameter
foo_paramy  .BLKW 1   ; foo 'y' parameter
foo_localz  .BLKW 1   ; foo 'z' local
bar_rv      .BLKW 1   ; bar return value
bar_ra      .BLKW 1   ; bar return address
bar_paramw  .BLKW 1   ; bar 'w' parameter
```

- Code for foo() and bar() is somewhere else
- **■** The function call sequence is similar to the mixed solution

# Possible Solution: Separate Code and Data

# **■** Advantages:

- Code can be marked 'read only'
- Conceptually easy to understand
- Early Fortran used this scheme
- Data persists through the life of the program

# **■** Disadvantages:

- Cannot handle recursion or parallel execution
- Consumes resources for inactive functions

# Real Solution: Run-time Stack

- Instead of allocating the space for local variables statically (i.e., in a fixed place in memory), the space is allocated once the function starts executing.
- When the function returns to the caller, its space is reclaimed to be assigned later to another function.
- If the function is called from itself, the new invocation of the function will get its own space that is distinct from its other currently active invocations.

# **Run-time Stack: Stack frame**

- Each function has a memory template where it stores its local variables, some bookkeeping information, and its parameter variables. This template is called its stack frame or activation record.
- Whenever a function is called, its stack frame will be allocated somewhere in memory.
- Because the calling pattern of functions naturally follows a stack-like pattern, this allocation and deallocation will follow the pushes and pops of a stack.

#### Run-time Stack: stack-like nature of function calls

# Run-time Stack: frame pointer & stack pointer

- We need some easy way to access the data in each function's stack frame and also to manage the pushing and popping of stack frames.
- For this, we will use R5 and R6. R5 points to some internal location within the stack frame at the top of the stack; it may point to the base of the local variables for the currently executing function. We call it the frame pointer (FP).
- R6 always points to the very top of the stack. We call it the stack pointer (SP).

Figure 14.5 Several snapshots of the run-time stack while the program outlined in Figure 14.4 executes.

# Run-time Stack: Stack frame

- Consider what has to happen in a function call:
  - Caller must pass parameters to the callee.
  - Caller must transfer control to the callee.
  - Caller needs to allocate space for the return value.
  - Caller needs to save the return address.
  - Callee requires space for local variables.
  - Callee must return control to the caller.
  - Callee needs to save the frame pointer of the caller.
- So, parameters, return value, return address, frame pointer, and local variables are stored on the stack.

# **Run-time Stack: Stack frame**

Figure 14.7 The run-time stack after the stack frame for Volt is pushed onto the stack.

# Run-time Stack: an example

# **Run-time Stack: Stack frame**

```
#include <stdio.h>

int add(char* s_, int a, int b, int c, int d, int e, int f) {
    return a+b+c+d+e+f;
}

int main() {
    int x = 12;
    int y = 23;                      /* line not legible on the slide; value taken from the disassembly */
    char* s = "hehe, hahahaha!!!!";  /* line not legible on the slide; string taken from .rodata */
    int out = add(s, x, y, 1, 1, 1, 1);
    printf("%d\n", out);
    return 0;
}
```

```
00010400 <add>:
   10400: e52db004   push {fp}               ; (str fp, [sp, #-4]!)
   10404: e28db000   add  fp, sp, #0
   10408: e24dd014   sub  sp, sp, #20
   1040c: e50b0008   str  r0, [fp, #-8]
   10410: e50b100c   str  r1, [fp, #-12]
   10414: e50b2010   str  r2, [fp, #-16]
   10418: e50b3014   str  r3, [fp, #-20]     ; 0xffffffec
   1041c: e51b200c   ldr  r2, [fp, #-12]
   10420: e51b3010   ldr  r3, [fp, #-16]
   10424: e0822003   add  r2, r2, r3
   10428: e51b3014   ldr  r3, [fp, #-20]     ; 0xffffffec
   1042c: e0822003   add  r2, r2, r3
   10430: e59b3004   ldr  r3, [fp, #4]
   10434: e0822003   add  r2, r2, r3
   10438: e59b3008   ldr  r3, [fp, #8]
   1043c: e0822003   add  r2, r2, r3
   10440: e59b300c   ldr  r3, [fp, #12]
   10444: e0823003   add  r3, r2, r3
   10448: e1a00003   mov  r0, r3
   1044c: e28bd000   add  sp, fp, #0
   10450: e49db004   pop  {fp}               ; (ldr fp, [sp], #4)
   10454: e12fff1e   bx   lr

00010458 <main>:
   10458: e92d4800   push {fp, lr}
   1045c: e28db004   add  fp, sp, #4
   10460: e24dd020   sub  sp, sp, #32
   10464: e3a0300c   mov  r3, #12
   10468: e50b3014   str  r3, [fp, #-20]     ; 0xffffffec
   1046c: e3a03017   mov  r3, #23
   10470: e50b3010   str  r3, [fp, #-16]
   10474: e59f304c   ldr  r3, [pc, #76]      ; 104c8 <main+0x70>
   10478: e50b300c   str  r3, [fp, #-12]
   1047c: e3a03001   mov  r3, #1
   10480: e58d3008   str  r3, [sp, #8]
   10484: e3a03001   mov  r3, #1
   10488: e58d3004   str  r3, [sp, #4]
   1048c: e3a03001   mov  r3, #1
   10490: e58d3000   str  r3, [sp]
   10494: e3a03001   mov  r3, #1
   10498: e51b2010   ldr  r2, [fp, #-16]
   1049c: e51b1014   ldr  r1, [fp, #-20]     ; 0xffffffec
   104a0: e51b000c   ldr  r0, [fp, #-12]
   104a4: ebffffd5   bl   10400 <add>
   104a8: e50b0008   str  r0, [fp, #-8]
   104ac: e51b1008   ldr  r1, [fp, #-8]
   104b0: e59f0014   ldr  r0, [pc, #20]      ; 104cc <main+0x74>
   104b4: ebffff89   bl   102e0 <printf@plt>
   104b8: e3a03000   mov  r3, #0
   104bc: e1a00003   mov  r0, r3
   104c0: e24bd004   sub  sp, fp, #4
   104c4: e8bd8800   pop  {fp, pc}
   104c8: 00010540   .word 0x00010540
   104cc: 00010554   .word 0x00010554

Contents of section .rodata:
 1053c 01000200 68656865 2c206861 68616861  ....hehe, hahaha
 1054c 68612121 21210000 25640a00           ha!!!!..%d..
```

### **Run-time Stack: Stack frame**

```
#include <stdio.h>

int myadd(int c, int d) {
    return c + d;   /* body not legible on the slide; c + d is what the disassembly computes */
}

int add(int a, int b) {
    return myadd(a,b);
}

int main() {
    int x = 12;
    int y = 23;
    int out = add(x, y);
    printf("%d\n", out);
    return 0;
}
```

```
00010400 <myadd>:
   10400: e52db004   push {fp}               ; (str fp, [sp, #-4]!)
   10404: e28db000   add  fp, sp, #0
   10408: e24dd00c   sub  sp, sp, #12
   1040c: e50b0008   str  r0, [fp, #-8]
   10410: e50b100c   str  r1, [fp, #-12]
   10414: e51b2008   ldr  r2, [fp, #-8]
   10418: e51b300c   ldr  r3, [fp, #-12]
   1041c: e0823003   add  r3, r2, r3
   10420: e1a00003   mov  r0, r3
   10424: e28bd000   add  sp, fp, #0
   10428: e49db004   pop  {fp}               ; (ldr fp, [sp], #4)
   1042c: e12fff1e   bx   lr

00010430 <add>:
   10430: e92d4800   push {fp, lr}
   10434: e28db004   add  fp, sp, #4
   10438: e24dd008   sub  sp, sp, #8
   1043c: e50b0008   str  r0, [fp, #-8]
   10440: e50b100c   str  r1, [fp, #-12]
   10444: e51b100c   ldr  r1, [fp, #-12]
   10448: e51b0008   ldr  r0, [fp, #-8]
   1044c: ebffffeb   bl   10400 <myadd>
   10450: e1a03000   mov  r3, r0
   10454: e1a00003   mov  r0, r3
   10458: e24bd004   sub  sp, fp, #4
   1045c: e8bd8800   pop  {fp, pc}

00010460 <main>:
   10460: e92d4800   push {fp, lr}
   10464: e28db004   add  fp, sp, #4
   10468: e24dd010   sub  sp, sp, #16
   1046c: e3a0300c   mov  r3, #12
   10470: e50b3010   str  r3, [fp, #-16]
   10474: e3a03017   mov  r3, #23
   10478: e50b300c   str  r3, [fp, #-12]
   1047c: e51b100c   ldr  r1, [fp, #-12]
   10480: e51b0010   ldr  r0, [fp, #-16]
   10484: ebffffe9   bl   10430 <add>
   10488: e50b0008   str  r0, [fp, #-8]
   1048c: e51b1008   ldr  r1, [fp, #-8]
   10490: e59f0010   ldr  r0, [pc, #16]      ; 104a8 <main+0x48>
   10494: ebffff91   bl   102e0 <printf@plt>
   10498: e3a03000   mov  r3, #0
   1049c: e1a00003   mov  r0, r3
   104a0: e24bd004   sub  sp, fp, #4
   104a4: e8bd8800   pop  {fp, pc}
```

# **Introduction to Computing Systems** (CS1002A.02)

School of Computer Science and Technology

- 1 Recursion
- 2 The Queue
- 3 Character Strings

#### 1. Recursion

- Recursion is a mechanism for expressing a function *in terms of itself*.
- When used appropriately, the expressive power of recursion saves us a lot of headaches.
- Otherwise, it results in longer execution time and wasted energy.

#### Factorial: compute n!=n\*(n-1)!, A BAD EXAMPLE

Figure 8.12 Flowchart for a recursive FACTORIAL subroutine.

Can it work properly?
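
One way to see why factorial is a poor fit for recursion is to count the work. The C sketch below computes n! both recursively and with a loop, and counts how many subroutine invocations the recursive version needs; every invocation costs a call, a return, and stack traffic on top of the multiply that the loop also performs. The names `fact_recursive`, `fact_iterative`, and `fact_calls` are invented for this illustration, and the bound of 12 is an assumption chosen so that the result fits in 32 bits.

```c
#include <assert.h>

/* Number of times fact_recursive has been entered (hypothetical counter). */
static unsigned long fact_calls;

/* Recursive n!: one invocation for each value from n down to 1. */
static unsigned long fact_recursive(unsigned n) {
    assert(n <= 12);               /* 12! is the largest factorial that fits in 32 bits */
    fact_calls++;
    if (n <= 1)
        return 1;
    return n * fact_recursive(n - 1);
}

/* Iterative n!: a single invocation and n-1 multiplies. */
static unsigned long fact_iterative(unsigned n) {
    assert(n <= 12);
    unsigned long result = 1;
    for (unsigned i = 2; i <= n; i++)
        result *= i;
    return result;
}
```

Resetting `fact_calls` to 0 and evaluating `fact_recursive(5)` gives 120 after 5 invocations; `fact_iterative(5)` gives the same 120 from a single call.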
### **Expected execution flow**

Figure 8.12 Flowchart for a recursive FACTORIAL subroutine.

Figure 8.13 Execution flow for recursive FACTORIAL subroutines.

#### Factorial: compute n!=n\*(n-1)!

| Label | Instruction | Comment |
|------------|----------------|----------------------------------------------------|
| FACT | ADD R6,R6,#-1 | |
| | STR R1,R6,#0 | ; Push caller's R1 on the stack, so we can use R1. |
| | ADD R1,R0,#-1 | ; If n=1, we are done since 1! = 1 |
| | BRz NO_RECURSE | |
| | ADD R6,R6,#-1 | |
| | STR R7,R6,#0 | ; Push return linkage onto stack |
| | ADD R6,R6,#-1 | |
| | STR R0,R6,#0 | ; Push n on the stack |
| | ADD R0,R0,#-1 | ; Form n-1, argument of JSR |
| B | JSR FACT | |
| | LDR R1,R6,#0 | ; Pop n from the stack |
| | ADD R6,R6,#1 | |
| | MUL R0,R0,R1 | ; Form n*(n-1)! |
| | LDR R7,R6,#0 | ; Pop return linkage into R7 |
| | ADD R6,R6,#1 | |
| NO_RECURSE | LDR R1,R6,#0 | ; Pop caller's R1 back into R1 |
| | ADD R6,R6,#1 | |
| | RET | |

Example: R0=n=3. A+1 is the return linkage of the original call, and B+1 is the return linkage of the recursive JSR at label B. At the deepest point of the recursion the stack holds (top of stack first):

| Stack entry | Pushed by |
|-------------|---------------------------|
| #2's R1=1 | third invocation (n=1) |
| n=2 | second invocation (n=2) |
| B+1 | second invocation (n=2) |
| #1's R1=2 | second invocation (n=2) |
| n=3 | first invocation (n=3) |
| A+1 | first invocation (n=3) |
| Caller's R1 | first invocation (n=3) |

The stack then unwinds as follows:

| Step | Action | Registers | Top of stack afterwards |
|------|--------|-----------|-------------------------|
| 1 | Third invocation takes BRz NO_RECURSE, pops #2's R1, and returns | R0=1, R1=1 | n=2 |
| 2 | Second invocation pops n | R0=1, R1=2 | B+1 |
| 3 | MUL, then pop return linkage into R7 | R0=R0*R1=1*2=2 | #1's R1=2 |
| 4 | Pop #1's R1 and return to B+1 | R1=2 | n=3 |
| 5 | First invocation pops n, then MUL | R1=3, R0=R0*R1=2*3=6 | A+1 |
| 6 | Pop return linkage into R7 | R7=A+1 | Caller's R1 |
| 7 | Pop caller's R1 and return | R1=old value | (stack as before the call) |

Return to calling program.

### Implementing FACT iteratively (without recursion)

| Label | Instruction | Comment |
|---------|----------------|--------------------------------|
| FACT | ST R1,SAVE\_R1 | |
| | ADD R1,R0,#0 | ; R1=R0=n |
| | ADD R0,R0,#-1 | ; R0=n-1 |
| | BRz DONE | |
| AGAIN | MUL R1,R1,R0 | ; (n\*(n-1))\*(n-2)... |
| | ADD R0,R0,#-1 | ; R0 gets next integer for MUL |
| | BRnp AGAIN | |
| DONE | ADD R0,R1,#0 | ; Move n! to R0 |
| | LD R1,SAVE\_R1 | |
| | RET | |
| SAVE\_R1 | .BLKW 1 | |

### The Maze: a Good Example

Given a maze and a starting position within the maze, write a program that determines whether or not there is a way out of the maze from your starting position.

A maze can be any size, n by m. For example, Figure 8.20 illustrates a 6x6 maze.

Figure 8.20 Example of a maze

#### Specification of each cell in the maze

- Bit[4]=1 if there is a door to the outside world; Bit[4]=0 if no door.
- Bit[3]=1 if there is a door to the cell to the north; Bit[3]=0 if no door.
- Bit[2]=1 if there is a door to the cell to the east; Bit[2]=0 if no door.
- Bit[1]=1 if there is a door to the cell to the south; Bit[1]=0 if no door.
- Bit[0]=1 if there is a door to the cell to the west; Bit[0]=0 if no door.

## **Specification of the maze**

Figure 8.22 Specification of the maze of Figure 8.20.
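The cell encoding above can be checked with a small Python sketch. The mask values follow the bit positions listed in the cell specification; the function name `doors` and the two sample cell words are illustrative assumptions, not values taken from Figure 8.22.

```python
# One mask per door, following the bit positions in the cell specification.
DOOR_MASKS = [
    ("outside", 0x0010),  # Bit[4]
    ("north",   0x0008),  # Bit[3]
    ("east",    0x0004),  # Bit[2]
    ("south",   0x0002),  # Bit[1]
    ("west",    0x0001),  # Bit[0]
]


def doors(cell):
    """Return the names of the doors that are open in a cell word."""
    return [name for name, mask in DOOR_MASKS if cell & mask]


print(doors(0x0007))  # hypothetical cell with doors east, south, west
print(doors(0x0019))  # hypothetical cell with an exit plus doors north and west
```

Testing a door is a single AND with its mask, which is why one word per cell is enough.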
### The Maze: a searching algorithm

### **Description of the Algorithm**

- a. From our cell, we ask if we can exit. If yes, we are done. We exit with R1=1.
- b. If not, we put a **breadcrumb** in our cell. Our **breadcrumb** is bit [15] of the word corresponding to our current cell. We set it to 1.
- c. We ask two questions: Is there a door to the north, and have we never visited the cell to the north before? If the answer to both is yes, we set the address to the cell to the north, and JSR FIND\_EXIT. We set the address to the cell to the north by subtracting 6 from the address of the current cell.
- d. If the answer to either question is no, or if going north resulted in failure, we ask: Is there a door to the east, and have we never visited that cell before? If the answer to both is yes, we set the address to the address of the cell to the east (by adding 1 to the address) and JSR FIND\_EXIT.
- e. If going east does not get us out, we repeat the question for south, and if that does not work, then for west.
- f. If there is no door to the west leading to a cell we have not visited, or if there is such a door but going west results in failure, we are done. We cannot exit the maze from our starting position. We set R1=0 and return.

#### **Recursive Subroutine to exit the Maze**

| Label | Instruction | Comment |
|-------------|---------------------|---------|
| | | ; Recursive subroutine that determines if there is |
| | | ; a path from current cell to the outside world. |
| | | ; input: R0, current cell address |
| | | ; output: R1, YES (1) or NO (0) |
| | .ORIG x4000 | |
| FIND_EXIT | | ; Save modified registers into the stack. |
| | ADD R6, R6, #-1 | |
| | STR R2, R6, #0 | ; R2 holds the cell data of the caller |
| | ADD R6, R6, #-1 | |
| | STR R3, R6, #0 | ; R3 holds the cell address of the caller |
| | ADD R6, R6, #-1 | |
| | STR R7, R6, #0 | ; R7 holds the PC of the caller |
| | | ; Move cell address to R3, since we need to use R0 |
| | | ; as the input to recursive subroutine calls. |
| | ADD R3, R0, #0 | |
| | | ; If the exit is in this cell, return YES |
| | LDR R2, R0, #0 | ; R2 now holds the current cell data |
| | LD R7, EXIT_MASK | ; EXIT_MASK .FILL x0010 |
| | AND R7, R2, R7 | |
| | BRnp DONE_YES | |
| | | ; Put breadcrumb in the current cell. |
| | LD R7, BREADCRUMB | ; BREADCRUMB .FILL x8000 |
| | ADD R2, R2, R7 | |
| | STR R2, R0, #0 | |
| | | ; Check the north cell for a path to exit |
| CHECK_NORTH | LD R7, NORTH_MASK | |
| | AND R7, R2, R7 | |
| | BRz CHECK_EAST | ; If north is blocked, check east |
| | LDR R7, R3, #-6 | |
| | BRn CHECK_EAST | ; If a breadcrumb in the north cell (bit[15]=1, negative), check east |
| | ADD R0, R3, #-6 | |
| | JSR FIND_EXIT | ; Recursively check the north cell |
| | ADD R1, R1, #0 | |
| | BRp DONE_YES | ; If a path from north cell found, return YES |
| | | ; Check the east cell for a path to exit |
| CHECK_EAST | LD R7, EAST_MASK | |
| | AND R7, R2, R7 | |
| | BRz CHECK_SOUTH | ; If the way to east is blocked, check south |
| | LDR R7, R3, #1 | |
| | BRn CHECK_SOUTH | ; If a breadcrumb in the east cell, check south |
| | ADD R0, R3, #1 | |
| | JSR FIND_EXIT | ; Recursively check the east cell |
| | ADD R1, R1, #0 | |
| | BRp DONE_YES | ; If a path from east cell found, return YES |
| | | ; Check the south cell for a path to exit |
| CHECK_SOUTH | LD R7, SOUTH_MASK | |
| | AND R7, R2, R7 | |
| | BRz CHECK_WEST | ; If the way to south is blocked, check west |
| | LDR R7, R3, #6 | |
| | BRn CHECK_WEST | ; If a breadcrumb in the south cell, check west |
| | ADD R0, R3, #6 | |
| | JSR FIND_EXIT | ; Recursively check the south cell |
| | ADD R1, R1, #0 | |
| | BRp DONE_YES | ; If a path from south cell found, return YES |
| | | ; Check the west cell for a path to exit |
| CHECK_WEST | LD R7, WEST_MASK | |
| | AND R7, R2, R7 | |
| | BRz DONE_NO | ; If the way to west is blocked, return NO |
| | LDR R7, R3, #-1 | |
| | BRn DONE_NO | ; If a breadcrumb in the west cell, return NO |
| | ADD R0, R3, #-1 | |
| | JSR FIND_EXIT | ; Recursively check the west cell |
| | ADD R1, R1, #0 | |
| | BRp DONE_YES | ; If a path from west cell found, return YES |
| DONE_NO | AND R1, R1, #0 | |
| | BR RESTORE | |
| DONE_YES | AND R1, R1, #0 | |
| | ADD R1, R1, #1 | |
| RESTORE | ADD R0, R3, #0 | ; Restore R0 from R3 |
| | | ; Restore the rest of the modified registers from the stack. |
| | LDR R7, R6, #0 | |
| | ADD R6, R6, #1 | |
| | LDR R3, R6, #0 | |
| | ADD R6, R6, #1 | |
| | LDR R2, R6, #0 | |
| | ADD R6, R6, #1 | |
| | RET | |
| BREADCRUMB | .FILL x8000 | |
| EXIT_MASK | .FILL x0010 | |
| NORTH_MASK | .FILL x0008 | |
| EAST_MASK | .FILL x0004 | |
| SOUTH_MASK | .FILL x0002 | |
| WEST_MASK | .FILL x0001 | |
| | .END | |

#### 2. The Queue

- ■ Definition of Queue
- ■ Basic Operations
- ■ Wrap-Around
- ■ Definition of full and empty queue
- ■ Underflow/Overflow

#### 2.1 The Definition of Queue

#### ■ Queue

- A data structure with the property of "First in First out (FIFO)".
- The front pointer is used for removing elements from the front of the queue; the rear pointer is used for inserting elements at the rear of the queue.
- FRONT points to the location just in front of the first element in the queue; REAR points to the location of the last element in the queue.

Figure 8.25 A queue allocated to memory locations x8000 to x8005.

#### 2.2 Basic Operations

#### ■ Remove from Front

- FRONT points to the location just in front of the first element in the queue; R3 stores the FRONT pointer.
- First increment FRONT, then load the value:
  - `ADD R3,R3,#1`
  - `LDR R0,R3,#0`

#### ■ Insert at Rear

- R4 stores the REAR pointer.
- First increment REAR, then store the value:
  - `ADD R4,R4,#1`
  - `STR R0,R4,#0`

#### 2.3 Wrap-Around

| Label | Remove (R3 stores FRONT pointer) | Insert (R4 stores REAR pointer) | Comment |
|--------|----------------|----------------|---------|
| | LD R2,LAST | LD R2,LAST | |
| | ADD R2,R3,R2 | ADD R2,R4,R2 | |
| | BRnp SKIP_1 | BRnp SKIP_1 | |
| | LD R3,FIRST | LD R4,FIRST | |
| | BR SKIP_2 | BR SKIP_2 | |
| SKIP_1 | ADD R3,R3,#1 | ADD R4,R4,#1 | |
| SKIP_2 | LDR R0,R3,#0 | STR R0,R4,#0 | ; Remove: R0 gets the front of the queue. Insert: R0 is stored at the rear of the queue. |
| | RET | RET | |
| LAST | .FILL x7FFB | .FILL x7FFB | ; LAST contains the negative of x8005 |
| FIRST | .FILL x8000 | .FILL x8000 | |

#### 2.4 Full and Empty Queue

- A queue with n locations is allowed to store only n-1 elements (why?).
- Full: FRONT=REAR+1 OR FRONT + n - 1 = REAR
- **Empty: FRONT=REAR**

Figure 8.26 A full queue and an empty queue.

#### 2.5 Tests for Underflow, Overflow

R5=1: underflow; R5=0: normal. R3: FRONT; R4: REAR.

#### ■ Test for underflow: FRONT =? REAR

```
            AND R5,R5,#0    ; Initialize R5 to 0
            NOT R2,R3
            ADD R2,R2,#1    ; R2 contains negative of R3
            ADD R2,R2,R4
            BRz UNDERFLOW   ; R3 = R4
            ; code to remove the front of the
            ; queue and return success.
            RET
UNDERFLOW   ADD R5,R5,#1
            RET
```

#### ■ Test for overflow

How?

#### 2.6 The Complete Story

```
; Input: R0 for item to be inserted, R3 is FRONT, R4 is REAR
; Output: R0 for item to be removed
;
INSERT    ST R1,SaveR1      ; Save register we need
          AND R5,R5,#0      ; Set R5 to success code
                            ; Initialization complete
          LD R1,NEG_LAST
          ADD R1,R1,R4      ; R1 = REAR MINUS x8005
          BRnp SKIP1        ; SKIP WRAP AROUND
          LD R4,FIRST       ; WRAP AROUND, R4=x8000
          BR SKIP2
SKIP1     ADD R4,R4,#1      ; NO WRAP AROUND, R4=R4+1
SKIP2     NOT R1,R4
          ADD R1,R1,#1      ; R1 = NEG REAR
          ADD R1,R1,R3      ; R1 = FRONT-REAR
          BRz FULL
          STR R0,R4,#0      ; DO THE INSERT
          BR DONE
FULL      LD R1,NEG_FIRST   ; to decrement R4
          ADD R1,R1,R4      ; R1 = REAR MINUS x8000
          BRnp SKIP3
          LD R4,LAST        ; UNDO WRAP AROUND, REAR=x8005
          BR SKIP4
SKIP3     ADD R4,R4,#-1     ; NO WRAP AROUND, R4=R4-1
SKIP4     ADD R5,R5,#1      ; R5=FAILURE
          BR DONE
;
REMOVE    ST R1,SaveR1      ; Save register we need
          AND R5,R5,#0      ; Set R5 to success code
                            ; Initialization complete
          NOT R1,R4
          ADD R1,R1,#1      ; R1 = NEG REAR
          ADD R1,R1,R3      ; R1 = FRONT-REAR
          BRz EMPTY
          LD R1,NEG_LAST
          ADD R1,R1,R3      ; R1 = FRONT MINUS x8005
          BRnp SKIP5
          LD R3,FIRST       ; R3=x8000
          BR SKIP6
SKIP5     ADD R3,R3,#1      ; R3=R3+1
SKIP6     LDR R0,R3,#0      ; DO THE REMOVE
          BR DONE
EMPTY     ADD R5,R5,#1      ; R5=FAILURE
DONE      LD R1,SaveR1      ; Restore register
          RET
FIRST     .FILL x8000
NEG_FIRST .FILL x8000
LAST      .FILL x8005
NEG_LAST  .FILL x7FFB
SaveR1    .BLKW 1
```

#### 3. Character Strings

- A character string is a one-dimensional array of ASCII codes, usually terminated by x0000 (the null character).
- Each memory location stores the ASCII code of one character.

Figure 8.28 Character string representing the name "Bill Linvill."
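The layout described above, one ASCII code per memory word followed by x0000, can be modelled in a few lines of Python. This is a sketch only: the dictionary `memory`, the helper names `store_string` and `load_string`, and the starting address x4000 are assumptions made for illustration.

```python
def store_string(memory, start, text):
    """Write one ASCII code per word starting at `start`, then the x0000 terminator."""
    for offset, ch in enumerate(text):
        memory[start + offset] = ord(ch)
    memory[start + len(text)] = 0x0000
    return start


def load_string(memory, start):
    """Read words from `start` until the null word is found."""
    chars = []
    addr = start
    while memory[addr] != 0x0000:
        chars.append(chr(memory[addr]))
        addr += 1
    return "".join(chars)


memory = {}                                   # address -> 16-bit word
store_string(memory, 0x4000, "Bill Linvill")  # hypothetical starting address
print(load_string(memory, 0x4000), len(memory))
```

The 12-character name occupies 13 words, because the terminator takes a word of its own.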
#### **Example 1: Personnel Record**

#### ■ 6 words of sequential memory starting at x4000:

1. The first word contains the starting address of a character string containing the person's **last** name. The pointer in location x4000 is the address x6000. The six-word character string, starting at x6000, contains the ASCII code for "Jones," terminated with the null character.
2. The second word, at x4001, contains a pointer to the character string of the person's **first** name, in this case "Mary," starting at location x4508.
3. The third word, at x4002, contains a pointer (xCA9B) to her **nine-digit social security number**, the unique identifier for all persons working in the United States.
4. The fourth word, at x4003, contains her salary (in thousands of dollars).
5. The fifth word contains **how long** she has worked for the company.
6. The sixth word is a pointer (x8E25) to the character string identifying her job title, "Engineer."

#### Subroutine to compare two character strings

#### ■ Given social security number in example 1, how to look for the salary?

```
STRCMP    ST R0,SaveR0
          ST R1,SaveR1
          ST R2,SaveR2
          ST R3,SaveR3
; R0 points to 1st string, R1 points to 2nd string
          AND R5,R5,#0     ; R5 <-- Match
NEXTCHAR  LDR R2,R0,#0     ; R2 contains character from 1st string
          LDR R3,R1,#0     ; R3 contains character from 2nd string
          BRnp COMPARE     ; String is not done, continue comparing
          ADD R2,R2,#0
          BRz DONE         ; If both strings done, match found
COMPARE   NOT R2,R2
          ADD R2,R2,#1     ; R2 contains negative of character
          ADD R2,R2,R3     ; Compare the 2 characters
          BRnp FAIL        ; Not equal, no match
          ADD R0,R0,#1
          ADD R1,R1,#1
          BRnzp NEXTCHAR   ; Move on to next pair of characters
FAIL      ADD R5,R5,#1     ; R5 <-- No match
DONE      LD R0,SaveR0
          LD R1,SaveR1
          LD R2,SaveR2
          LD R3,SaveR3
          RET
SaveR0    .BLKW 1
SaveR1    .BLKW 1
SaveR2    .BLKW 1
SaveR3    .BLKW 1
```

#### Represent a long integer of any length by character strings, ensuring all characters are within 0-9.
```
; Input: R0 contains the starting address of the character string
; Output: R5=0, success; R5=1, failure.
TEST_INTEGER ST R1,SaveR1    ; Save registers needed by subroutine
             ST R2,SaveR2
             ST R3,SaveR3
             ST R4,SaveR4
             AND R5,R5,#0    ; Initialize success code to R5=0, success
             LD R2,ASCII_0   ; R2=xFFD0, the negative of ASCII code x30
             LD R3,ASCII_9   ; R3=xFFC7, the negative of ASCII code x39
NEXT_CHAR    LDR R1,R0,#0    ; Load next character
             BRz SUCCESS     ; If current character is null
             ADD R4,R1,R2    ; R1 - x30
             BRn BAD         ; R1 is less than x30, not a decimal digit
             ADD R4,R1,R3    ; R1 - x39
             BRp BAD         ; R1 is greater than x39, not a decimal digit
             ADD R0,R0,#1    ; Character good! Prepare for next character
             BR NEXT_CHAR
BAD          ADD R5,R5,#1    ; R5 contains failure code
SUCCESS      LD R4,SaveR4    ; Restore registers
             LD R3,SaveR3
             LD R2,SaveR2
             LD R1,SaveR1
             RET
ASCII_0      .FILL xFFD0
ASCII_9      .FILL xFFC7
SaveR1       .BLKW 1
SaveR2       .BLKW 1
SaveR3       .BLKW 1
SaveR4       .BLKW 1
```

Figure 8.31 A character string representing the integer 79,245, with one ASCII code per decimal digit.

# **Outline**

- 1 Review
- **2** The Memory Address Space
- 3 Input/Output

### **Review**

### ■ So far, we've learned how to:

- compute with values in registers
- load data from memory to registers
- store data from registers to memory

# **Today: I/O in Von Neumann Model**

# **Outline**

- 1 Review
- 2 The Memory Address Space
- 3 Input/Output

# **Privilege and Priority**

- Two very different concepts associated with computer processing are privilege and priority.
- Privilege is all about the right to do something, such as execute a particular instruction or access a particular memory location. Not all computer programs have the right to execute all instructions.
  - We say a program is executing in Supervisor mode to indicate privileged, or User mode to indicate unprivileged.
- Priority is all about the urgency of a program to execute.
  - Priority allows programs of greater urgency to interrupt programs of lesser urgency.
- Privilege and priority are two orthogonal notions.
  - They have nothing to do with each other.

# The Processor Status Register (PSR)

- Each program executing on the computer has associated with it two very important registers:
  - the Program Counter (PC)
  - the Processor Status Register (PSR), which contains the privilege and priority assigned to that program.
- PSR[15]=0 means supervisor privilege, and PSR[15]=1 means unprivileged.
- Bits [10:8] specify the priority level (PL) of the program. The highest priority level is 7 (PL7), the lowest is PL0.
- The PSR also contains the current values of the condition codes.

# **Organization of Memory**

- LC-3 has a 16-bit address space, with memory locations from x0000 to xFFFF.
- Locations x0000 to x2FFF are privileged memory locations.
  - They contain the various data structures and code of the operating system.
  - They require supervisor privilege to access.
  - They are referred to as system space.
  - The supervisor stack is controlled by the operating system.

# **Organization of Memory**

- Locations x3000 to xFDFF are unprivileged memory locations.
  - Supervisor privilege is not required to access these memory locations.
  - All user programs and data use this region of memory.
  - The region is often referred to as user space.
  - The user stack is controlled by the user program and does not require privilege to access.

# **Organization of Memory**

- Addresses xFE00 to xFFFF do not correspond to memory locations.
  - The last address of a memory location is xFDFF.
  - The set of addresses from xFE00 to xFFFF is usually referred to as the I/O page since most of the addresses are used for identifying registers that take part in input or output functions.
  - This set of addresses is part of the privileged memory address space and is accessible only to programs that have supervisor privilege.

#### **Organization of Memory**

- Each of the two stacks has a stack pointer, the Supervisor Stack Pointer (SSP) or the User Stack Pointer (USP), to indicate the top of the stack.
- Since a program can only execute in Supervisor mode or User mode at any one time, only one of the two stacks is active at any one time.
- Two registers, Saved\_SSP and Saved\_USP, are provided to save the SP not in use.
- When privilege changes, for example, from Supervisor mode to User mode, the SP is stored in Saved\_SSP, and the SP is loaded from Saved\_USP.

# **Outline**

- 1 Review
- **2** The Memory Address Space
- 3 Input/Output

## Input / Output (I/O)

- Computer systems are useless unless they can process information from outside of the computer and output results outside of the computer.
  - But where does data in memory come from?
  - And how does data get out of the system?
- I/O is effective communication with the outside of the computer.
  - The I/O device itself communicates with the outside world, e.g., the keyboard takes input from the user.
  - The computer needs to communicate with the I/O device, e.g., the computer takes input from the keyboard.
- Communication goes through shared memory locations.
  - Processor and I/O can read/write those memory locations.
  - Sometimes, data in memory locations can be set/cleared automatically (by hardware) depending on a read/write.

# I/O: Connecting to the Outside World

#### ■ I/O Examples

- Keyboard/mouse input, video output on a standard computer
- Network input/output that enables web surfing
- Information from an engine of a car that a computer uses to determine how to tune the engine (output from computer tunes the engine)
- Requests for airline reservations and replies that service those requests

# I/O: Connecting to the Outside World

#### ■ Types of I/O devices characterized by:

- behavior: input, output
  - input: keyboard, motion detector, network interface
  - output: display screen (monitor), printer, network interface
- data rate: how fast can data be transferred?
  - keyboard: 100 bytes/sec
  - disk: 60-120 MB/s
  - network: 1 Mb/s to 100 Gb/s
- accessing mode:
  - character device: no buffering is performed. E.g., keyboard
  - block device: accessed through a cache; supports random access. E.g., disk

#### I/O Controller

#### **■ Control/Status Registers**

- CPU tells device what to do -- write to control register
- CPU checks whether task is done -- read status register

#### ■ Data Registers

- CPU transfers data to/from device

#### ■ Device electronics

- performs actual operation
  - pixels to screen, bits to/from disk, characters from keyboard

# LED, An ARM Example

- The LEDs on a development board are usually connected to the processor's GPIO (general purpose input output) pins.
- The schematic shows that the I/O ports to be controlled are GPF4, GPF5, and GPF6 of the S3C2440.
- To light an LED, the corresponding GPIO pin must be driven to a high level (1) or a low level (0).

[Schematic excerpt: LED signals nLED\_1, nLED\_2, nLED\_4 and pins EINT4/GPF4, EINT5/GPF5, EINT6/GPF6.]

# LED, An ARM Example

- According to the S3C2440 datasheet, three GPIO registers mainly need to be configured.
- GPFCON is the control register. It selects the function of each GPIO pin: input, output, or interrupt. Each pin is controlled by two bits of the register: 00: input, 01: output, 10: interrupt, 11: reserved.
- GPFDAT is the data register. It controls whether a GPIO pin outputs a high or a low level: 0: low level, 1: high level.
- GPFUP is the register that configures the internal pull-up resistors. As the "pull-up disable" description in the table below indicates, 0 enables the pull-up resistor and 1 disables it.

| Register | Address | R/W | Description | Reset Value |
|----------|------------|-----|-------------------------------------|-------------|
| GPFCON | 0x56000050 | R/W | Configures the pins of port F | 0x0 |
| GPFDAT | 0x56000054 | R/W | The data register for port F | Undef. |
| GPFUP | 0x56000058 | R/W | Pull-up disable register for port F | 0x000 |
| Reserved | 0x5600005c | - | - | - |

# LED, An ARM Example

```
.text
.global _start
_start:
    LDR R0,=0x56000050    /* R0 ← &GPFCON */
    MOV R1,#0x00000400    /* R1 ← 0x00000400 */
    STR R1,[R0]           /* 0x00000400 → GPFCON: GPFCON[11:10] is set to 01, so GPF5 is an output */
    LDR R0,=0x56000054    /* R0 ← &GPFDAT */
    MOV R1,#0x00000000    /* R1 ← 0x00000000 */
    STR R1,[R0]           /* 0x00000000 → GPFDAT: GPFDAT outputs a low level */
MAIN_LOOP:
    B MAIN_LOOP
```

Question: How does ARM access the GPIO-related registers?

#### Some Basic Characteristics of I/O

- All I/O activity is controlled by instructions in the computer's ISA. Does the ISA need special instructions for dealing with I/O?
  - Memory-mapped vs. special instructions
- Does the I/O device execute at the same speed as the computer, and if not, what manages the difference in speeds?
  - Asynchronous vs. synchronous
- Is the transfer of information between the computer and the I/O device initiated by a program executing in the computer, or is it initiated by the I/O device?
  - CPU (polling) vs. device (interrupts)

#### Memory-Mapped I/O vs. Special I/O Instructions

- An instruction that interacts with an input or output device register must identify the particular input or output device register with which it is interacting.
- Special I/O Instructions
  - These instructions typically allow data to be sent to an I/O device or read from an I/O device.
  - Coding: designate opcode(s) for I/O, with the register and operation encoded in the instruction. [Figure: a 16-bit instruction, bits 15 to 0, divided into IO, Device, and Op fields.]
- Memory-mapped I/O
  - Assign a memory address to each device register.
  - The advantage of this method is that every instruction which can access memory can be used to manipulate an I/O device.
  - In LC-3, we can use data movement instructions (LD/ST) for control and data transfer. [Figure: part of the memory address space is mapped to I/O device registers, such as the Keyboard Status Reg.]

#### LC-3

#### ■ Memory-mapped I/O (Table A.1)

| Location | I/O Register | Function |
|----------|-------------------------------|----------------------------------------------------------------------|
| xFE00 | Keyboard Status Reg (KBSR) | Bit [15] is one when keyboard has received a new character. |
| xFE02 | Keyboard Data Reg (KBDR) | Bits [7:0] contain the last character typed on keyboard. |
| xFE04 | Display Status Register (DSR) | Bit [15] is one when device ready to display another char on screen. |
| xFE06 | Display Data Register (DDR) | Character written to bits [7:0] will be displayed on screen. |

# **Asynchronous vs. Synchronous**

- I/O events generally happen much slower than CPU cycles.
  - If: the CPU runs at 300 MHz and needs 10 clocks per character, with 6 characters per word
  - Then: keeping up with the CPU would require a typing speed of $(300\times10^{6})/(10\times6)=5\times10^{6}$ words/s

#### **■** Synchronous

- Data is supplied at a fixed, predictable rate, and the CPU reads/writes every X cycles. (Limitation?)

#### **■** Asynchronous

- I/O devices usually operate at speeds very different from that of a microprocessor.
- To control processing in an asynchronous world requires some protocol or handshaking mechanism.
- In the case of the keyboard, use a one-bit status register, called a flag, to indicate if someone has or has not typed a character.
- These flags are the simplest form of synchronization(?).

# Interrupt-Driven vs. Polling

- Who determines when the next data transfer occurs?
  - CPU vs. I/O device
- Polling is explicitly looking/examining.
  - The CPU keeps checking the status register until new data arrives OR the device is ready for the next data.
  - "Are you there yet? Are you there yet? Are you there yet?"
- An interrupt is a nudge, a knock on the door, a loud noise, which forces you to pay attention.
  - The device sends a special signal to the CPU when new data arrives OR the device is ready for the next data.
  - The CPU can be performing other tasks instead of polling the device.
  - "Wake me when you get there."

## **Example: Input from Keyboard**

| Location | I/O Register | Function |
|----------|----------------------------|-------------------------------------------------------------|
| xFE00 | Keyboard Status Reg (KBSR) | Bit [15] is one when keyboard has received a new character. |
| xFE02 | Keyboard Data Reg (KBDR) | Bits [7:0] contain the last character typed on keyboard. |

- When a character is typed:
  - its ASCII code is placed in bits [7:0] of KBDR (bits [15:8] are always zero)
  - the "ready bit" (KBSR[15]) is set to one (who sets?)
  - the keyboard is disabled -- any typed characters will be ignored
- When KBDR is read:
  - KBSR[15] is set to zero (who sets?), meaning no keyboard key is pending
  - the keyboard is enabled again

# **Memory-mapped Operations**

- How do we read the ready bit? `LDI R0, KBSR`
- How do we test whether the bit is one? A set bit [15] makes the value negative, so use `BRn` or `BRzp`.
- How do we read keyboard data? `LDI R0, KBDR`

#### **Basic Input Routine**

```
POLL    LDI R0, KBSR
        BRzp POLL
        LDI R0, KBDR
        ...
KBSR    .FILL xFE00
KBDR    .FILL xFE02
```

#### Implementation of Memory-mapped I/O

- MIO.EN indicates whether a data movement from/to memory or I/O is to take place this clock cycle.
- MAR contains the address of the memory location or the memory-mapped address of an I/O device register.
- R.W indicates whether a load or a store is to take place.
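A minimal Python simulation of these ideas follows. It is a sketch, not the LC-3 data path. The address decoding in `load` plays the role of the address control logic, and `getc` mirrors the polling pattern of the Basic Input Routine. The `Machine` class, the `delay` parameter (how many polls pass before a key arrives), and the `polls` counter are assumptions introduced for illustration.

```python
KBSR, KBDR = 0xFE00, 0xFE02   # memory-mapped keyboard registers
READY = 0x8000                # bit [15], the ready bit


class Machine:
    def __init__(self, typed, delay=2):
        self.memory = {}             # ordinary memory locations
        self.pending = list(typed)   # characters the user is going to type
        self.delay = delay           # hypothetical: polls before each key arrives
        self.wait = delay
        self.kbsr = 0
        self.kbdr = 0
        self.polls = 0               # how many times KBSR was read

    def _keyboard_tick(self):
        """Device side: latch the next character and set the ready bit."""
        if self.kbsr & READY or not self.pending:
            return
        if self.wait > 0:
            self.wait -= 1
            return
        self.kbdr = ord(self.pending.pop(0)) & 0xFF
        self.kbsr |= READY
        self.wait = self.delay

    def load(self, addr):
        """A load: the address decides between memory and a device register."""
        if addr == KBSR:
            self.polls += 1
            self._keyboard_tick()
            return self.kbsr
        if addr == KBDR:
            self.kbsr &= ~READY & 0xFFFF   # reading KBDR clears the ready bit
            return self.kbdr
        return self.memory.get(addr, 0)


def getc(machine):
    """Poll KBSR until bit [15] is set, then read KBDR."""
    while not (machine.load(KBSR) & READY):
        pass
    return machine.load(KBDR)


m = Machine("hi")
print(chr(getc(m)), chr(getc(m)), m.polls)
```

As with the real polling loop, `getc` never returns if no character is ever typed, which is the cost of polling that interrupt-driven I/O avoids.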
# D A EA OP EX **LDI** (Indirect) 16 ←□ → ←□ → ← □ → # D A EA OP EX **LDI** (Indirect) LD.PC-**PCMUX** LD.MDR — MDR MAR <--- LD.MAR 16 **MEMORY** MIO.EN-> MUX - MEM.EN.R.W ◆□▶ ◆圖▶ ◆臺▶ ◆臺▶ # → OP D → EA **LDI** (Indirect) LD.MDR — MDR MAR <--- LD.MAR 16 **MEMORY** MIO.EN-> MUX - MEM.EN.R.W イロト イ御ト イミト イミト # $\mathsf{D} \longrightarrow \mathsf{EA} \longrightarrow \mathsf{OP}$ LDI (Indirect) GateMDR — ▷△ 16 LD.MDR — MDR <>─ LD.MAR **MEMORY** イロト イ御ト イミト イミト ## Memory-mapped I/O: LDI RO, KBSR # Memory-mapped I/O: LDI RO, KBDR #### **Example: Output to Screen** - When Display device is ready to display another character: - the "ready bit" (DSR[15]) is set to one, indicating that processor can write a character for display - When data is written to the Display data register: - DSR[15] is automatically set to 0 - character in DDR[7:0] is displayed - any other character data written to DDR is ignored (while DSR[15] is zero) | Location | I/O Register | Function | |----------|-------------------------------|----------------------------------------------------------------------| | xFE04 | Display Status Register (DSR) | Bit [15] is one when device ready to display another char on screen. | | xFE06 | Display Data Register (DDR) | Character written to bits [7:0] will be displayed on screen. 

#### **Basic Output Routine**

| Label | Instruction |
|-------|----------------|
| POLL | LDI R1, DSR |
| | BRzp POLL |
| | STI R0, DDR |
| | ... |
| DSR | .FILL xFE04 |
| DDR | .FILL xFE06 |

The "ready bit" (DSR[15]) is set to one, indicating that the processor can write a character for display.

[Figure sequence: **STI** (Indirect) on the LC-3 data path, stepping through the D, EA, OP, and EX phases. Labeled signals include LD.PC, PCMUX, LD.IR, LD.MAR, MAR, LD.MDR, MDR, GateMDR, MIO.EN, MEM.EN, R.W, and MEMORY.]

## **Keyboard Echo Routine**

- Usually, the input character is also printed to the screen.
- The user gets feedback on the character typed and knows it's OK to type the next character.

| Label | Instruction |
|-------|----------------|
| POLL1 | LDI R0, KBSR |
| | BRzp POLL1 |
| | LDI R0, KBDR |
| POLL2 | LDI R1, DSR |
| | BRzp POLL2 |
| | STI R0, DDR |
| | ... |
| KBSR | .FILL xFE00 |
| KBDR | .FILL xFE02 |
| DSR | .FILL xFE04 |
| DDR | .FILL xFE06 |

## Introduction to Computing Systems A (CS1002A.02)

School of Computer Science and Technology

## **Outline**

- 1 Review
- **2 TRAP Routines**
- 3 Operating System Service Routines

#### **Today: CONTROL in Von Neumann Model**

## **Control Instructions for System Calls**

#### **Today: CONTROL in LC-3 Data Path**

## **Outline**

- 1 Review
- **2 TRAP Routines**
- 3 Operating System Service Routines

#### System Calls

- Some operations require specialized knowledge and protection:
- Understanding I/O device registers and how to use them poses a challenge to most application programmers.
- I/O registers are shared by many programs, and in general it is ill-advised to give user programmers access to these registers.
- Solution: service routines or system calls
  - Low-level, privileged operations performed by the operating system

1. User program invokes system call
2. Operating system code:
   - Saves registers
   - Performs operation
   - Restores registers
3. Returns control to user program

#### LC-3 TRAP Mechanism

- Provides a set of service routines
  - Part of operating system -- routines start at arbitrary addresses
  - Up to 256 routines
- Requires a table of starting addresses
  - Stored in memory (x0000 through x00FF)
  - Used to associate code with trap number
  - Called Trap Vector Table (or System Control Block)
- Uses the TRAP instruction
  - Used when a user program wishes to have the operating system execute a specific service routine on its behalf and then return control to the user program
- Uses a linkage mechanism
  - For returning control to the user program
  - Execution resumes immediately after the TRAP instruction
  - Using the RTI instruction

#### LC-3 TRAP Routines (service routines)

[Figure: The LC-3 Trap Vector Table]

- GETC (TRAP x20)
  - Read a single character from the keyboard.
  - The character is not echoed to the console.
  - Its ASCII code is copied into R0, and R0[15:8] is cleared.
- OUT (TRAP x21)
  - Write R0[7:0] to the console display.
- PUTS (TRAP x22)
  - Write a string of ASCII characters to the console display.
  - The string address is in R0.
  - Writing terminates with the occurrence of x0000.
- IN (TRAP x23)
  - Print a prompt on the screen and read a single character from the keyboard.
  - The character is echoed to the console.
  - Its ASCII code is copied into R0, and R0[15:8] is cleared.
- HALT (TRAP x25)
  - Halt execution and print a message on the console.
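The table lookup and the TRAP/RTI linkage described above can be sketched as a toy Python model. Everything named here is an assumption made for illustration: the `cpu` dictionary, the `trap` and `rti` helpers, and the sample starting address are hypothetical, and the switch between user and supervisor stack pointers is deliberately left out.

```python
def trap(cpu, mem, trapvect8):
    """Model TRAP: save the linkage, then jump through the trap vector table."""
    # cpu["pc"] is assumed to already hold the address of the instruction after TRAP.
    cpu["sp"] -= 1
    mem[cpu["sp"]] = cpu["psr"]          # push the PSR
    cpu["sp"] -= 1
    mem[cpu["sp"]] = cpu["pc"]           # push the return address
    cpu["psr"] &= 0x7FFF                 # clear PSR[15]: supervisor mode
    cpu["pc"] = mem[trapvect8 & 0xFF]    # zero-extended vector indexes x0000-x00FF

def rti(cpu, mem):
    """Model RTI: pop the return address and the PSR saved by trap()."""
    cpu["pc"] = mem[cpu["sp"]]
    cpu["sp"] += 1
    cpu["psr"] = mem[cpu["sp"]]
    cpu["sp"] += 1

# Usage: one table entry (the starting address is illustrative only).
mem = {0x23: 0x04A0}
cpu = {"pc": 0x3003, "psr": 0x8002, "sp": 0x3000}
trap(cpu, mem, 0x23)   # control is now inside the service routine
rti(cpu, mem)          # control is back at the instruction after TRAP
```

A dictionary stands in for memory so that only the locations actually touched need to exist; a real simulator would use a 65,536-entry array.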
#### **TRAP Instruction**

- Trap vector
  - Identifies which service routine the user program wants the operating system to execute on its behalf
  - The 8-bit trap vector is zero-extended to form a 16-bit address, which serves as an index into the table of service routine addresses
- Where to go
  - Look up the starting address from the table
- Enabling return
  - Push the address of the next instruction after TRAP, together with the PSR, onto the system stack (popped by the RTI instruction)

[Figure: The LC-3 Trap Vector Table; entries shown include x0400, x0430, x0450, x04A0, x04E0, and x0520.]

#### **TRAP Mechanism Operation**

```
OLD_PSR = PSR;
if (PSR[15] == 1) {            // called by a user program
    Saved_USP = R6;
    R6 = Saved_SSP;
    PSR[15] = 0;               // switch to supervisor mode
}
Push OLD_PSR, PC+1 on the system stack
PC = mem[ZEXT(trapvect8)];     // set PC
```

#### **RTI** instruction

RTI instruction: Return from Trap or Interrupt

#### **RTI** instruction

```
if (PSR[15] == 1)              // user mode
    // RTI cannot be called by a user program
    Initiate a privilege mode exception;
else {
    PC = mem[R6];              // R6 is the SSP, PC is restored
    R6 = R6 + 1;
    OLD_PSR = mem[R6];         // OLD_PSR is restored
    R6 = R6 + 1;               // system stack completes POP before saved PSR is restored
    PSR = OLD_PSR;             // PSR is restored
    if (PSR[15] == 1) {        // supervisor mode -> user mode
        Saved_SSP = R6;
        R6 = Saved_USP;
    }
}
```

## An example

| Label | Instruction | Comment |
|-------|--------------|-------------------------------|
| | .ORIG x3000 | |
| | LD R2,TERM | ; Load negative of ASCII '7' |
| | LD R3,ASCII | ; Load ASCII difference |
| AGAIN | TRAP x23 | ; Request keyboard input |
| | ADD R1,R2,R0 | ; Test for terminating |
| | BRz EXIT | ; character |
| | ADD R0,R0,R3 | ; Change to lowercase |
| | TRAP x21 | ; Output to the monitor |
| | BRnzp AGAIN | ; and do it again! |
| TERM | .FILL xFFC9 | ; FFC9 is negative of ASCII '7' |
| ASCII | .FILL x0020 | |
| EXIT | TRAP x25 | ; Halt |
| | .END | |

## **Outline**

- 1 Review
- 2 TRAP Routines
- **3 Operating System Service Routines**

## Character Input Service Routine (IN, TRAP x23)

```
; Service Routine for Keyboard Input
;
        .ORIG x04A0
START   ST    R1,SaveR1     ; Save the values in the registers
        ST    R2,SaveR2     ; that are used so that they
        ST    R3,SaveR3     ; can be restored before RTI
;
        LD    R2,Newline    ; Newline: ASCII code for newline
L1      LDI   R3,DSR        ; Check DSR -- is DDR free?
        BRzp  L1
        STI   R2,DDR        ; Move cursor to new clean line
;
        LEA   R1,Prompt     ; Prompt is starting address
                            ; of prompt string
Loop    LDR   R0,R1,#0      ; Get next prompt character
        BRz   Input         ; Check for end of prompt string
L2      LDI   R3,DSR        ; Check DSR -- is DDR free?
        BRzp  L2
        STI   R0,DDR        ; Write next character of
                            ; prompt string
        ADD   R1,R1,#1      ; Increment prompt pointer
        BRnzp Loop
;
```

## Character Input Service Routine (IN, TRAP x23)

```
Input   LDI   R3,KBSR       ; Has a character been typed?
        BRzp  Input
        LDI   R0,KBDR       ; Load it into R0
L3      LDI   R3,DSR
        BRzp  L3
        STI   R0,DDR        ; Echo input character
                            ; to the monitor
;
L4      LDI   R3,DSR
        BRzp  L4
        STI   R2,DDR        ; Move cursor to new clean line
        LD    R1,SaveR1     ; Service routine done, restore
        LD    R2,SaveR2     ; original values in registers.
        LD    R3,SaveR3
        RTI                 ; Return from Trap
;
SaveR1  .BLKW 1
SaveR2  .BLKW 1
SaveR3  .BLKW 1
DSR     .FILL xFE04
DDR     .FILL xFE06
KBSR    .FILL xFE00
KBDR    .FILL xFE02
Newline .FILL x000A         ; ASCII code for newline
Prompt  .STRINGZ "Input a character>"
        .
END
```

#### A Character Output Service Routine (OUT, TRAP x21)

```
        .ORIG x0420          ; System call starting address
        ST    R1, SaveR1     ; R1 will be used to poll the DSR
                             ; hardware
; Write the character
TryWrite LDI  R1, DSR        ; Get status
        BRzp  TryWrite       ; Bit 15 on says display is ready
WriteIt STI   R0, DDR        ; Write character
;
; return from trap
Return  LD    R1, SaveR1     ; Restore registers
        RTI                  ; Return from trap
DSR     .FILL xFE04          ; Address of display status register
DDR     .FILL xFE06          ; Address of display data register
SaveR1  .BLKW 1
        .END
```

#### A String Output Service Routine (PUTS, TRAP x22)

```
        .ORIG x0460
        ST    R0, SaveR0     ; Save registers that
        ST    R1, SaveR1     ; are needed by this
        ST    R3, SaveR3     ; trap service routine
;
; Loop through each character in the array
;
Loop    LDR   R1, R0, #0     ; Retrieve the character(s)
        BRz   Return         ; If it is 0, done
L2      LDI   R3, DSR
        BRzp  L2
        STI   R1, DDR        ; Write the character
        ADD   R0, R0, #1     ; Increment pointer
        BRnzp Loop           ; Do it all over again
;
; Return from the request for service call
Return  LD    R3, SaveR3
        LD    R1, SaveR1
        LD    R0, SaveR0
        RTI
;
; Register locations
DSR     .FILL xFE04
DDR     .FILL xFE06
SaveR0  .FILL x0000
SaveR1  .FILL x0000
SaveR3  .FILL x0000
        .END
```

#### Halt the machine (HALT, TRAP x25)

```
        .ORIG x0520          ; Where this routine resides
        ST    R1, SaveR1     ; R1: a temp for MC register
        ST    R0, SaveR0     ; R0 is used as working space
;
; print message that machine is halting
;
        LD    R0, ASCIINewLine
        TRAP  x21            ; OUT
        LEA   R0, Message
        TRAP  x22            ; PUTS
        LD    R0, ASCIINewLine
        TRAP  x21            ; OUT
;
; clear bit 15 at xFFFE (Master Control Register, MCR) to stop the machine.
;
        LDI   R1, MCR        ; Load MC register into R1
        LD    R0, MASK       ; R0 = x7FFF
        AND   R0, R1, R0     ; Mask to clear the top bit
        STI   R0, MCR        ; Store R0 into MC register
```

#### Halt the machine (HALT, TRAP x25)

```
;
; return from HALT routine.
; (how can this routine return if the machine is halted above?)
;
        LD    R1, SaveR1     ; Restore registers
        LD    R0, SaveR0
        RTI
;
; Some constants
;
ASCIINewLine .FILL x000A
SaveR0  .BLKW 1
SaveR1  .BLKW 1
Message .STRINGZ "Halting the machine."
MCR     .FILL xFFFE          ; Address of MCR
MASK    .FILL x7FFF          ; Mask to clear the top bit
        .END
```

Figure 4.5 The clock circuit and its control.

## Using subroutines to implement (IN, TRAP x23)

```
        .ORIG x04A0
START   JSR   SaveReg
        LD    R2, Newline
        JSR   WriteChar
        LEA   R1, PROMPT
;
;
Loop    LDR   R2,R1,#0       ; Get next prompt char
        BRz   Input
        JSR   WriteChar
        ADD   R1,R1,#1
        BRnzp Loop
;
        ...
WriteChar LDI R3,DSR
        BRzp  WriteChar
        STI   R2, DDR
        RET                  ; JMP R7 terminates subroutine
DSR     .FILL xFE04
DDR     .FILL xFE06
;
```

## **Using Traps to implement (IN, TRAP x23)**

| Address | Hex | Decimal | Source |
|---------|-------|---------|----------------------|
| x0214 | x0000 | 0 | TOUT_R1 .BLKW #1 |
| x0215 | x000A | 10 | TIN_R7 .BLKW #1 |
| x0216 | x0033 | 51 | OS_R0 .BLKW #1 |
| x0217 | x0000 | 0 | OS_R1 .BLKW #1 |
| x0218 | x0000 | 0 | OS_R2 .BLKW #1 |
| x0219 | x0000 | 0 | OS_R3 .BLKW #1 |
| x021A | x000A | 10 | OS_R7 .BLKW #1 |
| x0232 | x3FE2 | 16354 | TRAP_IN ST R7,TIN_R7 |
| x0233 | xE03A | -8134 | LEA R0,TRAP_IN_MSG |
| x0234 | xF022 | -4062 | PUTS |
| x0235 | xF020 | -4064 | GETC |
| x0236 | xF021 | -4063 | OUT |
| x0237 | x31DE | 12766 | ST R0,OS_R0 |
| x0238 | x5020 | 20512 | AND R0,R0,#0 |
| x0239 | x102A | 4138 | ADD R0,R0,#10 |
| x023A | xF021 | -4063 | OUT |
| x023B | x21DA | 8666 | LD R0,OS_R0 |
| x023C | x2FD8 | 12248 | LD R7,TIN_R7 |
| x023D | x8000 | -32768 | RTI |

TRAP_IN_MSG .STRINGZ "\nInput a character> "

## **Data Type Conversion**

- I/O
  - Keyboard input routines read ASCII characters (not binary values)
  - Console output routines write ASCII ('s', not x73)
- Consider this program:

```
TRAP
x23             ; input from keyboard
ADD  R1, R0, #0 ; move to R1
TRAP x23        ; input from keyboard
ADD  R0, R1, R0 ; add two inputs
TRAP x21        ; display result
TRAP x25        ; HALT
```

- User inputs '2' and '3' -- what happens?
  - Result displayed: 'e'
  - Why?

```
ASCII '2' (x32) + ASCII '3' (x33) = ASCII 'e' (x65)
```

## **ASCII to Binary**

- Single-digit numbers are trivial (subtract x30)
  - E.g., '7' is ASCII x37, and x37 - x30 = x7
- Input
  - Assume we've read three ASCII digits (e.g., "259") into a memory buffer

| Buffer contents | Character |
|-----|-----|
| x32 | '2' |
| x35 | '5' |
| x39 | '9' |
| x0 | |

- How do we convert this to a *number* we can use?
  - Convert the first character to a digit (subtract x30) and multiply by 100
  - Convert the second character to a digit and multiply by 10
  - Convert the third character to a digit
  - Add the three values together

## **ASCII to Binary Conversion Algorithm**

[Figure: flowchart of the conversion algorithm. Register usage: R0 = pointer into the string buffer, R2 = character from memory, R5 = current value.]

## Multiplication

- How can we multiply a number by 10 (and, likewise, by 100)?
  - Approach 0
    - Use the MUL instruction (the LC-3 does not have one)
  - Approach 1
    - Add <number> to itself 10 times
  - Approach 2
    - Add 10 to itself <number> times (better if number < 10)
  - Approach 3
    - Look it up! Only practical if the number of multiplicands is small

Lookup table in memory:

| Entry | Value |
|-----|--------|
| #0 | 0 × 10 |
| #10 | 1 × 10 |
| #20 | 2 × 10 |
| #30 | 3 × 10 |
| #40 | 4 × 10 |
| #50 | 5 × 10 |

## **Code for Lookup Table**

```
; multiply R0 by 100, using lookup table
          LEA   R1, Lookup100   ; R1 = table base
          ADD   R1, R1, R0      ; add index (R0)
          LDR   R0, R1, #0      ; load from M[R1]
          ...
Lookup100 .FILL #0              ; entry 0
          .FILL #100            ; entry 1
          .FILL #200            ; entry 2
          .FILL #300            ; entry 3
          .FILL #400            ; entry 4
          .FILL #500            ; entry 5
          .FILL #600            ; entry 6
          .FILL #700            ; entry 7
          .FILL #800            ; entry 8
          .FILL #900            ; entry 9
```

## **Complete ASCII to Binary Conversion Code**

- ASCII "259" to value 259 (R1 = 3, the number of digits)

## Complete ASCII to Binary Conversion Code (1 of 3)

```
; Three-digit buffer at ASCIIBUF.
; R1 tells how many digits to convert.
; Put resulting decimal number in R0.
ASCIItoBinary AND R0, R0, #0    ; clear result
        ADD   R1, R1, #0        ; test # digits
        BRz   DoneAtoB          ; done if no digits
        LD    R3, NegZero       ; R3 = -x30
        LEA   R2, ASCIIBUF      ; ptr to the first digit
        ADD   R2, R2, R1        ; R2 = (R2) + (R1)
        ADD   R2, R2, #-1       ; points to ones digit
        LDR   R4, R2, #0        ; load digit
        ADD   R4, R4, R3        ; convert to number
        ADD   R0, R0, R4        ; add ones contribution
```

# **Complete ASCII to Binary Conversion Code (2 of 3)**

```
        ADD   R1, R1, #-1       ; one less digit
        BRz   DoneAtoB          ; done if zero
        ADD   R2, R2, #-1       ; points to tens digit
        LDR   R4, R2, #0        ; load 'tens' digit
        ADD   R4, R4, R3        ; convert to number
        LEA   R5, Lookup10      ; multiply by 10
        ADD   R5, R5, R4
        LDR   R4, R5, #0
        ADD   R0, R0, R4        ; adds tens contribution to total
        ADD   R1, R1, #-1       ; one less digit
        BRz   DoneAtoB          ; done if zero
        ADD   R2, R2, #-1       ; points to hundreds digit
        LDR   R4, R2, #0        ; load digit
        ADD   R4, R4, R3        ; convert to number
        LEA   R5, Lookup100     ; multiply by 100
        ADD   R5, R5, R4
        LDR   R4, R5, #0
        ADD   R0, R0, R4        ; adds 100's contrib
```

# **Complete ASCII to Binary Conversion Code (3 of 3)**

```
DoneAtoB  RET
NegZero   .FILL xFFD0           ; -x30
ASCIIBUF  .BLKW 4
Lookup10  .FILL #0
          .FILL #10
          .FILL #20
          ...
Lookup100 .FILL #0
          .FILL #100
          .FILL #200
          ...
```

## Binary to ASCII Conversion

- Converting a 2's complement binary value to a three-digit decimal number
  - The resulting characters can be output using OUT
- Instead of multiplying, we need to divide by 100 to get the hundreds digit.
  - Why wouldn't we use a lookup table for this problem?
  - Subtract 100 repeatedly from the number to divide.
- First, check whether the number is negative.
  - Write the sign character (+ or -) to the buffer and make the value positive.

## **Binary to ASCII Conversion Code (1 of 3)**

```
; R0 is between -999 and +999.
; Put sign character in ASCIIBUF, followed by three
; ASCII digit characters.
BinaryToASCII LEA R1, ASCIIBUF  ; ptr to result string
        ADD   R0, R0, #0        ; test sign of value
        BRn   NegSign
        LD    R2, ASCIIplus     ; store '+'
        STR   R2, R1, #0
        BR    Begin100
NegSign LD    R2, ASCIIneg      ; store '-'
        STR   R2, R1, #0
        NOT   R0, R0            ; convert value to pos
        ADD   R0, R0, #1
Begin100 LD   R2, ASCIIoffset
        LD    R3, Neg100
Loop100 ADD   R0, R0, R3
        BRn   End100
        ADD   R2, R2, #1        ; add one to digit
        BR    Loop100
```

## **Binary to ASCII Conversion Code (2 of 3)**

```
End100  STR   R2, R1, #1        ; store ASCII 100's digit
        LD    R3, Pos100
        ADD   R0, R0, R3        ; restore last subtract
        LD    R2, ASCIIoffset
        LD    R3, Neg10
Loop10  ADD   R0, R0, R3
        BRn   End10
        ADD   R2, R2, #1        ; add one to digit
        BR    Loop10
End10   STR   R2, R1, #2        ; store ASCII 10's digit
        ADD   R0, R0, #10       ; restore last subtract
        LD    R2, ASCIIoffset
        ADD   R2, R2, R0        ; convert one's digit
        STR   R2, R1, #3        ; store one's digit
        RET
```

## **Binary to ASCII Conversion Code (3 of 3)**

```
ASCIIplus   .FILL x002B         ; plus sign ASCII code
ASCIIneg    .FILL x002D         ; neg sign ASCII code
ASCIIoffset .FILL x0030         ; zero's ASCII code
Neg100      .FILL xFF9C         ; -100
Pos100      .FILL x0064         ; 100
Neg10       .FILL xFFF6         ; -10
```

## Introduction to Computing Systems A (CS1002A.02)

#### School of Computer Science and Technology

## **Outline**

- 1 Review
- 2 Interrupt-Driven I/O
- 3 Input/Output

## What is Interrupt-Driven I/O?

```
Program A is executing instruction n
Program A is executing instruction n+1
Program A is executing instruction n+2
Program A is executing instruction n+3
Program A is executing instruction n+4
......
```

## What is Interrupt-Driven I/O?

```
Program A is executing instruction n
Program A is executing instruction n+1
Program A is executing instruction n+2
Interrupt!!!
Program A is executing instruction n+3
Program A is executing instruction n+4
......
```

## What is Interrupt-Driven I/O?
```
Program A is executing instruction n
Program A is executing instruction n+1
Program A is executing instruction n+2
1: Interrupt signal is detected
1: Program A is put into suspended animation
1: PC is loaded with the starting address of Program B
2: Program B starts satisfying I/O device's needs
2: Program B continues satisfying I/O device's needs
2: Program B continues satisfying I/O device's needs
2: Program B finishes satisfying I/O device's needs
3: Program A is brought back to life
Program A is executing instruction n+3
Program A is executing instruction n+4
......
```

## Why Have Interrupt-Driven I/O?

- Polling requires the processor to waste a lot of time spinning its wheels, re-executing the LDI and BR instructions again and again until the ready bit is set.
- With interrupt-driven I/O, none of that testing and branching has to go on. Interrupt-driven I/O allows the processor to spend its time doing what is hopefully useful work, executing some other program perhaps, until it is notified that some I/O device needs attention.

## Why Have Interrupt-Driven I/O? An Example

- Suppose we are asked to write a program that takes a sequence of 100 characters typed on a keyboard and processes the information contained in those 100 characters. We need to perform this process on 1000 consecutive sequences.
- Assume the characters are typed at the rate of 80 words/minute, which corresponds to one character every 0.125 seconds. It would then take 100 × 0.125 = 12.5 seconds to get a 100-character sequence.
- Assume the processing of the 100-character sequence takes 12.49999 seconds.
- Polling:
  - time for one sequence: 12.5 + 12.49999 = 24.99999 seconds.
- Interrupt-driven I/O:
  - Assume the processor spends 0.0000001 seconds on each character typed, or 0.00001 seconds for the entire 100-character sequence.
  - time for one sequence: 0.00001 + 12.49999 = 12.5 seconds.
- For 1000 sequences: polling vs. interrupt-driven I/O ≈ 6.94 h vs.
3.47 h

## Two Parts to interrupt-driven I/O

- Part 1: the mechanism that enables an I/O device to interrupt the processor
- Part 2: the mechanism that handles the interrupt request
- Several things must be true for an I/O device to actually interrupt the program that is running:
  - C1. The I/O device must want service.
  - C2. The device must have the right to request the service.
  - C3. The device request must be more urgent than what the processor is currently doing.
- If all three elements are present, the processor stops executing the program that is running and takes care of the interrupt.
- C1. The I/O device must want service.
  - Keyboard: someone has typed a character
  - Monitor: the display of the last character has completed successfully
- C2. The device must have the right to request the service: the interrupt enable bit
  - The interrupt enable bit can be set or cleared by the processor (usually by the operating system).
  - In most I/O devices, this interrupt enable (IE) bit is part of the device status register.
- C3. The urgency of the request
  - To interrupt the running program, the device must have a higher priority than the program that is currently running.
  - There may be many devices that want to interrupt the processor at a specific time. To succeed, the device must have a higher priority level than all other demands for use of the processor.
- The INT signal: to stop the processor from continuing execution of its currently running program and service an interrupt request, the INT signal must be asserted.
- The interrupt request signals are input to a priority encoder, a combinational logic structure that selects the highest priority request from all those asserted.
- The test for INT
- Instead of always going from the last state of one instruction cycle to the first state of the FETCH phase of the next instruction, the next state depends on the INT signal.
- If INT is not asserted, the processor continues with the next instruction.
- If INT is asserted, the next state is the first state of Part 2, handling the interrupt request.

#### Three stages

#### 1. Initiate the interrupt

- 1: Interrupt signal is detected
- 1: Program A is put into suspended animation
- 1: PC is loaded with the starting address of Program B

#### 2. Service the interrupt

- 2: Program B starts satisfying I/O device's needs
- 2: Program B continues satisfying I/O device's needs
- 2: Program B continues satisfying I/O device's needs
- 2: Program B finishes satisfying I/O device's needs

#### 3. Return from the interrupt

- 3: Program A is brought back to life

#### Stage 1: Initiate the interrupt

- (1) Save the state of the interrupted program so it can pick up where it left off after the requirements of the interrupt have been completed.
  - The state includes the contents of the memory locations that are part of the program and the contents of all the general purpose registers. It also includes the PC and PSR.
  - Assume that the service routine will always save the contents of any general purpose register that it needs before using it. The only state information the LC-3 saves is the PC and PSR.
  - The LC-3 saves this state information on the supervisor stack in the same way the PC and PSR are saved when a TRAP instruction is executed.

#### Stage 1: Initiate the interrupt

- (2) Load the state of the higher priority interrupting program so it can start satisfying its request.
  - Interrupt service routines are similar to the trap service routines. They are program fragments stored in system space.
  - Most processors use the mechanism of vectored interrupts. The I/O device transmits to the processor an eight-bit interrupt vector (INTV) along with its interrupt request signal and its priority level.
  - The Interrupt Vector Table consists of memory locations x0100 to x01FF, each containing the starting address of an interrupt service routine (ISR).
- Trap vector table: x0000 to x00FF
- Interrupt vector table: x0100 to x01FF
- The PSR is loaded as follows:
  - Since no instructions in the service routine have yet executed, PSR[2:0] contains no meaningful information. We arbitrarily load it initially with 010.
  - Since the interrupt service routine runs in privileged mode, PSR[15] is set to 0.
  - PSR[10:8] is set to the priority level associated with the interrupt request.

#### Stage 2: Service the Interrupt

- The PC contains the starting address of the interrupt service routine.
- The service routine will execute, and the requirements of the I/O device will be serviced.

#### Stage 3: Return from the Interrupt

- The last instruction in every interrupt service routine is RTI, return from trap or interrupt.
- RTI pops the PC and the PSR from the supervisor stack and restores them to their rightful places in the processor.
  - If the interrupted program is unprivileged, the stack pointers must be adjusted: the Supervisor Stack Pointer is saved, and the User Stack Pointer is loaded into R6.
  - The PC is restored to the address of the instruction that would have been executed next if the program had not been interrupted.

## An Example (1)

- Suppose program A is executing when I/O device B, having a PL higher than that of A, requests service.
- During the execution of the service routine for I/O device B, a still more urgent device C requests service.

## An Example (2)

- 1, The figure shows the supervisor stack and the PC before program A fetches the instruction at x3006.
- 2, R6 is pointing to the current contents of the user stack, which are not shown!
- 3, The INT signal (caused by an interrupt from device B) is detected at the end of execution of the instruction in x3006.
- 4, R6 → Saved_USP; Saved_SSP → R6
- 5, PSR of program A → system stack
- 6, PC+1 → system stack
- 7, The interrupt vector associated with device B is expanded to the 16-bit address x01F1, and the contents of x01F1 (x6200) are loaded into the PC.

## An Example (3)

- 8, The service routine for device B executes until a higher priority interrupt is detected at the end of execution of the instruction at x6202.
- 9, The PSR of the service routine for B, which includes the condition codes produced by the AND instruction at x6202, and the address x6203 (PC + 1) are pushed on the stack.
- 10, The interrupt vector associated with device C is expanded to the 16-bit address x01F2, and the contents of x01F2 (x6300) are loaded into the PC.

## An Example (4)

- 11, Assume the interrupt service routine for device C executes to completion, finishing with the RTI instruction in x6315.
- 12, The supervisor stack is popped twice, restoring the PC to x6203 and the PSR of the service routine for device B, including the condition codes produced by the AND instruction in x6202.

## An Example (5)

- 13, The interrupt service routine for device B resumes execution at x6203 and runs to completion, finishing with the RTI instruction in x6210.
- 14, The supervisor stack is popped twice, restoring the PC to x3007 and the PSR of program A, including the condition codes produced by the ADD instruction in x3006.
- 15, Finally, since program A is in User mode, the contents of R6 are stored in Saved_SSP and R6 is loaded with the contents of Saved_USP.
- 16, Program A resumes execution with the instruction at x3007.

## Interrupts deal with more than I/O devices.

- Any event that has a higher priority and is external to the program that is running can interrupt the computer.
- It does so by supplying its INT signal, its INTV vector, and its priority level.
- If it is the highest priority event that wishes to interrupt the computer, it does so in the same way that I/O devices do as described above.

#### Examples:

- The timer interrupt interrupts the program that is running in order to note the passage of a unit of time.
- The machine check interrupt calls attention to the fact that some part of the computer system is not functioning properly.
- ......

## **Polling Revisited**

- Interrupt Mask
  - When set, a processor can ignore the INT signal

## How to implement interrupt mask in LC-3?

```
        LDI   R1, PSR
        LD    R2, INTMASK
        AND   R2, R1, R2     ; R1 = original PSR, R2 = PSR with interrupts disabled
;
POLL    STI   R1, PSR        ; enable interrupts (if they were enabled to begin)
        STI   R2, PSR        ; disable interrupts
        LDI   R3, DSR
        BRzp  POLL           ; Poll the DSR
        STI   R0, DDR        ; Store the character into the DDR
        STI   R1, PSR        ; Restore original PSR
        ...
INTMASK .FILL xBFFF          ; 1011 1111 1111 1111
PSR     .FILL xFFFC
DSR     .FILL xFE04
DDR     .FILL xFE06
```

## ICS Review

## Important parts

- Appendix A
  - Exclude exception
- Appendix C
  - Exclude exception

# About chapter 10

- Stack machines, zero-address machine
  - PPT 8-2
  - Chapter 10.2
  - Simulate a calculator with LC-3

Figure 10.11 Flowchart for the OpMult subroutine.

## About chapter 10

- Data Type Conversion
  - PPT 9-2
  - Chapter 10.1

## Appendix C: Microarchitecture of the LC-3

- Time is divided into clock cycles.
- The cycle time of a microprocessor is the duration of a clock cycle.
- A common cycle time for a microprocessor today is 0.33 nanoseconds, which corresponds to 3 billion clock cycles each second. We say that such a microprocessor is operating at a frequency of 3 gigahertz, or 3 GHz.
- We say, "at each instant of time," but we really mean during each clock cycle.

Figure C.1 Microarchitecture of the LC-3, major components.

## Appendix C: Microarchitecture of the LC-3

- The control signals needed in the "next" clock cycle depend on the following:
- 1.
The control signals that are present during the current clock cycle.
- 2. The LC-3 instruction that is being executed.
- 3. The privilege mode of the program that is executing, and whether the processor has the right to access a particular memory location.
- 4. If that LC-3 instruction is a BR, whether the conditions for the branch have been met (i.e., the state of the relevant condition codes).
- 5. Whether or not an external device is requesting that the processor be interrupted.
- 6. If a memory operation is in progress, whether it is completing during this cycle.

Figure C.1 Microarchitecture of the LC-3, major components.

## Appendix C: Microarchitecture of the LC-3

- 1. J[5:0], COND[2:0], and IRD: ten bits of control signals provided by the current clock cycle.
- 2. IR[15:12], which identifies the opcode, and IR[11], which differentiates JSR from JSRR (i.e., the addressing mode for the target of the subroutine call).
- 3. PSR[15], bit [15] of the Processor Status Register, which indicates whether the current program is executing with supervisor or user privileges.
- 4. ACV, a signal that informs the processor that a process operating in User mode is trying to access a location in privileged memory. ACV stands for Access Control Violation. When asserted, it denies the process access to the privileged memory location.
- 5. BEN, which indicates whether or not a BR should be taken.
- 6. INT, which indicates that some external device of higher priority than the executing process requests service.
- 7. R, which indicates the end of a memory operation.

Figure C.1 Microarchitecture of the LC-3, major components.
### About R

#### A flash memory chip

#### K9F2G08U0A-PCB0/PIB0

| Pin Name | Pin Function |
|-------------------------------------|--------------|
| I/O<sub>0</sub> ~ I/O<sub>7</sub> | DATA INPUTS/OUTPUTS: The I/O pins are used to input command, address and data, and to output data during read operations. The I/O pins float to high-z when the chip is deselected or when the outputs are disabled. |
| CLE | COMMAND LATCH ENABLE: The CLE input controls the activating path for commands sent to the command register. When active high, commands are latched into the command register through the I/O ports on the rising edge of the WE signal. |
| ALE | ADDRESS LATCH ENABLE: The ALE input controls the activating path for address to the internal address registers. Addresses are latched on the rising edge of WE with ALE high. |
| CE | CHIP ENABLE: The CE input is the device selection control. When the device is in the Busy state, CE high is ignored, and the device does not return to standby mode in program or erase operation. |
| RE | READ ENABLE: The RE input is the serial data-out control, and when active drives the data onto the I/O bus. Data is valid tREA after the falling edge of RE, which also increments the internal column address counter by one. |
| WE | WRITE ENABLE: The WE input controls writes to the I/O port. Commands, address and data are latched on the rising edge of the WE pulse. |
| WP | WRITE PROTECT: The WP pin provides inadvertent program/erase protection during power transitions. The internal high voltage generator is reset when the WP pin is active low. |
| R/B | READY/BUSY OUTPUT: The R/B output indicates the status of the device operation.
When low, it indicates that a program, erase or random read operation is in process and returns to high state upon completion. It is an open drain output and does not float to high-z condition when the chip is deselected or when outputs are disabled. |
| Vcc | POWER: Vcc is the power supply for device. |
| Vss | GROUND |
| N.C | NO CONNECTION: Lead is not internally connected. |

#### READY/BUSY OUTPUT

The R/B output indicates the status of the device operation. When low, it indicates that a program, erase or random read operation is in process and returns to high state upon completion.

# Appendix C: Microarchitecture of the LC-3

- During each clock cycle,
  - 42 of these control signals determine the processing of information in the data path
  - the other 10 control signals combine with the 10 bits of additional information to determine which set of control signals will be required in the next clock cycle.
- These 52 control signals specify the state of the control structure of the LC-3 microarchitecture.

Figure C.1 Microarchitecture of the LC-3, major components.

- The state machine describes what happens during each clock cycle in which the computer is running.
- Each state is active for exactly one clock cycle before control passes to the next state.
- Each node in the state machine corresponds to the activity that the processor carries out during a single clock cycle.
- The FETCH phase of the instruction cycle
  - In state 18, the MAR is loaded with the address contained in the PC, and the PC is incremented in preparation for the FETCH of the next LC-3 instruction after the current instruction finishes its instruction cycle.
  - If the content of MAR specifies privileged memory, and PSR[15] = 1, indicating User mode, the access of the instruction will not be allowed. That would be an access control violation, so ACV is set.
  - Finally, if there is no interrupt request present (INT = 0), the flow passes to state 33. Otherwise, the flow passes to state 49.
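As a rough illustration of state 18 as described above, the sketch below models that one clock cycle in Python. The function name and the tuple it returns are made up for this example, and the address ranges treated as privileged (below x3000, and from xFE00 upward) are an assumption that the slides do not state.

```python
def state_18(pc, psr, int_asserted):
    """Toy model of fetch state 18: MAR <- PC, PC <- PC + 1, set ACV, test INT."""
    mar = pc
    next_pc = (pc + 1) & 0xFFFF                 # 16-bit wraparound
    user_mode = ((psr >> 15) & 1) == 1          # PSR[15] = 1 means User mode
    # Assumed privileged ranges: system space below x3000 and the I/O page from xFE00.
    privileged = mar < 0x3000 or mar >= 0xFE00
    acv = user_mode and privileged              # access control violation
    next_state = 49 if int_asserted else 33     # INT = 0 -> state 33, otherwise 49
    return mar, next_pc, acv, next_state
```

Note that 49 is 33 with one extra bit set (33 + 16), which is consistent with a single condition signal steering the choice between the two successor states.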
- The FETCH phase of the instruction cycle
  - From state 33, control passes to state 60 if the processor is trying to access privileged memory while in User mode, or to state 28 if the memory access is allowed, that is, if there is no access control violation (ACV).
  - In state 28, since the MAR contains the address of the instruction to be processed, this instruction is read from memory and loaded into the MDR. Since this memory access can take multiple cycles, this state continues to execute until a ready signal from the memory (R) is asserted, indicating that the memory access has completed. At that point, the MDR contains the valid contents of the memory location specified by MAR.
  - The state machine then moves on to state 30, where the instruction is loaded into the instruction register (IR), completing the FETCH phase of the instruction cycle.
- The state machine then moves to state 32, where DECODE takes place.
  - There are 16 arcs emanating from state 32, each one corresponding to one value of bits [15:12] of the LC-3 instruction.
  - The arc from the last state of each instruction cycle (i.e., the state that completes the processing of that LC-3 instruction) takes us back to state 18.

Figure C.2 A state machine for the LC-3.

# The data path

- The data path consists of all components that actually process the information during each clock cycle:
  - the **functional units** that operate on the information,
  - the **registers** that store information at the end of one cycle so it will be available for further use in subsequent cycles,
  - and the **buses and wires** that carry information from one point to another in the data path.

#### LD.PC (p134)

In order for the PC to be incremented, the finite state machine must assert the PCMUX select lines to choose the output of the box labeled +1 and must also assert the LD.PC signal to load the output of the PCMUX into the PC at the end of the current cycle.

#### ALUK

ALUK consists of two bits, so it can take one of four values.
Which value it has during any particular clock cycle depends on whether the ALU is required to ADD, AND, NOT, or simply pass one of its inputs to the output (PASSA) during that clock cycle.

# Additional logic required to provide control signals

Figure C.6 Additional logic required to provide control signals.

#### Instruction Cycle (chapter 4.3 & chapter 5.6)

The instruction cycle includes 6 phases, each phase requiring 0 or more steps.

# NOT (Register)

# Interrupt and RTI

### Interrupt

- State 18 is the only state in which the processor checks for interrupts.
- State 45: switch stack
- State 41: write PSR
- State 52: write PC
- State 53: read new instruction

Table'Vector: the concatenation of Table and Vector

- Table = 0x00: trap
- Table = 0x01: interrupt

# TRAP and RTI

- RTI
  - State 8: exception detection
  - States 8, 36, 38: restore PC
  - States 39, 40, 42: restore PSR
  - State 59: switch stack
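The Table'Vector concatenation used when a trap or interrupt is initiated can be illustrated with a short Python sketch. The function name `vector_table_address` and the constant names are hypothetical, chosen only for this example; the sketch assumes that Table and Vector are each 8 bits wide, so that concatenating them yields a 16-bit address.

```python
# Table values from the slide: 0x00 selects the trap vector table,
# 0x01 selects the interrupt vector table.
TABLE_TRAP = 0x00
TABLE_INTERRUPT = 0x01


def vector_table_address(table: int, vector: int) -> int:
    """Concatenate an 8-bit Table with an 8-bit Vector (Table'Vector).

    Assumption: both fields are 8 bits, giving a 16-bit memory address.
    """
    if not 0 <= table <= 0xFF:
        raise ValueError("table must fit in 8 bits")
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return (table << 8) | vector  # Table forms the high byte, Vector the low byte


if __name__ == "__main__":
    # A trap with vector x25 looks up address x0025.
    print(hex(vector_table_address(TABLE_TRAP, 0x25)))
    # An interrupt with vector x80 looks up address x0180.
    print(hex(vector_table_address(TABLE_INTERRUPT, 0x80)))
```

The memory location at the resulting address holds the starting address of the service routine, which is what gets loaded into the PC.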