How To Declare A Register In Assembly

Arrays, Address Arithmetic, and Strings

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

In both C and assembly, you can allocate and access memory in several different sizes:

C/C++ datatype   Bits   Bytes   Register   Access memory   Allocate memory
char             8      1       al         BYTE [ptr]      db
short            16     2       ax         WORD [ptr]      dw
int              32     4       eax        DWORD [ptr]     dd
long             64     8       rax        QWORD [ptr]     dq

For example, we can put full 64-bit numbers into memory using "dq" (Data Quad-word), and then read them back out with QWORD[yourLabel].

We can put individual bytes into memory using "db" (Data Byte), and then read them back with BYTE[yourLabel].

C Strings in Assembly

In plain C, you can put a string on the screen with the standard C library "puts" function:

puts("Yo!");      

(Try this in NetRun now!)

You can expand this out a bit, by declaring a string variable.  In C, strings are stored as (constant) character pointers, or "const char *":

const char *theString="Yo!";
puts(theString);

(Try this in NetRun now!)

Internally, the compiler does two things:

  • Allocates memory for the string, and initializes the memory to 'Y', 'o', '!', and a special zero byte called a nul terminator that marks the end of the string.
  • Points theString to this allocated memory.

In assembly, these are separate steps:

  • Allocate memory with the db (Data Byte) pseudo instruction, and store characters there, like    db `Yo!`,0
    • Unlike C++, you can declare a string using any of the three quotes: "doublequotes", 'singlequotes', or `backticks` (backtick is on your keyboard beneath tilde ~)
    • However, newlines like \n only work inside backticks, an odd peculiarity of the assembler we use (nasm).
  • Note we manually added ,0 after the string to insert a zero byte to terminate the string.
    • If you forget to terminate the string, puts can print garbage after the string until it hits a 0.
  • Point at this memory using a jump label, just like we were going to jmp to the string.

Here's an example:

mov rdi, theString ; rdi points to our string
extern puts  ; declare the function
call puts    ; call it
ret

theString:    ; label, just like for jumping
	db `Yo!`,0  ; data bytes for string (don't forget nul!)

(Try this in NetRun now!)

In assembly,  there's no syntax difference between:
  • a label designed for a jump instruction (a block of code)
  • a label designed for a call instruction (a function ending in ret)
  • a label designed as a string pointer (a nul-terminated string)
  • a label designed as a data pointer (allocated with dq)
  • or many other uses--it's just a pointer!

We can also change the pointer, to move down the string.  Since each char is one byte, moving by 4 bytes moves by 4 chars here, printing "o assembly":

mov rdi, theString ; rdi points to our string
add rdi,4 ; move down the string by 4 chars
extern puts ; declare the function
call puts ; call it
ret

theString: ; label, just like for jumping
	db `Hello assembly`,0 ; data bytes for string

(Try this in NetRun now!)

Address Arithmetic

If you allocate more than one constant with dq, they appear at larger addresses.  (Recall that this is backwards from the stack, which pushes each additional item at an ever-smaller address.)  So this reads the 5, like you'd expect:

dos_equis:
	dq 5   ; writes this constant into a "Data Qword" (8 byte block)
	dq 13  ; writes another constant, at [dos_equis+8] (bytes)

foo:
	mov rax, [dos_equis] ; read memory at this label
	ret

(Try this in NetRun now!)

Adding 8 bytes (the size of a dq, 8-byte / 64-bit QWORD) to the address of the first constant puts us directly on top of the second constant, 13:

dos_equis:
	dq 5   ; writes this constant into a "Data Qword" (8 byte block)
	dq 13  ; writes another constant, at [dos_equis+8] (bytes)

foo:
	mov rax, [dos_equis+8] ; read memory at this label, plus 8 bytes
	ret

(Try this in NetRun now!)

If you add anything strictly between 0 and 8, like adding 1 byte, you will load part of the 5 and part of the 13, resulting in a weirdly split and shifted result.

Accessing an Array

An "array" is just a sequence of values stored in ascending order in memory.  If we listed our data with "dq", they show up in memory in that order, so we can do pointer arithmetic to pick out the value we want.  This returns 7:

mov rcx,my_arr ; rcx == address of the array
mov rax,QWORD [rcx+1*8] ; load element 1 of array
ret

my_arr:
dq 4 ; array element 0, stored at [my_arr]
dq 7 ; array element 1, stored at [my_arr+8]
dq 9 ; array element 2, stored at [my_arr+16]

(Try this in NetRun now!)

Did you ever wonder why the first array element is [0]?  It's because it's zero bytes from the start of the pointer!

Keep in mind that each array element above is a "dq" or an 8-byte long, so I move down by 8 bytes during indexing, and I load into the 64-bit "rax".

If the array is of 4-byte integers, we'd declare them with "dd" (data DWORD), move down by 4 bytes per int array element, and store the answer in a 32-bit register like "eax".  But the pointer register is always 64 bits!

mov rcx,my_arr ; rcx == address of the array
mov eax,DWORD [rcx+1*4] ; load element 1 of array
ret

my_arr:
dd 0xaaabbbcc ; array element 0, stored at [my_arr]
dd 0xc001007 ; array element 1, stored at [my_arr+4]

(Try this in NetRun now!)

It's extremely easy to have a mismatch between these values: the size you declared, the size you step by, and the size you load.  For example, if I declare values with dw (2 byte shorts), but load them into eax (4 bytes), I'll have loaded two values into one register.  So this code returns 0xbeefaabb, which is two 16-bit values combined into one 32-bit register:
mov rcx,my_arr ; rcx == address of the array
mov eax,[rcx] ; load chemical element 0 of array (OOPS! 32-bit load!)
ret

my_arr:
dw 0xaabb ; array element 0, stored at [my_arr]
dw 0xbeef ; array element 1, stored at [my_arr+2]

(Try this in NetRun now!)

You can reduce the likelihood of this type of error by adding an explicit memory size specifier, like "WORD" below.  That makes this a compile error ("error: mismatch in operand sizes") instead of returning the wrong value at runtime.
mov rcx,my_arr ; rcx == address of the array
mov eax, WORD [rcx] ; load element 0 of array (OOPS! 32-bit load!)
ret

my_arr:
dw 0xaabb ; array element 0, stored at [my_arr]
dw 0xbeef ; array element 1, stored at [my_arr+2]

(Try this in NetRun now!)

(If we really wanted to load a 16-bit value into a 32-bit register, we could use "movzx" (unsigned) or "movsx" (signed) instead of a plain "mov".)

C++     Bits   Bytes   Assembly Create    Assembly Read              Example
char    8      1       db (data byte)     mov al, BYTE [rcx+i*1]     (Try this in NetRun now!)
short   16     2       dw (data WORD)     mov ax, WORD [rcx+i*2]     (Try this in NetRun now!)
int     32     4       dd (data DWORD)    mov eax, DWORD [rcx+i*4]   (Try this in NetRun now!)
long    64     8       dq (data QWORD)    mov rax, QWORD [rcx+i*8]   (Try this in NetRun now!)

Human                               C++          Assembly
Declare a long integer.             long y;      rdx (nothing to declare, just use a register)
Copy one long integer to another.   y=x;         mov rdx,rax
Declare a pointer to a long.        long *p;     rax (nothing to declare, use any 64-bit register)
Dereference (look up) the long.     y=*p;        mov rdx,QWORD [rax]
Find the address of a long.         p=&y;        mov rax,place_you_stored_Y
Access an array (easy way)          y=p[2];      (sorry, no easy way exists!)
Access an array (hard way)          p=p+2;       add rax,2*8 ; (move forward by two 8 byte longs)
                                    y=*p;        mov rdx, QWORD [rax] ; (grab that long)
Access an array (too clever)        y=*(p+2)     mov rdx, QWORD [rax+2*8] ; (yes, that actually works!)

Loading from the wrong place, or loading the wrong amount of data, is an INCREDIBLY COMMON problem when using pointers, in any language.  You WILL make this mistake at some point over the course of the semester, and this results in a crash (rare) or the wrong data (most often some strange shifted & spliced integer), so be careful!

Walking Pointers Down Arrays

There's a classic terse C idiom for iterating through a string, by incrementing a char * to walk down through the bytes until you hit the zero byte at the end:
        while (*p++!=0) { /* do something to *p   */ }

If you unpack this a bit, you find:

  • p points to the first char in the string.
  • *p is the first char in the string.
  • p++ adds 1 to the pointer, moving to the next char in the string.
  • *p++ extracts the first char, and moves the pointer down.
  • *p++!=0 checks if the first char is zero (the end of the string), and moves the pointer down.

Here's a typical example, in C:

char s[]="string";   // declare a string
char *p=s;           // point to the start
while (*p++!=0) if (*p=='i') *p='a';  // replace i with a
puts(s);

(Try this in NetRun now!)

Here's a similar pointer-walking trick, in assembly:

mov rdi,stringStart
again:
	add rdi,1 ; move pointer down the string
	cmp BYTE[rdi],'a' ; did we hit the letter 'a'?
	jne again  ; if not, keep looking

extern puts
call puts
ret

stringStart:
	db 'this is a great string',0

(Try this in NetRun now!)

(We'll see how to declare modifiable strings later.)

Source: https://www.cs.uaf.edu/2017/fall/cs301/lecture/09_15_strings_arrays.html

Posted by: wardoffeir.blogspot.com
