Monday, October 28, 2013

Working with String data in C#

 

In this tutorial we’re going to take a tour of the features offered by the string type (System.String): working with strings, manipulating them, and matching them with regular expressions. The keyword string is an alias for .NET’s base class System.String, so whether you write string or System.String, they mean the same thing. Because almost every C# program uses string data somewhere, C# provides this short keyword for the class. A string is a sequence of characters, and once it has been created its contents cannot be changed (strings are immutable). If you have a background in C++, you’ll know how the char keyword is used to declare string data there, like

char mystring[] = "";

Strings are reference types, which means they can be set to null, for example:

string csharpstring = null;

Because they are reference types, they are stored on the managed heap (a managed heap is similar to an ordinary heap, but it is maintained by the Common Language Runtime (CLR) of the .NET platform). Now that we know some basics about strings, let’s look at what we can do with them.
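The points above can be checked with a few lines of code. This is only a small sketch: the class name StringBasicsDemo and the helper SafeLength are made up for illustration and are not part of .NET.

```csharp
using System;

static class StringBasicsDemo
{
    // Returns true when the C# keyword and the .NET class name denote the same type.
    public static bool KeywordIsAlias()
    {
        return typeof(string) == typeof(System.String);
    }

    // Hypothetical helper: returns 0 for null, otherwise the number of characters.
    public static int SafeLength(string text)
    {
        return text == null ? 0 : text.Length;
    }

    static void Main()
    {
        string csharpstring = null;                            // allowed: string is a reference type
        Console.WriteLine(KeywordIsAlias());                   // True
        Console.WriteLine(SafeLength(csharpstring));           // 0
        Console.WriteLine(string.IsNullOrEmpty(csharpstring)); // True
    }
}
```

Calling a member such as Length directly on a null string throws a NullReferenceException, which is why the helper checks for null first.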

 

Declaring a string :

A string can be declared like any other type in C#, such as,

string mystring = "This is a string.";

using an implicitly typed variable,

var mystring = "This is a string.";

or, if you don’t need or don’t know the value yet, you can set the string to null,

string mystring = null;

Note that var cannot be used in this case: the compiler has no type to infer from null, so var mystring = null; does not compile.

or, to assign a previously declared string to a new variable or field,

string previousstring = "This one's older.";

string newstring = previousstring;

When you declare a string (or any other) variable or field this way, it simply takes the value of the variable or field on its right-hand side. So newstring now holds the string assigned to previousstring.
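A short sketch of what that assignment means in practice: because strings are immutable, giving previousstring a new value later does not affect newstring. The class and method names here (AssignmentDemo, CopyThenReassign) are invented for the example.

```csharp
using System;

static class AssignmentDemo
{
    // Copies one string variable to another, then reassigns the original.
    public static string CopyThenReassign()
    {
        string previousstring = "This one's older.";
        string newstring = previousstring;   // newstring now holds the same text
        previousstring = "Changed later.";   // only previousstring is rebound
        return newstring;
    }

    static void Main()
    {
        Console.WriteLine(CopyThenReassign()); // This one's older.
    }
}
```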

 

Declaring a string array :

Declaring a string array is very similar to declaring a string. If you know the number of elements and their values at the time of declaration, you can use this syntax,

string[] mystringarray = {"First String", "Second String", "Third String", "And So on…"};

another way of doing this,

string[] mystringarray = new string[] {"First String", "Second String", "Third String", "And So on…"};

or, if you don’t know the values at the time of declaration, you can create an empty array (one with zero elements), like this,

string[] mystring = new string[] { };

using an implicitly typed variable,

var mystring = new string[] { };

Keep in mind that the length of an array is fixed once it has been created.
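When the number of elements is known but the values are not, one option is to allocate the array with a length and fill it in afterwards. The following is a minimal sketch; StringArrayDemo and BuildNames are illustrative names only.

```csharp
using System;

static class StringArrayDemo
{
    // Allocates three slots first and assigns the values afterwards.
    public static string[] BuildNames()
    {
        string[] names = new string[3];   // three elements, each initially null
        names[0] = "First String";
        names[1] = "Second String";
        names[2] = "Third String";
        return names;
    }

    static void Main()
    {
        string[] names = BuildNames();
        foreach (string name in names)
        {
            Console.WriteLine(name);      // prints each element on its own line
        }
        Console.WriteLine(names.Length);              // 3
        Console.WriteLine(string.Join(", ", names));  // First String, Second String, Third String
    }
}
```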

 

Adding(Concatenating) two or more strings :

To add two or more strings, one can make use of the + operator. For example:

string first = "CSharp is a ";

string second = "object oriented ";

string third = "language.";

string sentence = first + second + third;

Can you guess what the value of sentence would be? It will be CSharp is a object oriented language. This approach works well for a handful of small strings, but it is not well suited to building long strings out of many pieces, because strings are immutable and every + creates a new string. For this specific purpose .NET has a tailored class, StringBuilder (in the System.Text namespace). As its name says, it is used to build strings. So let’s put StringBuilder to some good work. To follow along, create a new Console Application project in your preferred IDE and name it “WorkingWithStrings”. If the project is created successfully, you’ll see the Program.cs file open in the code editor. Now, in the Main method, create an instance of the StringBuilder class like this,

StringBuilder stringBuilder = new StringBuilder();

The constructor has six overloads, which let you define the following:

  • Capacity of this instance.
  • Maximum capacity of this instance.
  • An initial string with which to initialize this instance.
  • Start index and length of the string from which this instance is to be initialized.
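The options listed above map onto the constructor overloads roughly as follows. This is a sketch for experimenting; the class name BuilderConstructorsDemo, the method FromSubstring and the sample text are assumptions made for the example.

```csharp
using System;
using System.Text;

static class BuilderConstructorsDemo
{
    // Uses the (value, startIndex, length, capacity) overload to start
    // from a part of a string: 6 characters beginning at index 10.
    public static string FromSubstring()
    {
        StringBuilder sb = new StringBuilder("CSharp is object oriented", 10, 6, 32);
        return sb.ToString();
    }

    static void Main()
    {
        StringBuilder withCapacity = new StringBuilder(64);      // initial capacity
        StringBuilder withLimits = new StringBuilder(16, 128);   // initial and maximum capacity
        StringBuilder withText = new StringBuilder("CSharp");    // initial string

        Console.WriteLine(withCapacity.Capacity);
        Console.WriteLine(withLimits.MaxCapacity);   // 128
        Console.WriteLine(withText.Length);          // 6
        Console.WriteLine(FromSubstring());          // object
    }
}
```

Capacity is only a hint about how much room to reserve up front; the builder grows as needed until MaxCapacity is reached.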

Now that we know how to initialize the class, let’s put it to work by building the sentence above with our new tool. To add a string, or a value of another type such as int or double, to the builder we use the Append method. For example:

stringBuilder.Append("CSharp is a ");

stringBuilder.Append("object oriented ");

stringBuilder.Append("language.");

The three statements above work like first + second + third. To get our final string we use the ToString method. We can get the final string (sentence) in this way,

string sentence = stringBuilder.ToString();

So the final code looks like this,


using System;
using System.Text;

namespace WorkingWithStrings
{
    class Program
    {
        static void Main(string[] args)
        {
            // First Method : Using + operator to concatenate strings
            string first = "CSharp is a ";
            string second = "object oriented ";
            string third = "language.";
            string sentence = first + second + third;
            Console.WriteLine("Using + Operator :");
            Console.WriteLine(sentence);
            Console.WriteLine();

            // Second Method : Using StringBuilder class
            StringBuilder stringBuilder = new StringBuilder();
            stringBuilder.Append("CSharp is a ");
            stringBuilder.Append("object oriented ");
            stringBuilder.Append("language.");
            sentence = stringBuilder.ToString();
            Console.WriteLine("Using StringBuilder :");
            Console.WriteLine(sentence);
            Console.WriteLine();

            // Keep the console window open until Enter is pressed
            Console.ReadLine();
        }
    }
}


You can compile the code above directly with no modifications: copy it over your own code, then build and run the solution. When you run it, you’ll see that both methods produce the same output.
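StringBuilder pays off mostly when a string is assembled from many pieces, for instance inside a loop. Below is a small sketch of that case; BuilderLoopDemo and CountTo are hypothetical names chosen for the example.

```csharp
using System;
using System.Text;

static class BuilderLoopDemo
{
    // Builds a comma-separated list of the numbers 1..limit.
    public static string CountTo(int limit)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= limit; i++)
        {
            if (i > 1)
            {
                sb.Append(", ");   // separator between items
            }
            sb.Append(i);          // Append also accepts int, double, char and other types
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(CountTo(5)); // 1, 2, 3, 4, 5
    }
}
```

Doing the same with + inside the loop would create a new intermediate string on every iteration.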

 

Working with file paths :

The path of a file on disk, for example Notepad, looks like,

C:\Windows\Notepad.exe

but you cannot write a path in this form directly inside a regular C# string literal, because C# treats \ (backslash) as the start of an escape sequence. You need to escape each backslash with another backslash, so that it looks like,

string path = "C:\\Windows\\Notepad.exe";

Another, perhaps cleaner, way to do this is to prefix the literal with the @ symbol (a verbatim string), so that it now looks like,

string path = @"C:\Windows\Notepad.exe";

Both declarations are valid, but the second one is easier to read.
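To confirm that the two notations produce exactly the same text, one can compare them directly. The sketch below also shows Path.Combine from System.IO as one way to join path parts; the class name PathDemo and its methods are illustrative only.

```csharp
using System;
using System.IO;

static class PathDemo
{
    // Escaped and verbatim literals describe the same characters.
    public static bool SameText()
    {
        string escaped = "C:\\Windows\\Notepad.exe";
        string verbatim = @"C:\Windows\Notepad.exe";
        return escaped == verbatim;
    }

    // Length of the path text; each backslash counts as one character.
    public static int PathLength()
    {
        return @"C:\Windows\Notepad.exe".Length;
    }

    static void Main()
    {
        Console.WriteLine(SameText());    // True
        Console.WriteLine(PathLength());  // 22

        // Path.Combine inserts the platform's directory separator between the parts.
        Console.WriteLine(Path.Combine(@"C:\Windows", "Notepad.exe"));
    }
}
```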

 

Get a string representation of other types :

The ultimate base class of .NET is System.Object; all other classes are derived from it. System.Object has four public instance methods,

  • Equals(—);
  • GetHashCode(—);
  • GetType(—)
  • ToString(—);

Because derived classes inherit the methods of their parent, every class in C#, even those created by you, has the above four methods. The fourth one is especially useful, because one often needs to convert a value of some type to a string. So let’s try converting some other types to string, starting with the most popular one, int (System.Int32). To convert an int to a string we can use the ToString method (recall that every type in .NET derives from System.Object, which declares this method, and System.Int32 is no exception), like this,

int number = 2013;

string converted = number.ToString();

After this, converted holds 2013 as a string. A byte, double, float, ulong and so on can be converted in the same way.

But why would someone need to convert something to a string? The reason is simple: when you need to show some data in a TextBox, RichTextBox or MessageBox, everything goes fine as long as you have string data, but it can become tricky (if you don’t know how to do it) when you have data of another type such as int or double. These controls require the data to be displayed to be of string type, and that’s when ToString comes to the rescue. To make this clearer, try out these examples in a Console Application project.

// Example 1
// Prints the date and time at the moment this statement is executed.
Console.WriteLine(DateTime.Now.ToString());

// Example 2
double dValue = 10.5;
// Prints 10.5 (the decimal separator depends on the current culture).
Console.WriteLine(dValue.ToString());

// Example 3
StringBuilder stringBuilder2 = new StringBuilder("A hello from StringBuilder");
Console.WriteLine(stringBuilder2.ToString());

// Example 4
// Gets the root directory of the drive on which the current OS is installed, e.g. C:\
string root = System.IO.Path.GetPathRoot(Environment.SystemDirectory);
// Builds the path of the Windows folder on that drive without doubling the backslash.
string windowsDir = System.IO.Path.Combine(root, "Windows");
// Gets the paths of all files directly inside that folder.
string[] files = System.IO.Directory.GetFiles(windowsDir);
// Prints the total number of files found in that folder.
Console.WriteLine("Total files in " + windowsDir + " : " + files.Length.ToString());
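Since your own classes inherit ToString as well, you can override it to return something more useful than the default, which is the name of the type. The sketch below uses a made-up Point class; ToStringDemo and InvariantText are likewise names invented for the example.

```csharp
using System;
using System.Globalization;

// Hypothetical class that overrides the inherited ToString method.
class Point
{
    public int X;
    public int Y;

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    public override string ToString()
    {
        return "(" + X + ", " + Y + ")";
    }
}

static class ToStringDemo
{
    // Formats a double with the invariant culture, so the result
    // does not depend on the regional settings of the machine.
    public static string InvariantText(double value)
    {
        return value.ToString(CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(new Point(3, 4));      // (3, 4)
        Console.WriteLine(new object());         // System.Object (the default: the type name)
        Console.WriteLine(InvariantText(10.5));  // 10.5
    }
}
```

Console.WriteLine calls ToString on the object it receives, which is why the Point is printed using the overridden method.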

 

Formatting strings :

Formatting strings is another important task that almost every application performs, unless you are building something like an image editor or an audio editor. Again, string already has built-in functionality for this purpose, which is available to us via the Format method. The syntax is like this,

string.Format(—);

This method takes a format string followed by the arguments to insert into it. A format item is written as {n}, where n is the zero-based index of the argument to insert: {0} refers to the first argument after the format string, {1} to the second, and so on. An index that has no matching argument causes a FormatException at run time. A valid call to string.Format is,

string.Format("Your name is {0} and age is {1}.", "John Doe", 100 - 75);

When the statement executes, every format item in the string is replaced by its corresponding object: the first supplied argument is used for {0}, the second for {1}, and so on. So the above statement produces,

Your name is John Doe and age is 25.
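Here is a small runnable sketch of the same idea. It also shows two things not covered above: the same index can be used more than once, and a format item can carry a format specifier such as F2 (two decimal places). FormatDemo and its methods are hypothetical names.

```csharp
using System;
using System.Globalization;

static class FormatDemo
{
    // Same call as in the text, wrapped in a method.
    public static string Describe(string name, int age)
    {
        return string.Format("Your name is {0} and age is {1}.", name, age);
    }

    // One argument can be referenced by several format items.
    public static string Repeat(string word)
    {
        return string.Format("{0}, {0} and {0} again", word);
    }

    // {0:F2} formats the number with two decimals; the invariant culture
    // keeps the decimal separator a dot on every machine.
    public static string Price(double amount)
    {
        return string.Format(CultureInfo.InvariantCulture, "Total: {0:F2}", amount);
    }

    static void Main()
    {
        Console.WriteLine(Describe("John Doe", 100 - 75)); // Your name is John Doe and age is 25.
        Console.WriteLine(Repeat("go"));                   // go, go and go again
        Console.WriteLine(Price(9.5));                     // Total: 9.50
    }
}
```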


 

Regular Expressions :

A regular expression is a sequence of characters and metacharacters that describes a pattern of text. A metacharacter can itself stand for a whole set of characters. A regular expression looks something like,

\d+

Regular expressions are an invaluable tool for text processing, and they can simplify many otherwise lengthy tasks such as finding doubled words, matching dates and email addresses, validating user input against a pattern, and lots of other things. Regular expressions are so popular and useful that one can consider them a programming language in themselves. Giving a complete tutorial is out of the scope of this post, but if you really want a quick and good start, this is an excellent tutorial. Below is a quick reference of some common patterns,

  • +        One or more times, greedy.
  • *        Zero or more times; greedy, so it matches as much as possible.
  • +?      One or more times, but as few as possible (lazy).
  • \d       Matches a single digit (similarly, \w matches a single word character and \s a single whitespace character).
  • \d+     Matches a sequence of one or more digits (also applies to \w and \s).
  • [a-z]   Matches a single lowercase letter (same with [A-Z] for uppercase).
  • [a-z]+ Matches a sequence of one or more lowercase letters (same with [A-Z]).
  • . (dot) Matches any character except newline; in Singleline mode it matches newline as well.
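In .NET these patterns are used through the Regex class in the System.Text.RegularExpressions namespace. The following sketch tries a few of the patterns from the list; the class RegexDemo, its method names and the sample inputs are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // Counts how many separate runs of digits (\d+) occur in the input.
    public static int CountNumbers(string input)
    {
        return Regex.Matches(input, @"\d+").Count;
    }

    // Returns the first run of lowercase letters ([a-z]+), or "" if there is none.
    public static string FirstLowercaseRun(string input)
    {
        return Regex.Match(input, "[a-z]+").Value;
    }

    // Returns the text matched by the given pattern, to compare greedy and lazy forms.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        Console.WriteLine(CountNumbers("Posted on 28 October 2013")); // 2
        Console.WriteLine(FirstLowercaseRun("CSharp 2013"));          // harp
        Console.WriteLine(FirstMatch("aaa", "a+"));                   // aaa (greedy)
        Console.WriteLine(FirstMatch("aaa", "a+?"));                  // a   (lazy)
    }
}
```

The patterns are written as verbatim strings (with @) where they contain a backslash, so that \d reaches the regex engine unchanged.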