[The views expressed on this website/blog are mine alone and do not necessarily reflect the views of my employer.]
Wednesday, April 15, 2009
Know your polling station
http://www.cyberabadpolice.gov.in/knowyourpolicestation/pollingstations/
First get your polling station number from http://ceoandhra.nic.in/Final_erolls_2009_II.html, then look up that polling station number under your police station on the site above.
Finding the route is a bit tedious, but the image resolution is good enough: you can zoom in and check the route number and polling station number.
Excellent work by the Cyberabad police.
Tuesday, April 14, 2009
ABB Opens New Automation Products Plant In India
ABB has inaugurated its greenfield automation products facility near Bangalore, India.
Spread over 18 acres, the factory will manufacture products such as air circuit breakers, switch fuse units, molded-case circuit breakers, low- and medium-voltage drives and systems, high-power rectifiers, static excitation systems and main and auxiliary traction converters, among others.
Commending ABB’s growth over the last six to seven years, Sjoekvist notes that its virtually debt-free status and healthy order backlog will help ABB tide over the global economic crisis. “Improving energy efficiency and focus on renewable energy were key factors in adding value,” he says.
What Went Wrong Ethically in the Economic Collapse
“Shoo. You can’t attend this meeting. We are going to discuss things here that you will tell me I shouldn’t do. I don’t want that on the record and you can’t stop me.”
ABB will enhance productivity at BMW
ABB has signed a frame agreement with BMW Group to deliver 2,100 industrial robots over five years, beginning in 2010.
The robots will be applied in parts handling, gluing and spot welding on car-body assembly lines for BMW's 1-series, 3-series, X5-series and Mini models.
Tuesday, April 07, 2009
GMail is 5 years old
On April 2, GMail turned 5. I switched to GMail on 8/30/04, and it has been my primary web mail ID ever since.
When I started, it offered 1 GB of free space; today it offers 7 GB+. I am still trying to fill my first 1 GB.
GMail made it easy to archive and label mail, and never deleting a mail is now a reality.
Search within mail is awesome, and so are stars, filters and keyboard shortcuts. On top of that, it provides authentication for Blogger and many other services. One feature I would like to have is auto-archiving of labeled mails after a configured time span.
Anyway, it has been a wonderful journey with GMail so far, and I assume that GMail will stay for ever and ever.
Wednesday, April 01, 2009
Getting default browser’s path
The following registry keys hold the default browser’s application path:
1. HKEY_CLASSES_ROOT\http\shell\open\command
2. HKEY_LOCAL_MACHINE\SOFTWARE\Classes\http\shell\open\command
As expected, there is no consistency here. We tested the following browsers:
- Microsoft Internet Explorer 8.0.6001.18702
- Mozilla Firefox 3.0.7
- Google Chrome 1.0.154.53
- Opera 9.63
- Apple Safari 3.2.1 (525.27.1)
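The first option works for all browsers except Safari. The second option is not recommended. The following code snippet extracts the browser path from the open command.
using System.Text.RegularExpressions;
using System.Windows.Forms;
using Microsoft.Win32;
// Open the key without a leading backslash; the result is null when the key is missing
using (RegistryKey regKey = Registry.ClassesRoot.OpenSubKey(@"http\shell\open\command"))
{
    // The default (unnamed) value holds the command line, e.g. "C:\...\browser.exe" -args
    string command = (regKey == null) ? null : regKey.GetValue(null) as string;
    // Capture everything up to the first ".exe", skipping an optional opening quote
    Regex regex = new Regex("^\\s*\"?(?<path>[^\"]+?\\.exe)", RegexOptions.IgnoreCase);
    Match match = regex.Match(command ?? string.Empty);
    MessageBox.Show(match.Success ? match.Groups["path"].Value : "Browser path not found");
}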
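Since browsers register the open command in different shapes, it may help to keep the string handling separate from the registry access so it can be checked on its own. Here is a minimal sketch of that idea; the class and method names (BrowserCommand, ExtractExePath) are made up for illustration, and it assumes the command is either a quoted path followed by arguments or an unquoted path ending in ".exe".

```csharp
using System;

public static class BrowserCommand
{
    // Returns the executable path from a shell "open" command string,
    // or null when the input is null, empty or whitespace.
    public static string ExtractExePath(string command)
    {
        if (string.IsNullOrWhiteSpace(command)) return null;
        string trimmed = command.Trim();

        if (trimmed[0] == '"')
        {
            // Quoted form: the path is everything up to the closing quote.
            int closing = trimmed.IndexOf('"', 1);
            return closing > 0 ? trimmed.Substring(1, closing - 1) : trimmed.Substring(1);
        }

        // Unquoted form: cut right after the first ".exe", ignoring case.
        int exe = trimmed.IndexOf(".exe", StringComparison.OrdinalIgnoreCase);
        return exe >= 0 ? trimmed.Substring(0, exe + 4) : trimmed;
    }
}
```

The value read from the registry key can be passed straight to BrowserCommand.ExtractExePath; the sample command strings used to try it out are hypothetical, not values captured from the browsers listed above.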
Top quality teamwork tips
Quoted from
Mastering the Art of Teams and Team-Building: 10 Tips for Top-Quality Teamwork by Randall S. Hansen, Ph.D.
http://www.quintcareers.com/printable/top_quality_teamwork_tips.html
Working in teams is inevitable. For years now, organizational leaders have recognized the added value that comes from having employees work in formal or informal teams, but over the last two decades even greater emphasis has been placed on work teams. Several studies indicate that more than 80 percent of organizations employ multiple types of workplace teams.
Team-building and teamwork skills are essential in the workplace and highly desirable skills to possess when seeking a new job or promotion. Teams working at their potential generate more productivity and better solutions than if all the individual members had worked independently.
How can you be a better team member? How can you get your team to work more effectively as a team? How can you lead your team to success? Here are 10 tips for creating better teams.
1. Foster Open Communications. The best teams are those in which every member shares their thoughts and opinions with the group, and where decision-making is based on dialogue and not dictatorship. But open communication is not just about having an atmosphere in which people can talk freely -- it's also about team members listening to each other and valuing each other's opinions. If your team lacks open communications, bring it up at your next team meeting.
2. Build Trust. Trust is the cornerstone of all effective teams. Without trust, there really is no team, just a collection of individuals working together. Teams need to develop to a point where every member trusts that every other member will do the work required and be an active member of the team. One of the trendy methods of trust-building is having the team participate in a ropes-challenge course, where teams work together to solve problems.
3. Set Clear Goals. A team without specific goals will not nearly be as effective as a team with goals. Goals should be specific, including a deadline for completion. But goals should not necessarily always come from the leader of the team; all goals should be discussed by the entire team, especially in situations in which deadlines will be tight.
4. Review Progress. Once goals have been set, the team frequently goes off to complete all the tasks to achieve its goal. This scenario is perfectly fine, except that in too many instances, new information or actions can affect the goal's completion. Thus, teams benefit from conducting regular check-ins with all team members -- perhaps something as often as weekly -- to review progress and iron out any wrinkles or overcome obstacles that have arisen.
5. Encourage Cooperation, not Competition. Despite being placed in teams with co-workers competing with you for your next promotion, you must find a way to collaborate with every member of the team. One of the worst labels in the workplace is that of “not being a team player.” You will have plenty of time to showcase your personal accomplishments, but without your cooperation, your team may not succeed. Collaboration is a must.
6. Focus on Professionalism. The reality of life is that we all have certain types of personalities that clash with our own, but for teams to work, you have to put aside these petty differences and focus on the positive aspects of all team members. Remember that you are not forging lifelong friendships with your team, you simply need to work together to achieve your goals. Downplay people's negative traits and focus on their positives – just as they will yours.
7. Celebrate Differences/Diversity. One of the best trends in society, as well as the workplace, has been a growing diversity of people -- by race, ethnicity, gender, and age. Diversity introduces new ways of thinking and leads to new ideas and better decisions. Rather than feeling uncomfortable that most of the team does not look or act like you, celebrate their individual differences and the value that each brings to the team.
8. Be Enthusiastic. Even if you generally prefer to work by yourself, the reality you face is that teams in the workplace are here to stay. One way to make the best of the situation is to jump into the team experience with as much enthusiasm as possible. Enthusiasm is contagious, so not only will your enthusiasm help you feel better about being a team member, it will lead other team members to also become more enthusiastic.
9. Share the Work/Do the Work. The best teams are those in which each member plays a vital part in work that results in superior performance; thus it is imperative that each member not only feels he or she plays a vital role, but actually does so. But sharing the work is only part of the equation. The other part is that once the work has been assigned, each team member must be accountable to complete the tasks. Much has been written about the “free-rider” problem within teams, but with individual accountability within the team, people cannot hide from their team responsibilities.
10. Clarify Responsibilities to the Team. Often one of the main causes of team members not completing their work is not because they are “slackers,” but because they simply do not understand their role on the team -- or the importance that their work will lend to the team. The key here is that each team member must totally understand his or her role on the team and responsibility to the team so the team can succeed.
Final Thoughts
Your work life will include individual and team projects and assignments, and as you move up the organization, the importance of working well in teams -- and leading teams to success -- will gain more and more value. If you take these 10 tips to heart, your satisfaction with teamwork and your performance on the team will improve greatly.
Thursday, March 26, 2009
FxCop stylesheet
For some reason, FxCop still points to a stylesheet hosted on gotdotnet, and that link is broken.
Today we had a requirement to generate a static code analysis report in a simple tabular form.
The following stylesheet did the job. You can apply any CSS to it for a better look, or set styles on table, tr and td directly.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body >
<table>
<xsl:for-each select="//Issue">
<tr>
<td><xsl:value-of select="@Certainty"/></td>
<td><xsl:value-of select="@File"/></td>
<td><xsl:value-of select="@Line"/></td>
<td><xsl:value-of select="../@FixCategory"/></td>
<td><xsl:value-of select="@Level"/></td>
<td><xsl:value-of select="."/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
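To apply a stylesheet like the one above from code rather than from the FxCop report itself, something along these lines should work. It is a minimal sketch using XslCompiledTransform from System.Xml.Xsl; the class name ReportTransformer is made up for illustration, and both the report and the stylesheet are passed in as strings for brevity.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class ReportTransformer
{
    // Applies the given XSLT text to the given XML text and returns the output.
    public static string Transform(string reportXml, string stylesheetXml)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();

        // Compile the stylesheet from its text.
        using (XmlReader styleReader = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            xslt.Load(styleReader);
        }

        // Run the transformation and collect the result in memory.
        using (XmlReader reportReader = XmlReader.Create(new StringReader(reportXml)))
        using (StringWriter output = new StringWriter())
        {
            xslt.Transform(reportReader, null, output);
            return output.ToString();
        }
    }
}
```

For report files on disk, File.ReadAllText can supply both arguments, and the returned HTML can be written out with File.WriteAllText.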
Just thought of sharing it.
Monday, March 23, 2009
Program Files folder in X64
While qualifying one of our projects for x64, we ran into an issue with the Program Files folder.
We use Wix to build the msi package.
Wix picked up the %PROGRAMFILES% variable from the OS and correctly placed the files under C:\Program Files (x86)\
But our product was looking in C:\Program Files.
I prefer reading all paths from app.config files and delegating to Wix the task of updating application-specific configuration files during installation. This makes life much simpler when moving between x86 and x64.
Here I am referring to running x86 binaries as-is on x64. The rules change when we compile the code for x64.
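On the application side, the idea of not hard-coding C:\Program Files could look roughly like this. It is only a sketch: the class name InstallPaths and its parameters are hypothetical, and it assumes the configured value has already been read from app.config (for instance through ConfigurationManager.AppSettings with a key of your choosing).

```csharp
using System;
using System.IO;

public static class InstallPaths
{
    // Prefers the path written to the configuration file by the installer.
    // Falls back to the Program Files folder the OS reports for this process,
    // which for an x86 process on x64 Windows is the "(x86)" folder.
    public static string Resolve(string configuredPath, string productFolder)
    {
        if (!string.IsNullOrEmpty(configuredPath))
            return configuredPath;

        string programFiles =
            Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles);
        return Path.Combine(programFiles, productFolder);
    }
}
```

Either way the application never assumes a fixed drive or folder name, so the same binaries run unchanged on x86 and x64.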
Friday, March 20, 2009
Internet Explorer 8 Released
Microsoft released Internet Explorer 8 today. It is an approximately 17 MB download and took 8 minutes to install.
It recommends restarting the system after installation. Compatibility mode, developer tools and quick search in the address bar are a few of the impressive things. Private browsing is much like the same feature in any other new-age browser. It is supported on
Find release notes at http://msdn.microsoft.com/en-us/ie/dd441788.aspx
Java 6 update 11 is recommended version for viewing Java Applets.
Sunday, March 15, 2009
Clickonce installer with example
Smart client is a nice concept for occasionally connected applications. A Windows Forms or WPF application can be made available for easy deployment from a web server, file share and so on. The same location can also serve as the update server.
Wix ClickThrough is another concept for similar use, and each has its own advantages. I feel ClickOnce is the option for today while ClickThrough is still evolving.
In this post I would like to describe some of the basic concepts of ClickOnce, which should be useful for anyone starting with it. Let us start with a bit of basic theory, then move on to the infrastructure required, and then to an example.
Concept
The core ClickOnce deployment architecture is based on two XML manifest files: the application manifest and the deployment manifest.
The application manifest describes the application itself, including the assemblies, the dependencies and files that make up the application, the required permissions, and the location where updates will be available.
The deployment manifest describes how the application is deployed, including the location of the application manifest, and the version of the application that clients should run.
The following diagram shows how versions are maintained on the web server. Each version has its own application manifest file.
At any time, the deployment manifest points to one of the versions. The deployment manifest has a URI which the client opens from a web browser, and the ClickOnce infrastructure sends the version it points to. If the client already has that version, no download takes place and the application is simply launched; otherwise the assemblies of that version are copied to the client.
On the client, each version is stored in a separate directory in the application cache, typically at “C:\Documents and Settings\username\Local Settings\Apps\2.0\<xx>”. In addition, the ClickOnce application is added to Add/Remove Programs (if the install option is selected in the deployment manifest), in which case it is also added to the Start menu. Applications also run under Code Access Security. By default a ClickOnce application is deployed with full trust permissions; this can be tuned in the application manifest.
ClickOnce applications can also be updated automatically. Assume that the client has version 1.0.0.0 installed. The application can be configured to check for updates during start-up (other options, such as after start-up or periodically, are also available). During start-up the application detects whether the deployment manifest points to a new version; if so, it prompts the user to update. If the user accepts, the new version is downloaded and made the default version. In case the user is not comfortable with the updated version, he or she can uninstall the latest update and roll back to the previous version. This rollback can be repeated as long as a lower version is available.
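As an aside on version numbers: four-part versions such as 1.0.0.0 compare part by part as numbers, not as text. The sketch below only illustrates that comparison with System.Version; it is not how ClickOnce itself decides, and it assumes an update is offered only when the deployed version is higher than the installed one. The class name UpdateCheck is made up.

```csharp
using System;

public static class UpdateCheck
{
    // True when the version the deployment manifest points to is higher
    // than the version installed on the client (assumed rule, see above).
    public static bool IsUpdateAvailable(string installedVersion, string deployedVersion)
    {
        return new Version(deployedVersion) > new Version(installedVersion);
    }
}
```

Comparing with System.Version rather than as strings matters once a part reaches two digits: 1.0.10.0 is higher than 1.0.9.0, although a plain string comparison would say otherwise.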
For large applications, downloading the entire application in one lot may not be acceptable. For such scenarios ClickOnce offers an optional group mechanism: the initial download covers the required assemblies only, and optional groups are downloaded on demand.
With this background let us move to infrastructure needed for this.
Infrastructure
ClickOnce expects the .NET Framework (2.0 or later) to be available on the target machine. If your application depends on a higher version of the .NET Framework, that version becomes a necessary prerequisite. Prerequisites can be handled using a bootstrapper, which I will not cover here. Let us assume the required .NET Framework version is available on the client machine.
As mentioned above, a ClickOnce installation is nothing but “target assemblies + application manifest” + “deployment manifest”. The assemblies are built by the developer. The manifests can then be built using mage.exe or mageui.exe, which are part of the Windows SDK. If you have Visual Studio 2005/2008 installed on your machine, you already have both of these tools.
Apart from these, the developer also needs a “Personal Information Exchange (.pfx)” file to sign the manifests. Signing manifests is mandatory. You may find this post useful to create a pfx file.
To summarize this section, we need the following set of files to create a ClickOnce installation setup:
- Application assemblies
- Personal Information Exchange File (pfx)
- mage.exe or mageui.exe
Now let us start on example.
Example 1 - Basic
Let us take a forms application. As a first step, let us create a ClickOnce installation setup for it.
Start Visual Studio and create a forms application named “MyWebForm”. Change the form title to “MyWebForm” and nothing else, then build the Release version. MyWebFormApp.exe is the only file for which we want to create a ClickOnce installation setup.
In IIS, create a virtual directory named MyWebForm. Under it, create a subdirectory named “1.0.0.0” and copy MyWebFormApp.exe to that folder.
Now start mageui.exe from “C:\Program Files\Microsoft SDKs\Windows\<v6.0A>\Bin\”.
Create application manifest
- Click on File – > New – > Application Manifest
- Enter Name as MyWebFormApp
- Enter version as 1.0.0.0
- Select ‘Files’ from list box then select application directory. In this case it is “C:\Inetpub\wwwroot\MyWebForm\1.0.0.0”
- Click on populate.
- Then save manifest to folder 1.0.0.0 as “MyWebFormApp.exe.manifest”. MageUI prompts to sign application before saving. Select pfx file and click on save.
Create deployment manifest
- Click on File – > New – > Deployment Manifest
- Enter Name as MyWebFormApp
- Enter version as 1.0.0.0
- Select ‘Description’ from list box and enter product name
- Select ‘Deployment Options’ from list box and enter start location as “http://<webserver>/MyWebForm/MyWebForm.application”
- Select ‘Update Options’ from listbox and check “Before Application Starts”
- Select ‘Application Reference’ from listbox and click on ‘Select Manifest’
- Select “MyWebFormApp.exe.manifest” from “C:\Inetpub\wwwroot\MyWebForm\1.0.0.0\” folder
- Then save manifest to folder “C:\Inetpub\wwwroot\MyWebForm\” as “MyWebForm.application”. MageUI prompts to sign application before saving. Select pfx file and click on save.
Installing first version
Open browser and browse to “http://<webserver>/MyWebForm/MyWebForm.application”
A window pops up prompting to install. Click on install. The application starts automatically and the form is shown.
You can also observe a new start menu short cut and an entry in Add/Remove programs.
So that is quite simple. Now let us add an optional package.
Example 2 – Optional group
Before we proceed, uninstall the application installed in the last step from Add/Remove Programs.
Add another project
Now let us add another forms project named “MyWebFormOptional1” to the solution.
Add a reference to this project from the main project.
Also add a button to the main form in MyWebFormApp. In the button click event, add code to launch the second form.
private void button1_Click(object sender, EventArgs e)
{
MyOptionalForm1 opt1 = new MyOptionalForm1();
opt1.ShowDialog();
}
Update code to Dynamically load assembly
The idea here is to make MyWebFormOptional1 an optional group. When the ClickOnce application is downloaded for the first time, only MyWebFormApp is downloaded. “MyWebFormOptional1.exe” is downloaded when the button is clicked.
To do this we handle the AppDomain.AssemblyResolve event and use the Assembly.LoadFile method. Copy and paste the following code into MyWebForm.cs
using System;
using System.Collections.Generic;
using System.Deployment.Application;
using System.IO;
using System.Reflection;
using System.Security.Permissions;
using System.Windows.Forms;
using MyWebFormOptional1;

namespace MyWebFormApp
{
    public partial class MyWebForm : Form
    {
        // Maps an assembly name to the ClickOnce file group that contains it
        Dictionary<String, String> DllMapping = new Dictionary<String, String>();

        [SecurityPermission(SecurityAction.Demand, ControlAppDomain = true)]
        public MyWebForm()
        {
            InitializeComponent();
            DllMapping["MyWebFormOptional1"] = "MyWebFormOptional1";
            AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(CurrentDomain_AssemblyResolve);
        }

        /// <summary>
        /// Use ClickOnce APIs to download the assembly on demand.
        /// </summary>
        private Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args)
        {
            // Get the simple assembly name from the Name argument.
            string assemblyName = args.Name.Split(',')[0];

            // Not one of our optional assemblies: return null and let the CLR carry on,
            // instead of failing with a KeyNotFoundException.
            string downloadGroupName;
            if (!DllMapping.TryGetValue(assemblyName, out downloadGroupName))
                return null;

            if (!ApplicationDeployment.IsNetworkDeployed)
            {
                // Major error - not running under ClickOnce, but missing assembly. Don't know how to recover.
                throw new InvalidOperationException("Cannot load assemblies dynamically - application is not deployed using ClickOnce.");
            }

            ApplicationDeployment deploy = ApplicationDeployment.CurrentDeployment;
            try
            {
                deploy.DownloadFileGroup(downloadGroupName);
            }
            catch (DeploymentException)
            {
                MessageBox.Show("Downloading file group failed. Group name: " + downloadGroupName + "; Assembly name: " + args.Name);
                throw; // rethrow without losing the original stack trace
            }

            // Load the assembly.
            // Assembly.Load() doesn't work here, as the previous failure to load the assembly
            // is cached by the CLR. LoadFrom() is not recommended. Use LoadFile() instead.
            string basePath = Path.Combine(Application.StartupPath, assemblyName);
            if (File.Exists(basePath + ".exe"))
                return Assembly.LoadFile(basePath + ".exe");
            if (File.Exists(basePath + ".dll"))
                return Assembly.LoadFile(basePath + ".dll");
            return null;
        }

        private void button1_Click(object sender, EventArgs e)
        {
            MyOptionalForm1 opt1 = new MyOptionalForm1();
            opt1.ShowDialog();
        }
    }
}
Build the application and copy MyWebFormApp.exe and MyWebFormOptional1.exe to the 1.0.0.0 folder.
Update Application manifest
Open the MyWebFormApp.exe.manifest file from the 1.0.0.0 folder in MageUI.exe.
Select ‘Files’ from the list box, then populate the files again. Now “MyWebFormOptional1.exe” gets added.
Click the checkbox to mark this file as optional, and enter “MyWebFormOptional1” as the group. Save the application manifest.
Update deployment manifest
Then open the deployment manifest. Select ‘Application Reference’ from the list box and click ‘Select Manifest’. Select the same manifest again and save the deployment manifest once more.
Installing second version
- Open browser and browse to “http://<webserver>/MyWebForm/MyWebForm.application”
- A window pops up prompting to install. Click on install. The application starts automatically and the form is shown.
- Now open the application cache at “C:\Documents and Settings\username\Local Settings\Apps\2.0\<xx>” and observe that only MyWebFormApp.exe is downloaded.
- Now click the button to open MyWebFormOptional1. The window opens.
- Check the application cache again. This time observe that the application has downloaded the second form automatically.
Example 3 – Update
Now let us auto-update the application to a new version. Don’t uninstall the previous application.
- Create a new folder named 1.0.1.0 under C:\Inetpub\wwwroot\MyWebForm\
- Update the AssemblyInfo.cs file of both projects in the solution as
[assembly: AssemblyVersion("1.0.1.0")]
[assembly: AssemblyFileVersion("1.0.1.0")]
- Build the application and copy MyWebFormApp.exe and MyWebFormOptional1.exe to the 1.0.1.0 folder.
- Create a new application manifest, making sure the version is set to 1.0.1.0 and the files are populated from the 1.0.1.0 folder.
- Update the deployment manifest by selecting the manifest from the 1.0.1.0 folder in the “Application Reference” pane.
Now launch the application from the Start menu. This time the user will be prompted with the following dialog.
Hope you found this post useful.
Saturday, March 14, 2009
Experts Exchange subscription
Online forums are always helpful. But sometimes you end up at paid services like Experts Exchange to get closer to a solution. Last year I was in such a situation and paid for a month of premium service, using my credit card.
Unfortunately EE stored my credit card details and has charged my card every month since then. As it is a small amount, it went unnoticed.
Today I cancelled it. I am sure I never opted for a recurring monthly payment anywhere. Be watchful with EE.
Monday, March 09, 2009
Find and replace a pattern using VIM
I have not been using shell scripts recently and so had lost touch with the powerful search and replace options for patterns.
As the Windows find and replace options don’t support much regex, I resorted to VIM.
My intention was to convert an XML file into a more structured format.
I have an XML file with elements like the following (it is just a snippet of a big file)
<PolicyNo>012345678</PolicyNo>
<DateOfCommencement>03DEC1983</DateOfCommencement>
<Plan_Term>579-60</Plan_Term>
<SumAssured>2,00,000</SumAssured>
<GrievanceRedressalOfficer>011-57293184</GrievanceRedressalOfficer>
And I wanted it in the format of
<Detail PolicyNo="012345678"></Detail>
<Detail DateOfCommencement="03DEC1983"></Detail>
<Detail Plan_Term="579-60"></Detail>
<Detail SumAssured="2,00,000"></Detail>
<Detail GrievanceRedressalOfficer="011-57293184"></Detail>
The following expression did the trick:
:s/<\(\w\+\)>\(.\+\)<\/\(\w\+\)>/<Detail \1="\2"><\/Detail>/gc
I had to tweak the second back reference a little for a few lines, depending on the characters in the value.
I could also prefix a line range to confine the operation to a limited set of lines, like
:9,26s…..
Without a range, :s works on the current line only; :%s covers the whole file.
As usual, this is simple search and replace syntax in the format :s/<find string>/<replace string>/<options, in this case gc>
Here the find string is formed as
- starting with ‘<’
- start of first back reference \(
- word of one or more characters \w\+
- close of first back reference \)
- ending with ‘>’
- second back reference \(.\+\): one or more characters, up to the ‘<’ of the closing tag. I changed this for a few lines
- ‘<’ followed by the escaped close character \/
- closing tag name \(\w\+\). This need not be a back reference; I just did it
- ending with ‘>’
Now the replace string is
- ‘<Detail ’
- first back reference \1
- followed by ‘=’
- followed by opening quote "
- second back reference \2
- followed by closing quote "
- closing tag syntax ><\/Detail>
And the options include
g – all occurrences in a line
c – seek confirmation
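For anyone who would rather do the same conversion from code, here is a rough C# equivalent of the substitution. It is a sketch only: the class and method names are made up, and it assumes one element per line with no nested elements, as in the snippet above.

```csharp
using System.Text.RegularExpressions;

public static class ElementToAttribute
{
    // Same idea as the VIM pattern: capture the tag name and the value,
    // then match a closing tag. .NET needs no backslashes before ( ) and +.
    private static readonly Regex ElementPattern = new Regex(@"<(\w+)>(.+)</\w+>");

    // Rewrites <Name>value</Name> as <Detail Name="value"></Detail>.
    // Lines that do not match are returned unchanged.
    public static string Convert(string line)
    {
        return ElementPattern.Replace(line, "<Detail $1=\"$2\"></Detail>");
    }
}
```

In the replacement string $1 and $2 play the role of \1 and \2. There is no counterpart to the c option here, so every match is replaced without confirmation.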
Following links might be useful
And my favorite regex site Regular Expressions Reference
Monday, February 16, 2009
What can you do?
Sometimes we tend to look for questions rather than answers. We tend to do that to introspect.
But what do we really intend to do with those questions? Are we supposed to find answers for them? Actually no, I mean not for all of them. Then why should we collect those questions? Because we want to change the course of life. How can we do that?
According to philosophers, answers are required to execute something, but questions are required to change the execution.
So next time something bothers you, formulate a question out of it and add it to your collection. When you revisit your collection, you may get a clue for a change that you can bring into your life.
A sample “what can you do” collection goes like this…
What can you do when you are
- being consumed?
- being concealed?
- not assigned any specific goals?
- asked to work for goals that are not yours?
- not allowed to act to your ability?
- not allowed to be creative?
- being manipulated?
- not assessed?
- deprived of opportunities?
- expected to operate without any motivation?
- asked to manage as proxy rather than as a delegate?
Sunday, February 15, 2009
Office Communicator Mobile 2007 R2
Today I installed OCM 2007 R2 on my mobile. I guess my organization hosts the OCM 2007 server, not R2. Earlier I was not able to connect to the server using the server-specified configuration, but with OCM 2007’s auto-configuration feature it connects in just 10 seconds and brings up the status of contacts quickly. I even tried sending a couple of messages. I use an Asus P320 with Windows Mobile 6.1 Professional. This is very useful for Windows Mobile users.
Visual Studio 2010 to include Wix 3.0
We are using a beta version of Wix. Though it offers everything we need at this moment, we are waiting for Wix 3.0 just to be on a released product. According to this post, Wix 3.0 will be included in VS2010.
It is a really good tool for installation needs, where module developers can interact more closely with the packaging team.
And for dedicated packaging teams, Wix is the promising way ahead.
Sunday, February 08, 2009
Web mail services – who is winning
Gmail’s user base grew 39%, from 18.8 million to 26 million, between September 2007 and September 2008.
During the same period, Windows Live Hotmail dropped 4%, from 46.2 million to 44.6 million.
Yahoo continues to lead, with an 11% growth rate and 91.9 million visitors in December 2008.
Monday, February 02, 2009
Parsing comma separated files (csv) in C#
Parsing a comma separated file sounded so simple to me until I received a couple of bugs.
Initially I attempted a simple approach, as shown below.
using (StreamReader csvFile = new StreamReader(filePath, Encoding.Default))
{
    //First line must contain columns
    string csvHeader = csvFile.ReadLine();
    //Comma separated values
    csvColumns.InsertRange(0, csvHeader.Split(','));
}
Because of these bugs I educated myself about the rules of CSV. I can quote a couple of references here.
Wiki Comma-separated values and CSV standard.
The important point to note is that each value in a CSV file can itself contain an embedded comma, newline or quote, provided the value is enclosed in double quotes; a quote inside a quoted value is written as two quotes.
This simple point changes the rules of the game. Normal stream operations that read a line are of no use here, and neither are split functions. The only option left is to parse byte by byte, watching for characters to skip and to add.
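A quick illustration of the problem, using a made-up record (the class name CsvPitfall and the sample data are not from the original project): the record below has three values, but the second one contains a comma inside quotes, so a plain Split miscounts.

```csharp
using System;

public static class CsvPitfall
{
    // A made-up record with three values; the second contains a comma.
    public const string SampleRecord = "42,\"Hyderabad, India\",ok";

    // Counts fields the naive way, by splitting on every comma.
    public static int NaiveFieldCount(string record)
    {
        return record.Split(',').Length;
    }
}
```

CsvPitfall.NaiveFieldCount(CsvPitfall.SampleRecord) reports four fields instead of three, because the comma inside the quotes is treated as a separator. A quoted value containing a newline breaks ReadLine in the same way.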
Here is the code with which I managed to parse CSV and resolve the bugs. It is primitive but works.
namespace CsvUtilities
{
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Diagnostics;
using System.Runtime.Serialization;
using System.Globalization;
public sealed class CsvParser
{
//constants
private const int quote = '"';
private const int comma = ',';
private const int carrierreturn = '\r';
private const int linefeed = '\n';
//File Name
private string fileName;
/// <summary>
/// File name propery
/// </summary>
public string FileName
{
get { return fileName; }
set { fileName = value; }
}
//shared reader
FileStream reader;
//Flag to signal end of line
private bool endofline;
#if DEBUG
public TimeSpan timeSpan;
#endif
/// <summary>
/// Default parameter less constructor
/// </summary>
public CsvParser()
{
}
/// <summary>
/// Constructor to set file
/// </summary>
/// <param name="fileName">File name to parse</param>
public CsvParser(string fileName)
{
//Input validation
if (String.IsNullOrEmpty(fileName))
throw new ArgumentException("File name can not be null");
if (File.Exists(fileName) == false)
throw new FileNotFoundException(fileName);
this.fileName = fileName;
}
/// <summary>
/// Parses Csv and return a data table
/// </summary>
/// <param name="headerIncluded">If header is not included colums will be named Column(x)</param>
/// <returns>Data table with rows populated from Csv file</returns>
public DataTable Parse(bool headerIncluded)
{
DataTable dataTable = new DataTable();
dataTable.Locale = CultureInfo.InvariantCulture;
DataRow dataRow;
//Input validation
if (String.IsNullOrEmpty(fileName))
throw new ArgumentException("File name can not be null");
#if DEBUG
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
#endif
using (reader = new FileStream(fileName, FileMode.Open))
{
string value;
int idx = 0;
//Header handling
while (reader.Position != reader.Length)
{
value = GetValue();
if (headerIncluded == false)
{
value = "Column" + idx++;
if (endofline)
reader.Position = 0;
}
try
{
dataTable.Columns.Add(value);
}
catch (DuplicateNameException dnex)
{
throw;
}
if (endofline) break;
}
//Initialization
dataRow = dataTable.NewRow();
idx = 0;
endofline = false;
//Row handling
while (reader.Position != reader.Length)
{
value = GetValue();
dataRow[idx++] = value;
if (endofline)
{
dataTable.Rows.Add(dataRow);
dataRow = dataTable.NewRow();
idx = 0;
endofline = false;
}
}
}
#if DEBUG
stopwatch.Stop();
timeSpan = stopwatch.Elapsed;
#endif
return dataTable;
}
private string GetValue()
{
char currentByte; // Current Byte
char nextByte; // Next Byte
Boolean withinQuote = false; // Is current position within a quote
List<char> bytes = new List<char>();
long position = 0;
//If stream is null throw exception
if (reader == null)
throw new ArgumentException("CSVDataset: stream can not be null");
if (reader.CanRead == false)
throw new ArgumentException("CSVDataset: Can not read stream");
while (reader.CanRead)
{
currentByte = (char)reader.ReadByte();
position = reader.Position;
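//If at last position keep the final data character, close the line and terminate
if (position == reader.Length)
{
if ((currentByte != quote) && (currentByte != comma) && (currentByte != carrierreturn) && (currentByte != linefeed))
bytes.Add(currentByte);
endofline = true;
break;
}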
//peek next character
nextByte = (char)reader.ReadByte();
//As ReadByte moved cursor ahead bring cursor back
reader.Seek(position, SeekOrigin.Begin);
//Current character is within the quote
if (withinQuote)
{
//Is this character a terminating quote
if ((currentByte == quote) && (nextByte == comma))
{
reader.Seek(position + 1, SeekOrigin.Begin); //jump comma
break;
}
if ((currentByte == quote) && (nextByte == quote))
{
continue;
}
else
{
bytes.Add(currentByte);
continue;
}
}
if (currentByte == quote)
{
withinQuote = true;
continue;
}
if (currentByte == comma) break;
if ((currentByte == carrierreturn) || (currentByte == linefeed))
{
//Only a CR followed by LF is a two byte line break
if ((currentByte == carrierreturn) && (nextByte == linefeed))
{
reader.Seek(position + 1, SeekOrigin.Begin); //jump CR+LF
}
endofline = true;
break;
}
bytes.Add(currentByte);
}
//Reading value completed. Return
return new string(bytes.ToArray());
}
}
}
Hope this is useful to start with. More advanced parsers can be found on CodeProject: http://www.codeproject.com/KB/database/CsvReader.aspx
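For comparison, here is a minimal Python sketch of the same quote-aware state machine. It is not a port of the C# class above: the name parse_csv and its return shape are my own choices for illustration, and it works on a string rather than a FileStream. For real work Python's standard csv module is the better choice.

```python
def parse_csv(text, header_included=True):
    """Split CSV text into (columns, rows), honouring double-quoted fields."""
    records, record, field = [], [], []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"' and text[i + 1:i + 2] == '"':
                field.append('"')          # doubled quote is an escaped quote
                i += 1
            elif ch == '"':
                in_quotes = False          # closing quote
            else:
                field.append(ch)           # commas and line breaks are data here
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            record.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1                     # treat CR+LF as one line break
            record.append("".join(field))
            records.append(record)
            record, field = [], []
        else:
            field.append(ch)
        i += 1
    if field or record:                    # last line without a trailing newline
        record.append("".join(field))
        records.append(record)
    if not records:
        return [], []
    if header_included:
        return records[0], records[1:]
    # No header row: name the columns Column0, Column1, ... and keep every line as data
    return ["Column%d" % n for n in range(len(records[0]))], records


columns, rows = parse_csv('name,quote\r\n"Smith, J","He said ""hi"""\r\nlast,row')
print(columns)
print(rows)
```

The quoted comma and the doubled quote both survive as data, and the last line is kept even though it has no trailing line break.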
Tuesday, January 27, 2009
Mcshield delaying system boot up time
Of late I have noticed that my laptop takes a long time to boot up. I am hunting through the event log to find the probable reason.
One thing that I observed today is about VMware disk images. Because these are huge files, Mcshield takes a long time to scan them for viruses.
I actually keep these files on the D drive, but they are still being scanned at boot, which delays boot-up.
“A thread in process C:\Program Files\McAfee\VirusScan Enterprise\Mcshield.exe took longer than 90000 ms to complete a request.
Build VSCORE.13.3.2.128 / 5300.2777
Object being scanned = \Device\HarddiskVolume2\xxxxx-x86.tar.bz2”
Any ideas on how to overcome this issue?
Saturday, January 24, 2009
My Documents Folder in WinXP
Wednesday, January 21, 2009
Friday, January 16, 2009
Google maps mobile update
Updated to the new Google Maps for Windows Mobile.
Though Street View is not so fascinating for Indian users, there are a couple of things that are interesting.
1) The application can now be installed on the storage card, so even application data can be stored there. No more phone memory limitation.
2) My Location can be triggered from the map itself. No need to go to the Menu.
I am happier about the first point.
Monday, January 12, 2009
Why Should You Care about REST?
The January 2009 edition of MSDN Magazine published the first article in the Service Station column about building WCF services using REST.
The article is titled An Introduction To RESTful Services With WCF, by Jon Flanders.
In this article Jon answers the question “Why Should You Care about REST?” as follows:
“In my mind, there are two main reasons. First, REST offers some significant features and benefits over RPC technologies in many cases. Second, Microsoft is moving many of its own implementations away from RPC technologies (such as SOAP) and toward REST. This means that even if you aren't convinced or motivated to use REST to build your own systems, as more frameworks and technologies from Microsoft (and others) move to REST, you'll need to know how to interact with them.”
From that it seems quite clear that Microsoft/WCF is moving towards REST in 2009.
In this post I would like to collect some good links for getting started with REST.
- HTTP - Hypertext Transfer Protocol – To get overview of HTTP
- Dissertation of Roy Thomas Fielding – To understand REST architecture
- A Guide to Designing and Building RESTful Web Services with WCF 3.5
- Fiddler – Tool to capture http traffic
Sunday, January 11, 2009
Updated to Windows Live Writer 2009
Updated to Live Writer 14, which is part of the new Live update pack.
Notable changes
1) Preview
2) Segoe font
Friday, January 02, 2009
Friday, December 26, 2008
Get hosting environment
What is the easiest way to find out which web server a site runs on?
Why not with our age old friend 'telnet'?
==============================
[harsha@JD11AF08 ~]$ telnet sriharsha.net 80
Trying 75.126.232.124...
Connected to sriharsha.net.
Escape character is '^]'.
GET / HTTP/1.1
Host: sriharsha.net
HTTP/1.1 200 OK
Date: Fri, 26 Dec 2008 16:36:23 GMT
Content-Length: 573
Content-Type: text/html
Content-Location: http://sriharsha.net/Default.htm
Last-Modified: Sat, 03 May 2008 10:52:12 GMT
Accept-Ranges: bytes
ETag: "72b52dbebadc81:20844d"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
....................
==============================
The same headers can also be obtained with HEAD, which skips the response body
HEAD / HTTP/1.1
Host: sriharsha.net
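If you capture such a response in a script, picking out the Server header is a few lines of string handling. The following is a minimal Python sketch that assumes the raw response text is already in hand (for example pasted from the telnet session above). The function name parse_response_headers is made up for this example, and the sample is a shortened copy of the output shown earlier.

```python
def parse_response_headers(raw_response):
    """Split a raw HTTP response into its status line and a dict of headers."""
    lines = raw_response.replace("\r\n", "\n").split("\n")
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break                      # a blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


SAMPLE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 26 Dec 2008 16:36:23 GMT\r\n"
    "Content-Type: text/html\r\n"
    "Server: Microsoft-IIS/6.0\r\n"
    "X-Powered-By: ASP.NET\r\n"
    "\r\n"
    "<html>...</html>"
)

status_line, headers = parse_response_headers(SAMPLE)
print(status_line)
print(headers["server"], "/", headers["x-powered-by"])
```

Header names are lower-cased on the way in because HTTP header names are case-insensitive.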
Tuesday, December 23, 2008
CLR version on client
ClickOnce deployment requires the CLR to be present on the target machine to verify the manifest. The following snippets help in detecting it.
Snippets work with IE5+ and Firefox 3+ only. They fail with Chrome.
Server Side:
protected void Page_Load(object sender, EventArgs e)
{
//3.5.30729 is the CLR token reported for .NET Framework 3.5 SP1
bool clrOK = Request.Browser.ClrVersion >= new Version(3, 5, 30729);
if (clrOK)
Response.Write("CLR Version : " + Request.Browser.ClrVersion);
else
Response.Write("This machine does not have Microsoft .NET Framework version 3.5 SP1. Install and try again.");
}
Client Side:
<script type="text/javascript">
function DetectCLR(targetVersion) {
//Gets CLR version on IE5+ and Firefox 3+
var uagent = navigator.userAgent;
//Form regular expression
var clrRegExp = new RegExp("\\.NET CLR \\d\\.\\d\\.\\d{5}", "g");
//Get Matches
var clrversions = uagent.match(clrRegExp);
//Is CLR present
if (clrversions == null) {
document.write("Please install Microsoft .Net Framework 3.5 SP1 and try again");
return false;
}
//Possible .NET CLR versions for v2 and above
//.NET CLR 2.0.50727
//.NET CLR 3.0.04506
//.NET CLR 3.5.21022
//.NET CLR 3.5.30729
var arraylength = clrversions.length;
var clrversiondetected = false;
for (var i = 0; i < arraylength; i++) {
//document.write(clrversions[i] + "<br/>");
//Remember a hit; do not let a later mismatch overwrite it
if (clrversions[i] == targetVersion) clrversiondetected = true;
}
if (!clrversiondetected)
document.write("Please install Microsoft .Net Framework 3.5 SP1 and try again");
else
document.write("CLR Version : <b>" + targetVersion + "</b>");
return clrversiondetected;
}
</script>
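The same user-agent check can be tried outside the browser. Here is a minimal Python sketch of the idea, comparing versions numerically instead of as strings. The helper names clr_versions and has_clr_at_least are hypothetical, and SAMPLE_UA is a made-up IE style user agent string used only for illustration.

```python
import re

# One ".NET CLR x.y.z" token per installed runtime, as IE advertises them
CLR_TOKEN = re.compile(r"\.NET CLR (\d+)\.(\d+)\.(\d+)")


def clr_versions(user_agent):
    """Return every CLR version found in the user agent as a tuple of ints."""
    return [tuple(int(part) for part in match)
            for match in CLR_TOKEN.findall(user_agent)]


def has_clr_at_least(user_agent, minimum):
    """True if any advertised CLR version is greater than or equal to minimum."""
    return any(version >= minimum for version in clr_versions(user_agent))


# Made-up sample, not captured from a real browser
SAMPLE_UA = ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; "
             ".NET CLR 2.0.50727; .NET CLR 3.0.04506; .NET CLR 3.5.30729)")

print(clr_versions(SAMPLE_UA))
print(has_clr_at_least(SAMPLE_UA, (3, 5, 30729)))
```

A user agent with no CLR tokens at all (as with Chrome in the note above) simply yields an empty list, so the check returns False instead of failing.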
Monday, December 22, 2008
Saturday, December 20, 2008
BOSS Linux - Bharat Operating System Solutions
I just came across BOSS.
BOSS (Bharat Operating System Solutions) is a GNU/Linux distribution developed by C-DAC (Centre for Development of Advanced Computing), derived from Debian, for enhancing the use of Free/Open Source Software throughout India. BOSS GNU/Linux, a key deliverable of NRCFOSS, has crossed another milestone by releasing version 3.0. BOSS GNU/Linux Version 3.0 is coupled with the GNOME and KDE desktop environments, with wide Indian language support and packages relevant for use in the Government domain. Currently the BOSS GNU/Linux Desktop is available in almost all the Indian languages, such as Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Sanskrit, Tamil, Telugu, Bodo, Urdu, Kashmiri, Maithili, Konkani and Manipuri, which will enable the mainly non-English literate users in the country to be exposed to ICT and to use the computer more effectively.
Friday, April 11, 2008
How to access explicit interface methods from another method
Explicit interface methods are preferred to instance members for the following reasons
a) They are accessible only through interface
b) They support multiple levels of inheritance chain
Some insight. Take this example.
a) Take simple console application
b) Add an interface
c) Implement interface
d) Create an object of the class and try to access the interface method (this does not compile)
e) Cast it to the interface and access the method
f) Call another explicit method from inside an explicit method (calling it directly does not compile)
g) Call the other explicit method by casting this to the interface
using System;
using System.Collections.Generic;
using System.Text;
namespace ClassMonitorLibrary
{
public class Program
{
//Entry point
public static void Main(String [] args)
{
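#region Not accessible directly
//Does not compile: explicit interface methods are not visible through the class type
//InterfaceMethods im = new InterfaceMethods();
//im.GetFile("Data.txt", 10);
#endregion
//Cast to the interface to reach the explicit implementation
IFile im = (IFile)(new InterfaceMethods());
im.GetFile("Data.txt", 10);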
}
}
//Simple interface
interface IFile
{
bool GetFile(string fileName, int length);
bool IsFileExists(string fileName);
}
//Interface implementation
class InterfaceMethods : IFile
{
bool IFile.GetFile(string fileName, int length)
{
#region how to access method
//Does not compile: IsFileExists is explicit, so it is not visible on this
//if (IsFileExists(fileName))
#endregion
if (((IFile)this).IsFileExists(fileName))
Console.WriteLine("FileExists");
return true;
}
bool IFile.IsFileExists(string fileName)
{
return true;
}
}
}
Tuesday, March 18, 2008
Microsoft Architecture Days : India
Register quickly, as limited seats are available for this focused event on Software + Services.
Saturday, March 15, 2008
Vim settings
The backup and swap files that Vim creates are sometimes frustrating, especially when you are working on a source tree in a version controlled directory.
Turning them off is desirable in such environments. Today I updated my Vim profile to do this, and I also found the Consolas font more appealing for editing in Vim on Windows. I thought of sharing the profile in case anyone else is interested.
C:\Program Files\Vim\_vimrc file
set nobackup "Set off backup files"
set noswapfile "Set off swap files"
"Fonts"
set guifont=Consolas:h12:cANSI
"Tabs"
set stal=2
"GUI Options"
set guioptions-=T "No Toolbar"
set guioptions-=r "No right scroll "
set guioptions-=m "No menu "
"Color scheme"
colorscheme wombat "tried zellner, now wombat suits me best"
Friday, March 07, 2008
MsMUG Survey
AutomationFederation.org is conducting an online survey on Issues in the Manufacturing Environment.
Survey results will be published in Microsoft's forum.
During April 2-4, 2008, Microsoft will be hosting a forum specifically addressing issues in the Manufacturing environment. The Microsoft Manufacturers User's Group (MsMUG) has been working side-by-side with Microsoft in preparation for this event.
Saturday, January 26, 2008
Monday, January 21, 2008
Visual Studio Task List feature - I regret for not knowing this earlier
Do you use Visual Studio's Task List? I wasn't using it so far, and I regret not knowing about it earlier.
How it works
It works like this: when you want to do something in the code at a later point in time, just add a comment in this syntax
//TODO: Add exception handling here
or you are doing a code review and feel a message should have been logged here; just add a comment like this
//LOG: recursive function, log depth of stack
or you feel a method has grown too big and must be refactored; just add a comment wherever you feel like
//REFACTOR: break this part as a function
How to access
Visual Studio has a window that searches for all these comments and presents them in a list.
To show the task list select View -> Task List
By default it shows user tasks; select Comments from the combo box.
How to customize
You can set more tokens from Tools -> Options. TODO, HACK, UNDONE and UnresolvedMergeConflict are the default tokens in Visual Studio. I added LOG and REFACTOR.
A comment such as TODO: Clean up and Log Something then shows up in the list, and each entry is a clickable target.
You can also sort the list to attend to all LOG messages or TODOs at once.
You can also add user defined tasks.
For that select the User Tasks option in the Task List and add your tasks.
I found the option very handy and felt like sharing it.
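To get a feel for what the Task List is doing with these tokens, here is a minimal Python sketch that scans source text for the same comment tokens. It is only an illustration of the idea, not how Visual Studio implements it; the names TOKENS and find_tasks are made up for this example.

```python
import re

# Default-style tokens plus the two custom ones added above
TOKENS = ("TODO", "HACK", "UNDONE", "LOG", "REFACTOR")
TASK_PATTERN = re.compile(r"//\s*(%s)\s*:?\s*(.*)" % "|".join(TOKENS))


def find_tasks(source):
    """Return (line number, token, message) for every task comment in source."""
    tasks = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = TASK_PATTERN.search(line)
        if match:
            tasks.append((line_number, match.group(1), match.group(2).strip()))
    return tasks


SOURCE = """int depth = 0;
//TODO: Add exception handling here
Walk(node); //LOG: recursive function, log depth of stack
//REFACTOR: break this part as a function
"""

for task in find_tasks(SOURCE):
    print(task)
```

Sorting the returned list by token gives the same "all TODOs together" view mentioned above.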
Unit Testing for Native Code
As I am spending most of my time in the CLR sandbox these days, I hadn't realised that NUnit can't work for native code. That was a bit disappointing; having spent so many years developing C++ code, I couldn't accept letting the flag go down. YES, there is a unit testing framework available for native C++ too. It is called WinUnit, and the MSDN Magazine February 2008 article Simplified Unit Testing for Native C++ Applications describes the details.
Salient features are
a) Tests are built as a regular C++ dll
b) You can run it from Command Line by passing dll name as argument to WinUnit.exe
c) You can integrate with Visual Studio Build process, so whenever you build the project Unit Tests are performed
Compared to NUnit:
a) It lacks UI
b) Lacks report generation features
Friday, January 04, 2008
xcerion XML internet OS/3
Today I got my beta account for xcerion. First impression is "it is awesome", but it doesn't work in Firefox.
It works only with IE 6 or 7 on Windows. It's an XML based web operating system. Applications are developed using a Model View Controller framework.
Though it is not from big giants like Microsoft and Google, it paves the way towards a web OS. It may not be adopted in low bandwidth environments, though there is an offline feature.
Tuesday, January 01, 2008
Wednesday, November 28, 2007
Midlets using MotoDev Studio
MotoDev Studio is a new IDE platform to develop J2ME applications for Motorola phones.
I just did a basic test using this.
Prerequisites:
- Install latest JDK SE
- Install MotoDevStudio
Build Steps:
- Launch MOTODEV Studio for Java(TM) ME v1.0
- Create a new project
- Select Phone type
- Create a package under src folder. (not necessary, but a good practice. else default package will be selected)
- Create class
- Add code (simple hello world using LCDUI)
package MyPackage;
import javax.microedition.lcdui.*;
import javax.microedition.midlet.*;
public class HelloWorld extends MIDlet
{
private Form mainScreen;
private Display myDisplay;
public HelloWorld()
{
myDisplay = Display.getDisplay(this);
mainScreen = new Form("Hello World");
StringItem strItem = new StringItem("Hello","This is a J2ME MIDlet.");
mainScreen.append(strItem);
}
/**
* Start the MIDlet
*/
public void startApp() throws MIDletStateChangeException
{
myDisplay.setCurrent(mainScreen);
}
/**
* Pause the MIDlet
*/
public void pauseApp()
{}
/**
* Called by the framework before the application is unloaded
*/
public void destroyApp(boolean unconditional)
{}
}
- Edit Jad file MyMidlet.Jad; Go to the Midlets tab (below); Add Midlets e.g. MidletName,icon,Package.Class
- Build project, default setting is autobuild
- Create Package; Right click on MyMidlet Project (root element), from the pull-down menu click J2ME and then click Create Package
- If everything goes well you should have MyMidlet.jar and MyMidlet.Jad files in ~:\Documents and Settings\user\workspace\MyMidlet\deployed
- Use MidletManager or your favourite tool to install this midlet
- Restart phone and test midlet
Note: This is without signing
Monday, November 26, 2007
VIM ftp editing
Just Open ftp://domain.tld/path/to/file
It prompts for username and password!!!
Friday, October 26, 2007
Dave Winer on SOAP, REST and XML-RPC.
Note: Reproduced from site
SOAP Request
POST /stock HTTP/1.1
Host: www.kbcafe.com
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
xmlns:m="http://www.kbcafe.com/stock">
<soap:Header>
<m:DeveloperKey>1234</m:DeveloperKey>
</soap:Header>
<soap:Body>
<m:GetStockPrice>
<m:StockName>HUMC</m:StockName>
</m:GetStockPrice>
</soap:Body>
</soap:Envelope>
The SOAP request tends to be overly verbose, but generally (not always) easy to understand. The SOAP envelope wraps an optional Header and the Body. The Body contains the actual XML request object. The Header contains information not required to service the request, but that help in some other way. A common use of the SOAP Header is the attachment of user credentials to the request. The beauty of SOAP is that it's not bound to the HTTP transport, although HTTP is by far the most commonly used transport for SOAP messages.
SOAP Response
HTTP/1.1 200 OK
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
xmlns:m="http://www.kbcafe.com/stock">
<soap:Body>
<m:GetStockPriceResponse>
<m:Price>27.66</m:Price>
</m:GetStockPriceResponse>
</soap:Body>
</soap:Envelope>
The SOAP response, also slightly overly verbose, but easy to understand. Now, imagine getting this response and you want the m:Price element text. How would you formulate the XPath to acquire this bit of information? Very simple ("//m:Price/text()").
REST Request
GET /stock?StockName=HUMC HTTP/1.1
Host: www.kbcafe.com
The beauty of REST is that GET requests do not require a request package. In this case, the parameter data is simply passed as an HTTP request parameter. Of course, this simplicity comes with a price. You are now bound to the HTTP protocol. Although the argument has been made that the principles of REST are not bound to HTTP, nobody has ever documented that approach.
It's important I explain here that REST is primarily used to service GET requests. By GET, I mean REST requests using the HTTP GET verb. REST uses three other HTTP verbs POST, PUT and DELETE. GET is used to retrieve data, POST to create, PUT to edit and DELETE to ... umh... delete. The HTTP verb is another element that binds the REST protocol to HTTP.
REST Response
HTTP/1.1 200 OK
<?xml version="1.0"?>
<m:Price xmlns:m="http://www.kbcafe.com/stock">27.66</m:Price>
The REST response is often very similar to the contents of the SOAP response body. And it's very easy to express an XPath and retrieve the data ("//m:Price/text()"). Note the REST response is actually simpler than the SOAP Body. This is by convention. SOAP transactions typically contain an element named with the method name and the strings Request and Response appended. This is actually unnecessary, but does have one advantage. REST requests typically embed the method name within the request URL, which again binds the protocol to HTTP, whereas embedding the method name within the package allows SOAP to exist over any protocol.
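To make the "same XPath for both" point concrete, here is a minimal Python sketch using the standard xml.etree.ElementTree module. ElementTree's XPath subset has no text() function, so the sketch looks the element up by its namespace-qualified name and reads .text instead. The function name get_price is made up for this example, and the two documents are the response bodies shown above.

```python
import xml.etree.ElementTree as ET

STOCK_NS = "http://www.kbcafe.com/stock"

SOAP_RESPONSE = """<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
 xmlns:m="http://www.kbcafe.com/stock">
<soap:Body>
<m:GetStockPriceResponse>
<m:Price>27.66</m:Price>
</m:GetStockPriceResponse>
</soap:Body>
</soap:Envelope>"""

REST_RESPONSE = '<m:Price xmlns:m="http://www.kbcafe.com/stock">27.66</m:Price>'


def get_price(xml_text):
    """Find the first m:Price element anywhere in the document, root included."""
    root = ET.fromstring(xml_text)
    price = next(root.iter("{%s}Price" % STOCK_NS), None)
    return None if price is None else float(price.text)


print(get_price(SOAP_RESPONSE))
print(get_price(REST_RESPONSE))
```

Because the lookup is by element name and namespace, the same function works whether the price sits three levels deep in a SOAP envelope or is the root of the REST response.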
XML-RPC Request
POST /stock HTTP/1.1
Host: www.kbcafe.com
<?xml version="1.0"?>
<methodCall>
<methodName>stock.GetStockPrice</methodName>
<params>
<param>
<value><string>HUMC</string></value>
</param>
</params>
</methodCall>
The XML-RPC request is the most verbose of all the protocols. This verbosity is what makes XML-RPC both difficult to use and lacking in inter-op. For example, the <string> element is optional, so the <value> element could have been expressed as <value>HUMC</value>. Several MetaWeblogAPI (the most common implementation of XML-RPC) providers assume the <string> element is present, while others assume it's not present. Whichever way you code it, at least one MetaWeblogAPI provider will fail. This leads to a lot of frustration for developers trying to write XML-RPC applications. You often have to code the request differently to compensate for different server implementations.
XML-RPC Response
HTTP/1.1 200 OK
<?xml version="1.0"?>
<methodResponse>
<params>
<param>
<value><double>27.66</double></value>
</param>
</params>
</methodResponse>
Again, the XML-RPC response is the most verbose and again it contains a gross limitation. What's the XPath for retrieving the data? If you said ("//double/text()"), then we might have a problem down the road. Imagine the provider augments his response with new elements that include the stock's 365-day high and 365-day low. The XPath would fail. Neither SOAP nor REST has this problem, because the elements are named semantically.
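The optional string element mentioned above is easy to try out with Python's standard xmlrpc.client module, whose loads function parses both spellings of the request to the same value. This is a small sketch for illustration; the constant names are my own, and the XML is the request and response from this post with whitespace trimmed.

```python
import xmlrpc.client

REQUEST_WITH_STRING = (
    "<methodCall><methodName>stock.GetStockPrice</methodName>"
    "<params><param><value><string>HUMC</string></value></param></params>"
    "</methodCall>"
)

# Same call with the optional <string> wrapper left out
REQUEST_BARE_VALUE = (
    "<methodCall><methodName>stock.GetStockPrice</methodName>"
    "<params><param><value>HUMC</value></param></params>"
    "</methodCall>"
)

RESPONSE = (
    "<methodResponse><params><param>"
    "<value><double>27.66</double></value>"
    "</param></params></methodResponse>"
)

# loads returns (tuple of parameters, method name or None)
print(xmlrpc.client.loads(REQUEST_WITH_STRING))
print(xmlrpc.client.loads(REQUEST_BARE_VALUE))
print(xmlrpc.client.loads(RESPONSE))
```

Note that the response value comes back only by position in the parameter tuple, which is the weakness the paragraph above describes: nothing in the XML says that 27.66 is a price.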
Thursday, September 27, 2007
Happy Birthday to Google
Today is 9th birthday for Google.
WISH YOU MANY MANY HAPPY RETURNS OF THE DAY GOOGLE.
Saturday, August 11, 2007
Regular Expressions - Part3
Meta Characters
Meta characters are a subset of the ASCII character set that have a special meaning when building a regular expression, e.g. +, $, ^. These characters instruct the regex engine to perform specific operations. To make the engine treat them as normal characters instead, escape them with a backslash '\'. For example, the regex "firstname\.lastname" tells the engine to ignore the special meaning of '.' and to treat it as a literal '.' character.
The following list gives an overview of the most commonly used meta characters.
. Dot - any character in a line
In a normal file search we use ? for a single wildcard character and * for any sequence of characters, e.g. IAdb*.dll. In regex, * is used for repetition instead, and '.' is the single-character wildcard. Also note the phrase "in a line": by default '.' stops at a newline (\n or \r\n). In .NET this can be changed with the Singleline option, which lets '.' match a newline as well; the Multiline option only changes where ^ and $ match. e.g:
Search String:
using System;
using System.IO;
using System.Text;
RegEx: “System.*”
Explanation: Without Singleline mode this matches System and everything after it up to the end of each line.
Matches in non-Single Line mode:
a)System;
b)System.IO;
c)System.Text;
Matches in Single Line mode (one match that runs across the line breaks):
a)System;
using System.IO;
using System.Text;
Note that the semicolon is also matched in each line
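The effect of the dot stopping or not stopping at a newline can be checked quickly in Python, where the re.DOTALL flag plays the role that the Singleline option plays in .NET. This is a small sketch using the search string from above.

```python
import re

text = "using System;\nusing System.IO;\nusing System.Text;"

# Default: '.' stops at a newline, so there is one match per line
per_line = re.findall(r"System.*", text)

# DOTALL: '.' also matches newlines, so the first match runs to the end
across_lines = re.findall(r"System.*", text, flags=re.DOTALL)

print(per_line)
print(across_lines)
```

With the flag set there is a single match, and it includes the "using" keywords and line breaks that sit between the namespaces.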
\ - back slash
As already mentioned, a backslash tells the regex engine to treat the meta character that follows as a normal character. When used with a number, like \1 or \2, it specifies a back reference. Back references will be covered separately.
[ ] - opening and closing square bracket
Any group of characters to be matched are specified within these brackets. Examples are mentioned below.
( ) - round brackets
These are used to hold sub-expressions or back references. Back references will be covered later. Sub-expressions are similar to programming language sub expressions.
{ } curly brackets
These are used with iterators. We have seen this in Part 2 for five digit length. Its format is {x,y}, where x is mandatory and can take values from 0 up to the value of y. y is optional and can be any integer.
* Iterator to iterate for zero or more times. (0 or more times)
? Iterator to iterate for zero or one time only. (0 or 1 times)
+ Iterator to iterate one or more times. (1 or more times)
\w alphanumeric character including underscore
\W non-alphanumeric
\d numeric character
\D non-numeric
\s any white space; include
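The meta characters listed above can be tried one by one. Here is a short Python sketch; the sample strings are made up for illustration, and Python's re module uses the same syntax for these particular constructs.

```python
import re

sample = "a ab abbb"

star = re.findall(r"ab*", sample)      # b zero or more times
optional = re.findall(r"ab?", sample)  # b zero or one time
plus = re.findall(r"ab+", sample)      # b one or more times
braces = re.findall(r"ab{2,3}", "ab abb abbbb")  # b two to three times

words = re.findall(r"\w+", "file_1.txt")   # alphanumerics and underscore
digits = re.findall(r"\d+", "INR30.00")    # runs of numeric characters
pieces = re.split(r"\s+", "a  b\tc")       # split on any white space

# An escaped dot matches only a literal dot; a bare dot matches any character
literal_dot = re.findall(r"firstname\.lastname",
                         "firstname.lastname firstnameXlastname")
any_char = re.findall(r"firstname.lastname",
                      "firstname.lastname firstnameXlastname")

print(star, optional, plus, braces)
print(words, digits, pieces)
print(literal_dot, any_char)
```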
Thursday, August 09, 2007
Regular Expressions - Part2
In this part I will discuss how regular expression engines work. Before going there I just want to make a point about non-printable characters. Have a look at an ASCII chart once. Look for control characters, printable characters, non-printable characters etc.
Brief on Regex Engines
A regex engine is a library that can process regular expressions for search and replace operations. These engines operate on each character and are thus composed of complex algorithms.
Editors supporting regex provide a front end for this library. You can also use this library from programming languages. There is native support for regular expressions in some languages like Perl, Ruby etc. Microsoft has had regex support since the VBScript days: VBScript supports regular expressions with the RegExp object. For regex support in .Net, look at the class library reference of the System.Text.RegularExpressions namespace. We used this in the last example.
As these engines operate on each character and try to match it against patterns, performance plays a major role. There are two kinds of engines to choose from. One is called a Text Directed engine while the other is called Regex Directed. I will describe them in brief.
Of course no programming language offers a choice to select a particular regex engine. It is a matter of what the language supports.
In case of editors there is a simple way to test what kind of engine it is. Just apply the regular expression "regex|regex not" on the string regex not. If the search result is regex then it is a regex directed engine; if the search result is regex not then it is a text directed engine. There are a couple of points to note here.
- The expression you applied tries to match the string "regex" or "regex not". '|' is a meta character in the regular expression character set. You can think of it as an instruction for an 'or' operation.
- When you apply this regex on the string regex not, the engine has the option to choose just the first word or both words. Regex directed engines are eager: they return the first alternative that matches so as to be quick. Text directed engines look for the longest possible match. Thus the former returns regex and the latter returns regex not.
In simple terms, one can categorize them based on search time. With text directed engines, search time depends on the length of the text being searched. With regex directed engines, search time also depends on the regex.
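The engine test described above takes two lines in Python, whose re module is a backtracking (regex directed) engine. A small sketch:

```python
import re

# The first alternative that matches wins, even though a longer match exists
match = re.search(r"regex|regex not", "regex not")
print(match.group(0))

# Putting the longer alternative first changes the result
longer_first = re.search(r"regex not|regex", "regex not")
print(longer_first.group(0))
```

The second search shows a practical consequence of eagerness: with a regex directed engine, the order of the alternatives matters.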
Back Tracking
Another important concept is back tracking. As the name suggests, back tracking is about going back over the path we traversed so that we can try another route. Did I confuse you? That is not intentional; let us see this with an example.
Suppose you are looking at a financial report and you are interested in values between 10K and 99K.
Now let us take the figure "INR75063.37". You know that it starts with INR and it is in the format INR(x).(y) where x >= 0 and y = (00 to 99).
So you are looking for five digits before the paise separator '.'. This can be expressed as
"\bINR[0-9]{5}\.[0-9]{2}\b"
Let us apply this regular expression on following text
The amounts are Auto INR30.00 Onward Flight INR450454.34
Taxi INR435.65 and Return Flight INR34543.43 etc.
Observe the result: INR34543.43, the only figure with five digits before the separator and so the only one in the desired range.
Now let us walk through what happened here.
First the regular expression: \bINR[0-9]{5}\.[0-9]{2}\b. The outer \b at the beginning and at the end specify a word boundary.
Obviously all amounts are within word boundaries. (To check what a word boundary is, click at the start of the search string, then press Ctrl+[right arrow ->]. Of course this also skips the space; forget the space for this discussion.) Here is the part wise description of this regex
- INR will be matched literally.
- Then [0-9]{5} says to look for 5 digits (0-9).
- Then we have '\.'. Dot is a meta character in the regular expression character set, so we need to escape it. Without escaping it would still match the amounts here, but it would also accept any other character in that position, so escape it as good practice.
- Then we have [0-9]{2}, which is obviously for the two digits of paise, followed by the closing boundary '\b'
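The expression and the sample text above can be run as they are. Here is a short Python sketch; the second part shows what an unescaped dot would let through, using a made-up string INR12345x67 for illustration.

```python
import re

text = ("The amounts are Auto INR30.00 Onward Flight INR450454.34\n"
        "Taxi INR435.65 and Return Flight INR34543.43 etc.")

strict = re.compile(r"\bINR[0-9]{5}\.[0-9]{2}\b")
print(strict.findall(text))

# Unescaped dot: any character is accepted where the separator should be
loose = re.compile(r"\bINR[0-9]{5}.[0-9]{2}\b")
print(strict.findall("INR12345x67"))
print(loose.findall("INR12345x67"))
```

INR450454.34 is rejected because after five digits the engine finds a sixth digit where it needs the separator.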
Ok we understood the regular expression, now let us understand how regex engine might have done this search.
The engine starts at the 'T' of The in the search string and looks at the first part of the regular expression, which expects the I of INR. That does not match, so it moves to the next word. Then it finds the 'a' of amounts, which does not match either, and moves ahead.
When it comes to INR30.00, it matches INR and then moves on to [0-9]{5}.
Here the engine can match it in two ways.
(a) Collect characters till the paise separator '.', then verify that each one is a digit (0-9) and that the number of digits is 5. If the paise separator is not found before the end of the word (\b), skip this word.
(b) Take each character, check that it is a digit and add to a count; when the count is 5, check for the paise separator as the sixth character. If the sixth character is a paise separator the search passes, else the search fails and moves to the next word.
Option (a) assumes that if the paise separator is not there, there is no need to check each character to be a digit and add it to the counter. If it finds a paise separator, it tracks back over each character and then does the necessary operations.
In case of (b) the engine anticipates the paise separator and does the character validation and counter operation for each character.
Assume a case where we are searching for 10 digits. In that case the performance penalty with option (b) would be very high.
Option (a) describes case of backtracking. Backtracking is like leaving a pile of bread crumbs at every fork in the road.
If the path we choose turns out to be a dead end, then we can retrace our steps giving up ground until we come across a pile of crumbs that indicates an untried path. Should that path, too, turn out to be a dead end, we can continue to backtrack, retracing our steps to the next pile of crumbs, and so on, until we eventually find a path that leads to our goal or until we run out of untried paths [1].
With a text directed engine each character is looked at only once, while with regex directed engines each character might be looked at many times. Regex directed engines support back tracking and text directed engines don't.
While on back tracking I should mention the meta character dot '.' in regular expressions. Dot means any character (except new line). Back tracking plays a major role when working with dot. We will look into dot and all other meta characters in the next post.
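To see back tracking in miniature, here is a toy matcher in Python that supports only literal characters, '.' and '*'. It is a sketch for illustration and not how any production engine is written; the function names are my own. The greedy star first takes as many characters as it can (the trail of bread crumbs) and then gives them back one at a time until the rest of the pattern fits.

```python
def match_here(pattern, text):
    """True if pattern matches at the very start of text."""
    if not pattern:
        return True
    if len(pattern) >= 2 and pattern[1] == "*":
        return match_star(pattern[0], pattern[2:], text)
    if text and pattern[0] in (".", text[0]):
        return match_here(pattern[1:], text[1:])
    return False


def match_star(char, rest, text):
    """Greedy char*: grab as much as possible, then back off step by step."""
    count = 0
    while count < len(text) and char in (".", text[count]):
        count += 1
    while count >= 0:
        if match_here(rest, text[count:]):
            return True
        count -= 1          # back track: give one character back
    return False


def search(pattern, text):
    """True if pattern matches starting at any position in text."""
    return any(match_here(pattern, text[i:]) for i in range(len(text) + 1))


print(search("INR.*43", "Return Flight INR34543.43 etc."))
print(search("a*b", "aaac"))
```

In the first search '.*' initially swallows everything up to the end of the text, and the matcher has to step back several characters before "43" fits. In the second, every retreat fails and the search reports no match.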
Next -> Regex Meta Characters
Wednesday, August 08, 2007
Regular Expressions - Part1
Regular expressions. Most (u|li)n(i|u)x programmers might have used them with grep. What are they and why are they so different? By this time you might be wondering about (u|li)n(i|u)x. If you interpreted it as unix or linux, then you already know about regular expressions.
If you don't know them yet but could still interpret it, then regular expressions will not be difficult for you. It is just a matter of time before you pick up a book and understand the rules of this new game. If you were puzzled by those words, then just think of it as learning a new technique.
What is this all about
Regular expressions can be used to search and replace text patterns in a more structured manner. I am not defining the word regex here, just bringing you to the context of text search / replace and structure. Just remember: structured text search and replace.
Some taste and smell
\b[A-Za-z0-9._%\+-]+\@[A-Za-z0-9-]+\.[A-Za-z0-9-]+(\.[a-z0-9-]+)?\b
Before we start:
1) It takes time and requires dedicated time of at least 10 hours before you gain momentum with regular expressions. But here I want to make this process simple and easy. Thus I want to stretch these 10 hours over 10 days, one hour on each day.
2) You need a regex editor to test with. You can pick one from Google, but I feel it is better to write your own with minimal effort.
Build MyRegex test tool (Option 2 as mentioned above):
At the core, this tool is going to have three text boxes: (a) to enter the text to be searched, (b) to enter the regex pattern, and (c) to show results.
Optionally we can have some check boxes to select a few options and labels to identify the text boxes. I used .Net 2.0 and a C# Windows application. I attached a screenshot here. I used a context menu on the regex text box to avoid a button click.
In the designer code just add
this.tbRegex.ContextMenuStrip = this.contextMenuStrip1;
I named my regex text box tbRegex. You can also find two check boxes to select Singleline mode and Multiline mode. Don't bother about these things now. Just add them in a frame for better looks. Then add a reference to the RegEx namespace and an event handler for the Find context menu click. See code below for form.cs.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Text.RegularExpressions;
namespace RegExApp
{
public partial class Form1 : Form
{
RegexOptions regexoptions;
public Form1()
{
InitializeComponent();
}
private void findToolStripMenuItem_Click(object sender, EventArgs e)
{
tbreults.Text = "";
string results = "";
string txtstr = tbintext.Text;
string regex = tbRegex.Text;
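//Start clean on every search and combine the options, so both boxes can be ticked
regexoptions = RegexOptions.None;
if (cbSingleLine.Checked)
regexoptions |= RegexOptions.Singleline;
if (cbMultiLine.Checked)
regexoptions |= RegexOptions.Multiline;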
MatchCollection matches = Regex.Matches(txtstr, regex, regexoptions);
foreach (Match m in matches)
results += m.Value + Environment.NewLine;
tbreults.Text = results;
}
}
}
Now we can test this tool. Just copy and paste the following text into the top text box, i.e. the search string box,
my email id is firstname.last@gmail.com onlinesend me mails to FIRSTNAME.LAST@GMAIL.COM collections
yahoo id is firstname_last@yahoo.com on check
also aliased to Firstname.Last@yahoo.com checked often
and hotmail ID is firstname-last@hotmail.com least
or name@net.co.in is also fine
then copy and paste following regex in regex textbox (use Ctrl+v as right click opens context menu)
\b[A-Za-z0-9._%\+-]+\@[A-Za-z0-9-]+\.[A-Za-z0-9-]+(\.[a-z0-9-]+)?\b
right click on regex text box to trigger context menu and click on Find.
Check following results in results text box
firstname.last@gmail.com
FIRSTNAME.LAST@GMAIL.COM
firstname_last@yahoo.com
Firstname.Last@yahoo.com
firstname-last@hotmail.com
name@net.co.in
Experiment by adding new email ids, extra tld's etc..
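If you do not want to build the Windows tool first, the same experiment can be run in a few lines of Python. This is a minimal sketch using the sample text and the regex from above; finditer is used so that the whole match is returned and not just the optional group at the end.

```python
import re

SAMPLE_TEXT = """my email id is firstname.last@gmail.com online
send me mails to FIRSTNAME.LAST@GMAIL.COM collections
yahoo id is firstname_last@yahoo.com on check
also aliased to Firstname.Last@yahoo.com checked often
and hotmail ID is firstname-last@hotmail.com least
or name@net.co.in is also fine"""

EMAIL = re.compile(
    r"\b[A-Za-z0-9._%\+-]+\@[A-Za-z0-9-]+\.[A-Za-z0-9-]+(\.[a-z0-9-]+)?\b")

found = [match.group(0) for match in EMAIL.finditer(SAMPLE_TEXT)]
for address in found:
    print(address)
```

The optional group at the end is what lets name@net.co.in match in full, with both the .co and the .in part.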
Next -> Non printable characters, Regex engines and how it works?