06.25.08

seattle girl geek

Posted in ramble, tech at 12:10 am by stacywong

Tonight I attended the first Seattle Girl Geek Dinner, held on the Microsoft campus. The theme was “career development”.

I’ve been to a lot of career building/talking/planning/thinking type of women’s events, and I’ve realized that I only go for two reasons: networking and anecdotes. Not for the main event. Free food is a compelling third reason to attend. Tonight we had quite a spread: fruits, satay, lots of desserty things, wine and an entire cooler of Mike’s Hard Lemonade; definitely a refreshing change from beer and chips and pizza.

I guess what bothers me is that I often see panelists as larger-than-life, extremely successful women who are completely unlike me. They’re not relatable. It’s hard to look at them and think “yeah, that could be me someday” (although I’m fully aware that that is the exact sentiment which is supposed to be fostered by such events). I wish they would talk more about their struggles… especially the smaller day-to-day ones. I wish someone would tell a story about how they were having a bad day and had to cry in the bathroom until they felt better because men freak out when women cry. I wish someone would tell a story about how they sat at their desk composing and recomposing an email because they wanted to come across as forceful but not bitchy. But none of that. The stories are all about taking risks, being confident, being unafraid, but not a lot of focus on how you get to that point.

One of the panelists talked about dealing with a Japanese client who suggested that she hire a new VP of international sales; a subtle hint that she needed a male liaison to do business with him. For her, hiring a male associate was a no-brainer because it was the direct route to accomplishing her goal. “I didn’t want to lose, I wanted to win!” she said. I cringed a little. Sure, she made the right decision for the business… but I wonder to what extent she would compromise her behavior and beliefs in order to achieve her objectives. If women need to perpetuate male expectations in the workplace to ensure our success, how can we ever hope to create an environment where being female (whatever that means) contributes to instead of hinders our success?

Not that the evening was a total bust. I had an awesome carpool of women from Amazon. We talked about our teams, our own career paths, and bitched about male corporate culture. The carpool alone was worth the entire event for me because I made some new friends and hearing them speak was a more relatable and hence fruitful experience.

I’m definitely in for the next girl geek event. I might be jaded, but I do enjoy meeting other women in industry. And sometimes, in spite of myself, I come away feeling empowered and less alone.

05.20.08

Evaluating equivalent solutions

Posted in tech at 1:43 am by stacywong

Consider the following regexes, numbered for readability:

1 f?oo
2 [f]?oo
3 [f?]oo
4 (f?)oo
5 (f)?oo

Do they behave as expected? Do they behave in the same way? Are they equivalent?

To answer whether they behave as expected, we just need to set our expectations. Looking at the regexes, it’s pretty obvious that we are interested in matching on “oo” prefixed with an optional “f”. Now that we know what our expectations are, let’s throw some test cases at them; “oo” and “foo” seem appropriate.

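We run our tests and discover that all except #3 have the expected result: a successful match for both test cases. Sneaky #3 will correctly match “foo” but fail to match “oo”, because a “?” inside square brackets is a literal character rather than a quantifier, so [f?] demands exactly one “f” or “?” before the “oo”.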
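The test run above can be approximated with a short script. This is a minimal sketch in Python rather than the original program (which is not shown in the post); the names PATTERNS and match_table are made up for illustration, and the extra input “?oo” is added only to show what #3 really accepts.

```python
import re

# The five patterns from the post, keyed by the post's numbering.
PATTERNS = {
    1: r"f?oo",
    2: r"[f]?oo",
    3: r"[f?]oo",
    4: r"(f?)oo",
    5: r"(f)?oo",
}


def match_table(inputs):
    """Return {(number, text): bool} using an unanchored search."""
    return {
        (number, text): re.search(pattern, text) is not None
        for number, pattern in PATTERNS.items()
        for text in inputs
    }


if __name__ == "__main__":
    results = match_table(["oo", "foo", "?oo"])
    for (number, text), matched in sorted(results.items()):
        status = "MATCH" if matched else "no match"
        print(number, PATTERNS[number], repr(text), status)
```

Only pattern 3 rejects “oo”, and it happily accepts “?oo”, which is probably not what anyone intended.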

Now on to the next question, “Do they behave the same way?” You might be tempted to say “yes” because they appear to return the same result given the same input. But, if you’re pedantic about regexes, you’ll remember that parentheses indicate that matched text should be stored in a capture group. I wrote a little program to print “$+” after regex matching, and got the following:

Input: foo
1 f?oo     MATCH   last capture:
2 [f]?oo   MATCH   last capture:
3 [f?]oo   MATCH   last capture:
4 (f?)oo   MATCH   last capture: f
5 (f)?oo   MATCH   last capture: f

When you’re evaluating different options on how to code something, don’t just look at inputs and outputs. Side-effects have the potential to be very dangerous, especially because they’re hard to debug. In this case we’re probably fine, but of course it depends on what your application is doing and who is calling you and how often…
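To make the capture-group side effect concrete, here is a rough Python analogue of printing “$+”. It is a sketch, not the original program: last_capture is a hypothetical helper, and Python’s lastindex is only an approximation of Perl’s “$+”, though the two agree for these single-group patterns.

```python
import re


def last_capture(pattern, text):
    """Return the text of the last capture group that matched, or None.

    None means either the pattern did not match at all or no capture
    group took part in the match.
    """
    m = re.search(pattern, text)
    if m is None or m.lastindex is None:
        return None
    return m.group(m.lastindex)


if __name__ == "__main__":
    for pattern in (r"f?oo", r"[f]?oo", r"[f?]oo", r"(f?)oo", r"(f)?oo"):
        for text in ("foo", "oo"):
            print(pattern, repr(text), repr(last_capture(pattern, text)))
```

On “foo”, options 4 and 5 both capture “f”. On “oo” they diverge in Python: option 4 captures an empty string, while option 5’s group never participates, so there is no capture at all. That is one more way two “equivalent” regexes can differ for a caller who inspects the groups.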

So now we’ve looked at functionality and some of the side effects of our examples. Is it safe to say that options 1 and 2 are the same, and options 4 and 5 are the same? Not yet. Here is where the intangibles come in. For example, it can be argued that options 2, 4, and 5 are more readable, and hence better choices. Also consider that requirements might change: what if I wanted to change my regex such that it would match “boo” and “foo” but not “oo”? Then option 2 makes sense because the square brackets indicate a character set, and I would only have to add the letter “b” in there (and drop the “?”, since the prefix is no longer optional) and I’m done. Okay, how about “boo” and “foo” and “taboo”? Uh-oh. Now we can’t use the character set because we’ve introduced prefixes longer than a single character. Maybe 4 or 5 are the way to go…
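The changing-requirements scenario can be sketched the same way. The patterns below are one possible reading of each requirement, not the only one, and the sketch assumes whole-string matching (re.fullmatch); with an unanchored search, “taboo” would match anything that finds “boo” inside it. The helper name accepts is hypothetical.

```python
import re

WORDS = ["oo", "foo", "boo", "taboo"]


def accepts(pattern, words=WORDS):
    """Return the words that the pattern matches in their entirety."""
    return [word for word in words if re.fullmatch(pattern, word)]


if __name__ == "__main__":
    # Original requirement: optional "f" prefix.
    print(accepts(r"[f]?oo"))
    # "boo" and "foo" but not "oo": extend the set and drop the "?".
    print(accepts(r"[bf]oo"))
    # Add "taboo": a multi-character prefix needs a group, not a set.
    print(accepts(r"(b|f|tab)oo"))
```

The character set handles the second requirement with a one-character change, but the third forces a switch to a group with alternation, which is where the parenthesized options start to pay off.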

This is my favorite part of the code reviewing process. It’s easy to tell if the code is working*, it’s moderately easy to tell if it’s not doing anything bad, but figuring out if it’s extensible, maintainable, scalable? That’s more art than science.

BTW, if we’re deciding between 4 and 5, my pick is for 5 since it makes it clear that the “?” is acting on the entire group of options, and not just the last one. :)

* Especially if you have tests!

05.02.08

Gold star stickers

Posted in tech at 2:52 am by stacywong

I pushed some bad software this week which had some unfortunate ramifications… but even worse than pushing broken code, the problem remained undiscovered for a long time because I neglected to test my stuff after the push. No excuses, no prevaricating — I did not test because I was being lazy.

How do we prevent this from happening in the future? You can document your standard operating procedures, but at the end of the day it’s at the discretion of the human to actually read and adhere to SOP.

This is why automation is good. You can automate your tests, and tie them to your development process — code can’t be checked in, built, or deployed unless all tests pass. However, setting up the framework for such automation across several systems often requires time, and even when you get to the point of a fully automated system, problems are still going to slip through the cracks. So until you perfect your automation, you still need to keep a human in the loop.
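As a minimal sketch of tying tests to deployment, assume the test suite can be run as a command that exits with status 0 on success. The function name deploy_if_tests_pass and the deploy callback are hypothetical; a real setup would hook into whatever check-in, build, and deploy tooling is already in place.

```python
import subprocess
import sys


def deploy_if_tests_pass(test_cmd, deploy):
    """Run test_cmd and call deploy() only if it exits with status 0.

    Returns True when the deploy step ran, False when tests blocked it.
    """
    result = subprocess.run(test_cmd)
    if result.returncode != 0:
        print("tests failed; refusing to deploy", file=sys.stderr)
        return False
    deploy()
    return True
```

A call might look like deploy_if_tests_pass([sys.executable, "-m", "unittest", "discover"], push_build), where push_build stands in for the real deployment step. The point is only that the human never gets the chance to skip the tests.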

With that in mind, the question becomes how do you make sure your human does the right thing? First you need to make it easy to do the right thing. Then you need to create incentives to do the right thing.

Beyond personal integrity and reputation, what was my incentive to do the appropriate testing in this instance? I’m not really sure, but I think it is worth exploring how to positively reinforce correct behavior so that you do not rely on good intentions alone.

04.08.08

Code cleanup

Posted in tech at 11:37 pm by stacywong

I caught up with Dave Sifry the day after his keynote and asked where code refactoring and cleanup came into the picture, since that’s something I’ve been wrestling with for a while. He took it from an optimization standpoint and said that it only makes sense to optimize the code that is most used. He also said that it’s hard to make a case for code cleanup unless it’s measurable, so you need to make sure that you write your software with metrics in mind so that you can use them to drive improvements.

True story. So now… how do I measure code maintainability? One of the things I see a lot is that if something works, we invoke the “if it ain’t broke don’t fix it” rule and never try to rethink it. What happens then is that surrounding code gets refactored and redesigned, so now you have the old stuff which is still functional, but it’s incongruous with the new stuff. You let this happen a few times, and all of a sudden you’ve got a huge mess because you’ve got several codepaths which all employ a different pattern and behave ever-so-slightly differently. This really bites when you have to debug something that requires tracing through your code jungle. Yes, a debugger helps, but that is no excuse for unreadable code.

I believe in allocating time for code cleanup with every feature you write, but there’s a fine balance between cleanup work that is actually for the greater good, and cleaning up for the sake of cleaning up. How do you decide what is “good” cleanup work, and how do you measure its success?
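One crude way to put a number on cleanup is to track something simple before and after, such as function length. The sketch below is only an illustration of “write with metrics in mind”, assuming the code base is Python; function_lengths is a hypothetical helper, and function length is a rough proxy for maintainability at best, not a metric anyone in this post proposed.

```python
import ast


def function_lengths(source):
    """Map each function name in the source text to its length in lines."""
    tree = ast.parse(source)
    return {
        node.name: node.end_lineno - node.lineno + 1
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


SAMPLE = (
    "def short():\n"
    "    return 1\n"
    "\n"
    "def longer(x):\n"
    "    y = x + 1\n"
    "    z = y * 2\n"
    "    return z\n"
)

if __name__ == "__main__":
    for name, length in sorted(function_lengths(SAMPLE).items()):
        print(name, length)
```

Recording a number like this over time at least gives a cleanup proposal something measurable to point at, even if deciding what counts as “good” still takes judgment.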

ICWSM 2008, part 2

Posted in tech at 11:36 pm by stacywong

I know it’s been over a week, but FWIW here’s a post about my ICWSM highlights.

Being a polyglot

One of the things I really enjoyed was research which spanned multiple languages. It’s fascinating because so much of understanding language is cultural and contextual, and the task of performing objective analysis across multiple languages is extremely difficult.

Nairan presented on word use in depression forums and contrasted English with Spanish. She found that depression-speak in both languages tended to revolve around the self: “I”, “me”, “mine”. However, the words used by English speakers had to do with recovery, such as “medication”, whereas the themes in Spanish forums had to do with the causes of their depression, e.g. “family”, “boyfriend”, “school”.

There was a poster on cross-lingual blog analysis* which used Wikipedia to link topics from one language to another. The results reflected cultural stereotypes; for example, Japanese blogs on whaling were laudatory and nationalistic whereas American blogs were anti-whaling.

The internet, it’s alive!

There was a great preso on Wikipedia and self-governance which really made me realize how all social media sites have grown so quickly and organically in the past few years.

There were also several other presentations on sentiment analysis of blogs, which were very cool. Papers are posted on the ICWSM blog. When I told her about this, Lisa reminded me of wefeelfine which is a pretty way to see how the internet is feeling.

Pretty pictures

I’m not really a graphics person, but it was cool to see Marc Smith’s work on Picturing Usenet (jump to the analysis section).

I got to play around with E15:FB (3D visualization of facebook). E15 is a web visualization tool developed at the MIT Media Lab. You can see some videos of E15:FB on Takashi’s site.

Scaling Innovation

Here’s a good transcript of Dave Sifry’s talk. Even though he’s preaching to the choir, I still enjoyed hearing his take on different engineering tradeoffs… and the gratuitous mention of 2-pizza teams. :)

If it wasn’t already apparent, I had an awesome time at ICWSM. Spontaneous conversations about privacy and politics, a crash course in NLP and mining the blogosphere… I definitely learned a lot and would love to go back again next year. +1 for it being in Seattle again.


* I can’t remember the context for this, but there was a step which required manual translation. I saw that “チョコ ウェハー” didn’t have a translation at all and I was like, “Oh, that’s a Kit Kat!” Hiroyuki explained that even though the translation was manual, they used dictionaries so that translations would be consistent… so no Kit Kat. I realized that it’s a pretty arbitrary choice since it could just as easily have been translated to Time Out or Loacker biscuits. Translation is hard because sometimes you’re trying to reconcile your own experience with what you think others will understand; i.e. everyone else’s experience.

03.31.08

ICWSM 2008

Posted in tech at 11:28 pm by stacywong

Today was my first day attending ICWSM 2008. It was a really refreshing change of environment for me, where I could learn for the sake of learning without wondering if I was being productive.

Needless to say, I was psyched about Brad Fitzpatrick’s keynote where he waxed philosophical on opening up the social graph with technologies such as OpenID, OAuth, XFN, etc. Yes! Both educational *and* job relevant. However, Brad didn’t address the topics which I would’ve liked to see, specifically with respect to persona management. Even in the Q&A he glossed over this topic… While acknowledging that “persona bleeding and management is a problem” he didn’t offer any ideas on how we should begin to solve this problem beyond “It’s up to you to manage your persona”.

Seriously? People can barely micromanage data for a single persona; can we really expect the average user to understand the implications of having multiple personas, and then manage them effectively? Or is it the case that the average user doesn’t want multiple personas? I don’t actually believe this latter statement is true. If anything, people are used to the current state of affairs, where each website is implicitly a separate persona.

During our lunchtime conversation, Kathy Gill took issue with the use of the word “persona” to describe different online identities, or different facets of your personality since it implies that the personality you are displaying is assumed, or fake. I still like it since I seem to lack any normal association with the word. But there is something here. I really think that we lack the appropriate jargon to talk about this particular problem. I’ve read a bunch about data portability and Identity 2.0, but I haven’t seen the problem of persona management being addressed much. Do we even know the multitude of ways people are separating their online identities? I had a tiny example a while back, but I’m sure there are many more use cases out there. How do we design protocols such that we can support persona management in a distributed and scalable way, but still make it easy to use and hard for users to do the wrong thing?

The other thing I wanted Brad to cover was why it would make sense for the internet heavy hitters to pick up something like OpenID (as a consumer, not just a provider). As a consumer and a geek, I love the idea of opening up the social graph. But large companies need to be driven by more factors than “it’s the cool thing to do”.

Anyway, there were a score of really interesting talks today, but I’m not sure when I’ll get around to blogging about them. I will say that one of my favorites was this paper on “impression agreement” based on online profiles — i.e. what parts of my profile will result in you perceiving me in the same way I perceive myself? Cool stuff, congratulations on the best paper nomination!

02.07.08

vitamins, painkillers and viagra

Posted in tech at 1:19 am by stacywong

I saw a presentation by Dick Hardt where he talked about the current state of affairs of online identity management. He gave an analogy of “vitamins, painkillers, and viagra” (and here I grossly paraphrase): vitamins are good for you, but you don’t really wanna take them; painkillers you take because you HAVE to take them cause you’re feeling the pain; viagra is something you take because you get really excited by it and it enables you to do new things. He postulates that websites are somewhere between vitamins and painkillers in terms of adopting newfangled technologies such as OpenID and Windows CardSpace.

Old school identity management was (is?) about identity silos. Data goes in, not much comes out. New hot identity management is about a unified identity, giving users the control to centrally manage who they are, and who they want to share their information with. Sounds great from a user perspective, but how do Google and Yahoo and eBay and whoever else get to benefit from this? It’s unclear, which is why we’ve got lots of people jumping on the OpenID provider bandwagon but we won’t see a lot of the large websites accepting OpenID anytime soon.

While unified identity sounds great from a maintenance perspective (woohoo, I never have to type in my favorite CDs again!), I’ve definitely taken advantage of my splintered identity on the internet. For example, my professional LinkedIn profile is drastically different from my (now defunct) college facebook profile, and I share the former with colleagues and the latter with casual friends and never the two shall meet. In an Identity 2.0 world, there is an implicit link between the two profiles because the data is funneled from some central source which is tied to some unique identifier (OpenID, email address, SSN… :) ) which is tied to a single person: me. However, I would still want the ability to present myself in different ways for different contexts. I see the need for more privacy levels as well as ways to classify HOW each piece of my data is used if I am really going to micromanage my online identity in a consolidated fashion.
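To sketch what per-context presentation on top of a single central identity might look like, here is a toy data structure. Everything in it is hypothetical (the Identity and Attribute classes, the context names, the sample values); it is not how OpenID or any real Identity 2.0 system works, just one way to picture tagging each piece of data with the contexts allowed to see it.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    value: str
    contexts: frozenset  # contexts allowed to see this value


@dataclass
class Identity:
    """One central record; each attribute is tagged with its audiences."""

    attributes: dict = field(default_factory=dict)

    def share(self, name, value, contexts):
        self.attributes[name] = Attribute(value, frozenset(contexts))

    def view(self, context):
        """Return only the attributes visible in the given context."""
        return {
            name: attr.value
            for name, attr in self.attributes.items()
            if context in attr.contexts
        }


if __name__ == "__main__":
    me = Identity()
    me.share("name", "Example Person", {"colleagues", "friends"})
    me.share("job_title", "software engineer", {"colleagues"})
    me.share("favorite_cds", "an embarrassing list", {"friends"})
    print(me.view("colleagues"))
    print(me.view("friends"))
```

Even this toy version shows the management burden: every new piece of data needs an audience decision, which is exactly the micromanagement problem.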

01.12.08

Facebook and data ownership

Posted in tech at 12:33 am by stacywong

In the past couple of weeks facebook has disabled the accounts of celebrity blogger Robert Scoble and Raj Anand, technical director of kwiqq.

I don’t have much to say about whether facebook did the “right” thing… FWIW, Scoble broke the terms of service, and he knew it too, whereas Anand’s use of the site seems more innocuous. What these incidents highlight are larger questions of data ownership with respect to the purpose of facebook. Isn’t spamming your connections part of facebook’s charter? As a [former] facebook user I was always under the impression that friending someone implicitly opened me up to messages from them *even outside of facebook* because my contact information would get shared. For example, facebook’s Friend Finder can even spam non-facebook users via IM and email.

If facebook is allowed to use the contact information I put in there to spam my friends, and I’m allowed to use facebook as a means for mass communication, why am I not allowed to take that contact information and contact my friends via a non-facebook channel? It’s disingenuous to ask users to import data without having a way to export it in a meaningful way.