I think I met MCJ via Fantomaster online. Or maybe I saw her involved in a discussion with Dave theGypsy -- I'm not sure anymore. Either way, soon after becoming aware of her I started to follow her, because what she has to share with us is of a different quality than the usual SEO c.... stuff we hear. I started to read her blog posts (tremendously informative).
She's smart. She knows the things you would want to know -- and even though she's likely the most knowledgeable SEO in the field at the moment, she's as cool and hip about them as the next person.
For a couple of weeks now, her popular TGIF post has appeared on SEO Scoop.
Information retrieval, algorithms, patterns. The mind's eye sees a blackboard filled to the very edges with formulas, the mad scientist continuing to scribble on the wall.....
Is any of this material accessible if you're not a mathlete?
How did I make the transition? Well I put my fears aside and started at the beginning. I learnt about a lot of things that appeared supremely complex, that I had never come across, that I barely understood, and many things that I actually had no idea about at all. Some things took me a couple of years to digest and fully comprehend. Writing equations was a steep learning curve, and coding proper languages (not web programming) was quite a challenge too. I discovered however that I was blessed with a knack for finding creative solutions and that my linguistics background gave me an edge as did philosophy.
Edison said "Success is 10 percent inspiration and 90 percent perspiration." Unfortunately there's no way around that. If you want to be a computer scientist, you have to have passion and not be afraid of hard graft. You can grasp the basics though, and they help for things like SEO, and that's usually enough. But if you're talking about a full, in-depth understanding, it takes time.
Dividing search into information retrieval and ranking algorithms, it seems one is very basic (collect information, spit it back out) and the other is hidden (who knows *what* they're using to rank!).
What can I learn from which part?
IR understanding tells you all about how to work with copy, and how computers process it and make sense of it. It helps you work out how they go about picking particular documents above others. This information is not secret. In fact the science community is very open so you can easily find all of the papers and methods that you need.
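To make the idea of "picking particular documents above others" a little more concrete, here is a minimal sketch of TF-IDF scoring, one of the classic textbook IR weighting schemes. It is only an illustration: the function names, the whitespace tokenizer, and the three sample documents are assumptions made up for this example, and nothing here is claimed to be what any particular search engine does.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; real systems do far more."""
    return text.lower().split()

def rank(query, docs):
    """Return (doc_index, score) pairs sorted by a simple TF-IDF score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # Rare terms (low document frequency) weigh more.
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(toks)) * idf
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical mini-collection, purely for illustration.
docs = [
    "sea shells from papua new guinea",
    "cheap hotels in london",
    "hotels near the sea",
]
ranking = rank("sea shells", docs)
for doc_index, score in ranking:
    print(round(score, 3), docs[doc_index])
```

The document that mentions both query terms comes out on top, and the rarer term ("shells") contributes more to the score than the more common one ("sea"). That is roughly the intuition behind working with copy from an IR point of view.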
Ranking algorithms are very complex and the topic around those is called "Learning to rank" because we use machine learning algorithms for that. In order to rank anything at all you have to create some kind of scale and then place everything in the right position. Humans can't really do this. The data has so many dimensions that it needs to be processed and analysed and taken apart with complex maths to establish any kind of ranking order. Do you always agree with the Google ranking? I don't and their system is very efficient in comparison to a lot of other ones. It still doesn't work to the level required though.
Here you learn about how machines go about deciding how to sort content. This is also the area of classification and clustering. Again, all of the information on the shiny new methods is available.
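As a rough illustration of what "learning to rank" can mean, the sketch below trains a tiny linear scorer from pairs of documents where one is known to be better than the other (a pairwise, perceptron-style update). Everything in it is an assumption for the sake of the example: the two features, the training pairs, and the function names are hypothetical, and real systems use far more features and far more sophisticated models.

```python
def score(weights, features):
    """Linear score: weighted sum of a document's features."""
    return sum(w * f for w, f in zip(weights, features))

def train_pairwise(pairs, n_features, epochs=20, lr=0.1):
    """Learn weights so that score(better) > score(worse) for each pair."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - w for b, w in zip(better, worse)]
            margin = score(weights, diff)
            if margin <= 0:
                # The pair is tied or in the wrong order: nudge the weights.
                weights = [wi + lr * di for wi, di in zip(weights, diff)]
    return weights

# Hypothetical features per document: [text match, link signal], both in 0..1.
# Each pair is (better document, worse document) as judged by a human.
pairs = [
    ([0.9, 0.2], [0.3, 0.1]),
    ([0.8, 0.7], [0.7, 0.1]),
    ([0.6, 0.9], [0.2, 0.8]),
]
weights = train_pairwise(pairs, n_features=2)

candidates = {"a": [0.9, 0.5], "b": [0.2, 0.3], "c": [0.5, 0.4]}
ordered = sorted(candidates, key=lambda name: score(weights, candidates[name]),
                 reverse=True)
print(ordered)
```

The point is the shape of the problem rather than the model: nobody writes the scale by hand. The machine is shown examples of "this should rank above that" and works out a scoring function that respects them.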
Remember IR is a hammer and every problem is a nail. Things like machine translation are scalpels and microscopes.
New ranking algos nobody has talked about yet
Also issues with IR evaluation
When we hear "natural language processing" our mind conjures up the image of a captain aboard the deck of a space ship, talking to a machine and receiving intelligent responses. Or we imagine typing a question into a search engine and getting an answer back that is not "keywords on page" based.
What do you see when you think about NLP?
What should we be thinking about when Google states it's working on AI projects?
Why the emphasis on personalization? What are they trying to personalize -- and what does "personalization" mean anyway?
A lot of people worry about giving information out to the engines, and privacy and such things. Online you are lucky to have any privacy for a start. Google yourself and there's your proof. Data from a single source is not interesting because it doesn't tell you anything about your performance as a whole, where the issues with your system are, what queries are common to which demographics...that is the sort of thing you want to look at. Who cares if Trevor Smith has looked at seashells from Papua New Guinea? We might care however if 2,000 people interested in the same things as Trevor looked at the same thing. I welcome the time when I can have more personalised results because they will save me time for one thing.
There is a common sense element to expecting search engines to somehow make sense and use of the tremendous amount of information available in social networking data.
Social networking data is something I've been looking at along with a bunch of other computer scientists. Having it incorporated into a search engine can lead to quite noisy data. The quality of the stuff that comes through Twitter is usually not great anyway in my experience, and so I don't think it belongs in the "normal" rankings. Certainly there should be a way of getting through it all and finding threads that you find interesting. The information on Twitter, for example, is great because it's short, but it's not easy to work out who is an authority source or to find a full conversation instead of bits, and processing full natural language isn't the easiest, especially not in real time. An interesting area of research is sentiment extraction, something which Chris Rines is working on - watch this space. I'm not sure social media stuff belongs in a regular search engine. If it is to be included there are some issues that need to be addressed first.
Google proudly boasts of taking into account over 200 factors when ranking results. Meanwhile pragmatic SEOs think "yup, and a good
The 200 factors may range from things for organising data to extracting small variables from it, for example. I don't know what those 200 factors are, but to an SEO professional they're not very interesting. To a computer scientist they are. While I am on a crusade to educate and share scientific and technical information on how search engines work (along with David Harry, to name but one), I do believe that SEOs do not need to know how to build and run a neural network, for example. Knowing what one is matters, but that's about it. The reason is that it gives some kind of understanding of how search engines function. This knowledge enables people to understand what newfangled algorithms are about and how plausible the story is. Changing your whole SEO strategy based on what someone said in a blog post is dangerous and unnecessary. Read around it and do some tests.
My favourite quote is "If you want to make an apple pie from scratch, you must first create the universe" by Carl Sagan. If you want to understand something, you have to learn everything around it, and put all of your beliefs into question. For example if you don't know what a neural network is, how will you understand a paper that describes a method where one is used?
It's not easy for everyone to rank. Some sites are up against some interesting issues, for example an author who sells their books on their own site. Amazon and numerous bookshops also sell the book online. The author obviously wants to show at #1 for their own work and name, and sometimes this can be a challenge. Other sites are much more straightforward and yes, ranking can be relatively easy. I would say that it does depend on what topic area you're going for as well. "Sea shells from Papua New Guinea" might indeed be quite easy. This is probably because there isn't much data in that topic area. There will be in the "hotels" category though. This is a simplistic view of it but you get the picture.
Actually, if it was really easy to rank, it would get harder and harder. This is because not everyone can be at #1.
The single most effective thing to do for your web site is .... ?
- Ruud Questions: Chris Brogan
- Ruud Questions: Jill Whalen
- Ruud Questions: Dave Harry aka the Gypsy
- Ruud Questions: Barry Welford
- Ruud Questions: Alexander van Elsas
- Ruud Questions: Brian Wallace
- Ruud Questions: Garrett Pierson
- Ruud Questions: Marty Weintraub aka aimClear
- Ruud Questions: Kim Krause Berg
- Ruud Questions: Angie Haggstrom
- Ruud Questions: Shana Albert
- Ruud Questions: Steve Gradman
- Ruud Questions: Rae Hoffman aka Sugarrae
- Ruud Questions: Joost de Valk
- Ruud Questions: Debra Mastaler
- Ruud Questions: Mike Grehan
- Ruud Questions: Bryan Eisenberg
- Ruud Questions: Ralph Tegtmeier aka Fantomaster
- Ruud Questions: Marie-Claire Jenkins
- Ruud Questions: Cindy Krum
- Ruud Questions: Steve Plunkett on Google Is Our Friend
- Ruud Questions: Brian Carter
- Ruud Questions: Tamar Weinberg
- Ruud Questions: Hugo Guzman
- Ruud Questions: Dr. Mihaela Vorvoreanu
- Ruud Questions: Matt McGee
- Ruud Questions: Michael Gray a.k.a. Graywolf
- Ruud Questions: Christina Gleason
- Ruud Questions: Michelle Corsano
- Ruud Questions: Glen Allsopp aka ViperChill
- Ruud Questions: Joanna Lord
- Ruud Questions: Kristy Bolsinger (RealNetworks)
- Ruud Questions: Julie Joyce
- Ruud Questions: Carol Skyring
- Ruud Questions: Henk van Ess
- Ruud Questions: Anna Gonzalez (from News 8 Austin)
- Ruud Questions: Hugh Macleod aka Gapingvoid
- Ruud Questions: Tadeusz Szewczyk aka Tad Chef aka Onreact
- Ruud Questions: Arnie Kuenn
- Ruud Questions: Richard Hamilton (from XML Press)
- Ruud Questions: Steve Rubel
- Ruud Questions: David Allen
- Ruud Questions: Aaron Wall
- Ruud Questions: Stephan Miller
- Ruud Questions: Meg Geddes aka Netmeg
- Ruud Questions: Ed Bennett
- Ruud Questions: Gab Goldenberg