in Technology

Cracked – Algorithm for searching, sorting and displaying people

We had a challenging task ahead of us in Glancer – to develop the algorithm for searching, sorting and displaying people. What made this task challenging?

  1. Keyword based search, boolean (AND, OR, NOT) support
  2. The matches should be shown on the Match Map according to the extent they match with the profile of the current user. So if I searched for PHP, people who are interested in the fields that I am too, should be shown close to the center of the map. Or people from the same city. Or people in the same work area. As we go away from the center, the match percentage to my profile would decrease.
  3. The display should not be cluttered with all people in the same place. They should be spread out on the circle based on some specific criteria. How will we determine the pixel position of each person on the map?

I was toying with a few ideas for a few days and we did a brainstorming meeting on the topic today. Vijay, Avani and I discussed a whole range of solutions. From tagging people in groups based on their profile, to storing the match score for all to all in tables, to some kind of tree structure. We were also confused on how IntroNetworks does it.

Finally we managed to crack it! Here are the algorithms that we thought. If you have a better idea, post a comment!

Search
We would use full text search. If this gets too heavy, we would search on specific fields. The logic for boolean search etc is available from previous work that we have done and we would implement something similar. We are dropping advance search and search within results for now.

Match Score
The match score will be determined based on a number of factors. There will be default weight given to these parameters and the user will be able to change them. Tell us if these parameters are ok! Basically, we need to match these areas with the profile of the searcher. More the items in profile areas match, higher the score. The score in individual sections will be multiplied by the weightage the user has set for each. That will give us a final match score between two people.

  • Areas of interest: the tracks interested in. This is priority one. If there is somebody else who matches the keyword I searched for and also shares my track interests, we are more likely to be friends. This can have multiple values. More the matches, better the score.
  • Technologies working on: User profile would have a group of technologies that they work on. E.g. LAMP.
  • Work Area: People in similar levels would match better. Student to Student and CEO to CEO.
  • Location: If the location is same, we should give higher score since you can come to the event together! And even carry on a project / LUG activity together in your city.

The basic funda of weightage and scoring was not too difficult. But how shall we store this into the DB so that the queries are optimized? We would have thousands of profiles and search and sort is going to be the most common operation! If the sort algorithm requires a lot of queries, or processing the data in PHP, it could slow down things.

Alright, I won’t keep you hanging anymore! What we plan to do, is to use the Set field type in MySQL. The set field can store multiple values from a specific list of items in a single field. So we could have a set of checkboxes for the interest areas, user would select a few, and they would all be stored in a single set field type. Internally, MySQL stores 1 for the the set value inserted and 0 for the others. So the value for a field may look like “0111” in binary. We could now do a binary AND on the value for the current user and the ones that have matched the search query to get the common areas of interest. Then simply count the 1’s in that value and add them to the match score. Take a look at this excellent article which describes this method in detail.

We actually came to the set datatype from an idea to store the profile data as 0’s and 1’s. I was thrilled to see that MySQL implemented the Set data type exactly as what we were planning. And hence we would get really good performance, even if we calculated the match score at runtime. I was happier because we thought something afresh, without knowing the implementation of Set, and it was how MySQL guys thought too! 😉

Display of matches:
Now that we had the “distance” figured out, the only remaining part was the degree at which to plot the record. The two factors together will give us the spot on the circle to place a person. We were thinking about this and I thought let me just review IntroNetworks to see if they have any help in the FAQs. Fortunately, they had. And they did it – well, randomly! But not just that, they arranged people in alphabetical order. Clockwise from 9AM position in the circle. We had thought about dividing the cirlce in zones and arranging people based on some of the areas in the profile, but arranging on the basis of name was much simpler. So we decided we will do just that. Divide the circle in 20 parts (18 degrees each). And put people together based on the first character of their names.

So, what do you think?

Write a Comment

Comment