17 February 2010

Google It

"...(W)hen a student fails to flourish, it is rarely the result of one party. Rather, it tends to be a confluence of confounding factors, often involving parents, teachers, administrators, politicians, neighborhoods, and even the student himself. If we could collect data that allowed us to parse out these influences accurately, then we might be able to hold not just teachers but all parties responsible. At present, however, we are light-years away from even understanding how to collect such data."

I read that this morning, in Education Weak (no, I spelled that correctly). Then I read the following this evening in My "Comments" section after the post "No Merit Against Merit Pay":

"...However, we might think of one model that has applicability on such a large data set. That is the 'Google Page Rank' algorithm that was the foundation of the world's most successful search engine... successful for being able to have a high degree of accuracy in finding true correlations between "your search query" and "the info that matches your query". Stats geeks - you figure it out.

Now, if such a system can correlate a student's achievements with earlier teacher contributions, it will be a much better way of finding out which teachers are successful at doing the job, i.e., preparing students to go on successfully to the next level of academics."

The brilliance of My Idea...okay, of The Insider's idea found on MY blog (I'm happy now) is that it has an answer for what teachers say is an "insoluble" problem. Now I'm with The Insider on the fact that I don't know how it can be done...but would I be The Jenius if I didn't give it a stab?

How well you know Me.

--What Google--the Owner of the Internets--did with search was postulate a method by which any question could be correlated to a growing set of "inter-related data points." In contrast, what We want with "data-driven, value-added assessment" is to answer a very specific paired set of questions: How is a student progressing? and Whose teaching helps students progress more? Every other question We may wish to ask about education simply flows from these two starting points.

--Given that framework, We can begin aggregating data already in hand, from socioeconomic factors to test results and health data. Yes, it presents an incomplete picture of factors affecting education, but it starts to establish the framework and helps identify gaps that need to be filled.

--In the aggregation of data and the application of the algorithm, there is a ranking system (Google's success in taking over the world was predicated on this point). By playing with the data and ranking systems, We can test different methods against historical results, as in determining how closely IQ tests are related to academic success or how much weight standardized tests should have. The strength of this exploration is that We can see the end results of factors in the recent past and weigh them accordingly. The weakness is that because more data needs to be added to fill out the evaluation framework, the algorithm will not be a reliable predictor of future success for perhaps several years.

--During this "historical" assessment, We will again see how important family, health (particularly nutrition), funding levels and socioeconomic levels are to a student's ultimate success. Fine: We can then "average them out," essentially place them in the background so that the information We want to see, i.e., teacher performance, can then "rise" above the data.

--Teachers, don't get hissy about the previous paragraph: it's called "filtering" and it applies as much to search results as it does to seismic events, cardiograms and musical recordings, and so it will apply to this kind of teacher evaluation system.
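The "averaging them out" idea above can be sketched in a few lines of Python. All the scores, groups, and teachers here are invented for illustration; the point is only the mechanics of subtracting a background factor's average so the residual signal stands out:

```python
# Illustrative sketch of "filtering out" a background factor so the
# residual (here, a hypothetical teacher effect) rises above the data.
# All numbers are invented for demonstration.

from statistics import mean

# (score, background_group, teacher) -- background_group might encode
# a socioeconomic bracket, for example.
records = [
    (78, "low",  "A"), (74, "low",  "A"),
    (90, "high", "A"), (88, "high", "A"),
    (70, "low",  "B"), (68, "low",  "B"),
    (84, "high", "B"), (82, "high", "B"),
]

# Step 1: average score within each background group.
groups = {}
for score, group, _ in records:
    groups.setdefault(group, []).append(score)
group_mean = {g: mean(scores) for g, scores in groups.items()}

# Step 2: subtract the group mean -- the residual is what remains
# after the background factor has been "averaged out."
residuals = {}
for score, group, teacher in records:
    residuals.setdefault(teacher, []).append(score - group_mean[group])

for teacher, vals in sorted(residuals.items()):
    print(teacher, round(mean(vals), 2))  # A 3.25 / B -3.25
```

Both teachers serve a mix of "low" and "high" background students, yet after filtering, teacher A's students sit above their group averages and teacher B's below; the raw scores alone would mostly reflect the background split.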

--Given the amount of data and the need to capture the intangibles to improve education, coupled with the notion that by doing so the U.S. of part of A. could once again vault to the head of the (world)class and lead the way into true 21st century schooling, I'd be very surprised if Google doesn't start doing this soon--or if they aren't already planning to do so. They may even have a test project going on somewhere. It makes sense that the world's leader in data analysis simplification (think about it) would want to use that power to impact the very basis of a society's future: education.

And then again, all this may come together because I had a brilliant idea...

Of expanding on The Insider's comments.


The Jenius Has Spoken.


KW said...

If I can boil this analysis down, would it be fair to say that we just need more data to be available, so that an applicable ranking of teachers may result?

I hope so, because I have to warn you that the comparison of the teacher ranking to Google is flawed. (Gasp) Only a geek like me would know that while, in the end, yes, Google does give you the "info that matches your query," the means they use approach the problem from a perspective of authority.

You see, when Sergey and Larry were PhD students at Stanford, they were working on a system to rank the authority of researchers and research papers. The strategy they used: the more a scientific paper was referenced by other papers, the more of an expert the researcher (and the paper) became on a particular topic. In addition, the more authoritative a researcher is, the more authoritative the papers she references become.

So what Google Page Rank does is identify the pages by order of authority on a subject. What they found is that links work just like scientific references.

A simple way to think of this is that links (or scientific citations) work like votes, and the more authority a source has on a topic, the more weight its votes carry.
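That "weighted votes" description maps directly onto the standard PageRank power iteration. Here is a toy sketch over an invented four-page "web" (damping factor 0.85, the value from Brin and Page's paper) -- a teaching illustration, not Google's production system:

```python
# Minimal PageRank sketch: a link is a vote, and votes from
# authoritative pages count for more. Toy graph, power iteration.

def pagerank(links, iterations=50, d=0.85):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start everyone equal
    for _ in range(iterations):
        new = {}
        for p in pages:
            # Weighted votes: each linking page q passes along a share
            # of its OWN rank, rank(q) / outdegree(q).
            incoming = sum(rank[q] / len(links[q])
                           for q in pages if p in links[q])
            new[p] = (1 - d) / n + d * incoming
        rank = new
    return rank

# Invented toy web: everyone links to "hub", so "hub" becomes the most
# authoritative page -- and a vote FROM "hub" (to "a") is then worth
# more than a vote from a low-rank page.
toy = {
    "hub": ["a"],
    "a":   ["hub"],
    "b":   ["hub"],
    "c":   ["hub", "a"],
}
ranks = pagerank(toy)
print(max(ranks, key=ranks.get))  # hub
```

Note the recursion KW describes: "a" ends up ranked above "b" even though both link only to "hub", because "a" receives the hub's heavyweight vote while "b" receives none.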

I hope that helps...

GCSchmidt said...

Of course this helps, MC. What Google did was take "user-based relevance," in essence "votes for or against" to rank websites. It is a weighted value system, where the "weight" comes from "votes." We could do the same with data related to the educational process provided We gather enough relevant data, assign it a "proper" weight and then filter out background elements so that what We want to see is revealed.

I don't think it's outside the realm of possibility in the next few years. What I like about The Insider's idea that I tried to claim for Myself is that We already have a model for sifting megaloads of data for specific queries. How it can specifically be done is yet to be determined, but We have a general methodology to use as a starting point.

Thanks for dropping by. Courtesy pass for Our next showing can be picked up in the lobby during office hours.

The Insider said...

What I specifically like about the Google PageRank is the multi-pass, iterative improvement of the data accuracy over time. Now, I'm not a statistician by any stretch of the imagination, so I am likely using extremely low level layman's terms to discuss a common principle in data analysis.

With each iteration, the authority of the data improves. This is covered in the description given in the original document (which I link below).

When I mentioned this algorithm, I was thinking of how it could be used as a model for passing "measurements of influence" in various different aspects, many of which Gil has outlined above.

For example, if there is found to be a positive correlation with having a lunch program available at a school, any student experiencing 1 year of studies in such a school would have some predetermined value of "influence" passed along within an iterative refinement of the student's lifetime influence accumulation (both positive and negative).

With that data, we can make predictions about how they should perform at each year of their academic studies.

When students exceed or fall short of the expectations predicted by the model, we can re-trace over large datasets to find commonalities that may have led to the change.

The commonalities may be particular teacher effectiveness, or other issues (socioeconomic factors) in their region, etc.

When we isolate to account only for teacher influence, we are then able to associate data with particular regions, schools, and individual teachers.

If the accuracy of that data can be verified, then we have a model for developing a national, state, and/or regional score sheet of teachers.

It leads to quantifiable observations such as:

We find that, all else held equal, students of teacher A significantly underperform in academic years following instruction by teacher A, compared with students of other teachers.

The alternatives are to perform at average, or to significantly overperform.

We use this to assign a refined numerical score to teachers. And then this can be applied to any sort of merit/penalization system that is imposed. Or it can simply be used as a metric to determine the state of the collective educational body of the U.S. In the end, everything turns out to be a merit/penalty-based system. That is, even if focused training for teachers is the only outcome, it will be considered either a reward (for those who appreciate the opportunity for improvement) or a penalty (for the sub-standard and lazy who really should be fired).
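The predict-then-retrace loop described above can be sketched in miniature. The influence weights, the threshold, and every student record here are invented; the sketch only shows the shape of the idea -- predict a score from accumulated influences, flag large deviations, then look for what the flagged students share (here, a teacher):

```python
# Toy sketch of the Insider's model: predict each student's score from
# accumulated "influence" values, flag students who deviate from the
# prediction, then surface commonalities among the flagged group.
# All weights and records are hypothetical.

WEIGHTS = {"lunch_program": 3.0, "small_class": 2.0}  # invented
BASELINE = 70.0
THRESHOLD = 5.0  # deviation beyond which a student is flagged

students = [
    # (name, influence factors present, teacher, actual score)
    ("s1", ["lunch_program"],                "A", 74),
    ("s2", ["lunch_program", "small_class"], "A", 76),
    ("s3", [],                               "B", 62),
    ("s4", ["lunch_program"],                "B", 65),
]

def predicted(factors):
    """Accumulate each factor's influence on top of the baseline."""
    return BASELINE + sum(WEIGHTS[f] for f in factors)

# Flag deviations, then count flagged students per teacher to
# surface a commonality worth investigating.
flagged = {}
for name, factors, teacher, actual in students:
    if abs(actual - predicted(factors)) > THRESHOLD:
        flagged[teacher] = flagged.get(teacher, 0) + 1

print(flagged)  # {'B': 2}: both flagged students share teacher B
```

In this toy run, both under-performing students trace back to teacher B -- the kind of commonality the re-trace step is meant to reveal. A real system would of course need the weights themselves to be learned and re-learned iteratively from historical data, as the PageRank analogy suggests.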

Now, since I do not know when I may participate in such a discussion again, I want to put a few links out there into the ether. For anyone happening upon this discussion, who would like to dig deeper, perhaps some of these resources may act as your muse. ;)

Tim Berners-Lee on the next Web (linked data)

Google Fusion Tables:

"The Anatomy of a Search Engine" - Sergey Brin and Lawrence Page

Hans Rosling shows the best stats you've ever seen:

A start-up that might have some success in this area:
Background info on them:

Unleashing the data to the public, so innovation can happen:

(After public data is out) How to Get the Brilliant People of the World to Optimize the Algorithms. Learn from Netflix: