Thursday, June 30, 2011

Using a Graph to Describe a Multiple Regression Equation

Multiple regression is one of the most widely used models in research. It is used (1) to assess the relationship between a group of independent variables and one dependent variable, (2) to evaluate the relationship between one independent variable and a dependent variable while controlling for covariates, and (3) to predict an outcome (dependent) variable using a group of predictors (independent variables). A regression model can be grasped using the linear equations we learned in high school:

y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn

In this model, x1, x2, x3 ... xn are a group of n independent variables, y is the dependent variable, and the b's are the regression coefficients (with b0 as the intercept). The regression relationship between the first three independent variables and the dependent variable y can be described graphically as below:

[Figure: path diagram with arrows from x1, x2, and x3 to y, each labeled with its b coefficient]

This graph tells us directly that the three x's independently affect y, with each effect expressed by its b regression coefficient. These coefficients are similar to the coefficients used in structural equation modeling (or path models) when no latent factors are included (see my post on structural equation modeling in this blog).
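
To make the b coefficients concrete, here is a minimal sketch that estimates them by ordinary least squares for a regression of y on three predictors, y = b0 + b1x1 + b2x2 + b3x3. The function name, the data, and the "true" coefficients are all made up for illustration, and the normal-equations approach is only one of several ways to fit such a model; in practice a statistics package would be used.

```python
def fit_multiple_regression(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows of predictor values, y is a list of outcomes.
    Returns [b0, b1, ..., bn], where b0 is the intercept.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # leading 1 for b0
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical data: six subjects, three predictors each (x1, x2, x3)
X = [[1, 2, 0], [2, 1, 1], [3, 4, 2], [4, 3, 1], [5, 6, 3], [6, 5, 0]]
# Hypothetical "true" coefficients b0, b1, b2, b3, used to generate y without noise
TRUE_COEFS = [2.0, 1.5, -0.5, 3.0]
y = [TRUE_COEFS[0] + sum(b * x for b, x in zip(TRUE_COEFS[1:], row)) for row in X]

coefs = fit_multiple_regression(X, y)
print([round(b, 3) for b in coefs])
```

Because y was generated without noise, the estimated coefficients match the made-up ones up to rounding error. With real data each b is an estimate of the effect of its x on y, holding the other x's constant.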

Tuesday, June 28, 2011

Three Keys for Successful Publishing

For a faculty member at a university, the biggest challenge, and also the greatest happiness, is writing papers for publication in peer-reviewed journals. The phrase "publish or perish" characterizes the profession very well. How can a person publish all the time, year after year, month after month, and day after day? In general, professors in the United States publish 3 to 8 papers a year, depending on their professional field. Professors conducting survey studies often publish more papers than professors doing laboratory-based research. Professors with many PhD students may publish more papers (but fewer as first author) than professors with only a few students (who publish more as first author). To publish successfully, the following three points cannot be missed.
1. New ideas: A successful author always has new ideas, new thoughts, new hypotheses, and new stories. A paper grows out of a new idea. To my knowledge, without a new idea there is no good paper. Copying other people's ideas cannot produce valuable papers. If you have no new ideas, find it very hard to come up with them, or feel that you have run out of ideas, this is a very bad sign.
2. Organized presentation: One point often neglected by many scholars is how to organize the writing so that readers can easily follow it, be persuaded to accept your idea, and finally agree with what you are trying to say. Many people are used to "free talk", which makes it very hard for reviewers to give you good comments and to recommend your work to the editor for publication. To write a good paper, we have to organize the content at three levels: (a) overall, (b) within each section, and (c) within each paragraph.
3. Evidence support: The last and most important point is that whatever you state in a paper must be supported by data. In addition to data from studies reported by others, which we often use in writing the introduction, discussion, and conclusions, we write a paper to report the evidence we have found to support our ideas and hypotheses, and to turn our beliefs into truth.

Saturday, June 18, 2011

The Magic Number P less than 0.05

P<0.05 is a number scientists look for, whether they are reading an article by others or conducting research of their own. When p<0.05 appears at the bottom of a table or in a chart, we know that a conclusion is there; otherwise, further research is needed. What makes this tiny number so important?
In research, we work on a sample to see the population. As indicated in another post, a 95% confidence interval is used if our interest is in a single characteristic, such as how many people support the president. However, if we want to compare two or more groups, or to test whether two things are related to each other, the p value is commonly accepted as the standard for judging whether an observed difference between two groups, or a correlation/regression coefficient, is real or is a result that could be generated simply by random sampling. The p value is the probability of obtaining a result at least as extreme as the one observed purely through random sampling (also termed sampling error), that is, when there is no real difference or relationship in the population.
Why 5%? I have not read any objective assessment. One thing for sure is that this 5% is complementary to the 95% of the confidence interval, each telling the same story from a different angle for different situations. As has been described in other places, the digit 5 in numerology represents the majority. If the chance of obtaining the observed result is less than this, then getting the research result by chance is rare (the opposite of the majority), or very unlikely. Therefore, we accept the observed result as true. However, we have to remember that this does not mean we are absolutely right when we draw this conclusion, because we still have up to a 5% chance of making a mistake!
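
One way to see what "obtained purely due to random sampling" means is a permutation test, sketched below under stated assumptions. The two groups and their scores are invented, and the function name is hypothetical. The idea is to pool the data, re-split it into two groups in every possible way, and count how often a split produces a difference in means at least as large as the one actually observed. That fraction is the p value for this particular test; other tests (such as the t test) compute it differently.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    as_extreme = 0
    n_splits = 0
    # Every way of choosing which pooled values are labeled "group A"
    for idx in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in idx)
        diff = abs(a_sum / n_a - (total - a_sum) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1
        n_splits += 1
    return as_extreme / n_splits


# Hypothetical scores for two groups of five people each
group_a = [12, 15, 14, 16, 13]
group_b = [18, 20, 19, 22, 21]

p = permutation_p_value(group_a, group_b)
print(f"p = {p:.4f}; significant at 0.05: {p < 0.05}")
```

Here the two groups do not overlap at all, so only 2 of the 252 possible splits are as extreme as the observed one, and p falls well below 0.05.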

Friday, June 17, 2011

95% Confidence Interval - From Poll to Research

A 95% confidence interval, or CI, sounds complex, but we see it in our daily life. Suppose you hear on TV or radio that a recent poll done by xxx indicated that if we voted today, 55% would support President Obama, and that the survey has a ±3% error. This means that, based on the poll results, the percentage of people who would vote for Obama may vary from 52% to 58%. The range 52-58% is what statisticians call the 95% CI. In fact, the poll was conducted among a group of randomly selected respondents, or a random sample. The rate of 55% was computed from the data reported by the sampled respondents. This random sample may represent the total population to a great extent, but it does not equal the total population. When the sample results are used to reflect the total population, statistical methods can help us determine the plausible range for the percentage of all people in the total population who support the president: the 95% confidence interval. In common language, we say that according to the poll, 52% to 58% of people support the President. In statistical terms, the 95% means that if we repeated the poll 100 times and computed such a range each time, about 95 of those ranges would cover the true support rate in the total population. Incidentally, the digits 9 and 5 in the 95% CI are really interesting (see another post on this blog). 9 and 5 are the basic characteristics of a king or a president: 5 represents the middle class, and 9 represents the highest power. Also, we cannot expect a president to be perfect or 100% great; 95% would be adequate!
Coming back to research, almost all the work scientists do is to study a sample and use it to gain insight into the whole picture of the subjects they are interested in. Therefore, the 95% CI is very important. Since 2010, the American Medical Association has required that all research papers include 95% CIs when reporting analytic results.
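
The ±3% in the poll example can be reproduced with a short calculation. The sketch below uses the common normal-approximation formula for a proportion, margin = z * sqrt(p(1 - p) / n). The sample size of 1000 is an assumption (the poll's actual size is not given), and the function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, level=0.95):
    """Normal-approximation confidence interval for a sample proportion.

    Returns (lower, upper, margin_of_error).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# 55% support among an assumed 1000 randomly sampled respondents
low, high, margin = proportion_ci(0.55, 1000)
print(f"{low:.1%} to {high:.1%} (margin ±{margin:.1%})")
```

With 1000 respondents the margin comes out at roughly ±3.1%, which is why polls of about this size are usually reported with a "±3%" error. A larger sample would narrow the interval, and a smaller one would widen it.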

Thursday, June 16, 2011

Developmental Trajectory Analysis - A New Trend

Recently, developmental trajectory analysis has been gaining momentum in scientific research. This method was originally used by scientists in social and behavioral research to examine the trajectories of people who commit crimes. Pretty soon, the methodology spread to the medical and public health fields to study developmental trajectories of many health-related issues, such as overweight and obesity; substance use, such as tobacco smoking and alcohol drinking; and sexual risk behaviors.
The advantage of this methodology over others is that it can test hypotheses that assume a non-homogeneous study population. For example, when studying body growth in the population of a country, we usually just compute the mean (and standard deviation) by age and use it to describe the growth process. This approach implicitly assumes that body growth follows a normal distribution (that is, the population is homogeneous), which may not be true. With this approach, subgroups with totally different growth trajectories are combined and cannot be identified. Data from a growing number of studies also support the advantages of this method over others, including several studies I have conducted recently. One is the assessment of a cluster randomized controlled trial of a youth behavioral prevention program among pre-teens in the Bahamas, and another is the evaluation of a Motivational Interviewing therapy to enhance condom use among HIV-positive youth in the United States. If you want help, please visit the two links first. Contact me if you need further assistance.
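
The point about hidden subgroups can be illustrated with a toy example. The numbers below are invented, and splitting people by the sign of their individual slope is only a crude stand-in for real trajectory analysis, which typically fits a finite mixture model to find the groups. The sketch only shows how a pooled mean by age can look flat while two subgroups move in opposite directions.

```python
from statistics import mean

ages = [10, 11, 12, 13, 14]
# Hypothetical BMI-like measurements, one row per person, one column per age
trajectories = [
    [18.0, 19.0, 20.0, 21.0, 22.0],  # rising
    [17.5, 18.5, 19.5, 20.5, 21.5],  # rising
    [22.0, 21.0, 20.0, 19.0, 18.0],  # falling
    [22.5, 21.5, 20.5, 19.5, 18.5],  # falling
]


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def mean_by_age(rows):
    """Average the rows column by column (one mean per age)."""
    return [mean(col) for col in zip(*rows)]


# Pooled description: one mean per age for everyone together
pooled = mean_by_age(trajectories)

# Crude grouping by direction of each person's own trajectory
rising = [t for t in trajectories if slope(ages, t) > 0]
falling = [t for t in trajectories if slope(ages, t) <= 0]

print("pooled mean by age: ", pooled)
print("rising group mean:  ", mean_by_age(rising))
print("falling group mean: ", mean_by_age(falling))
```

The pooled means suggest that nothing changes between ages 10 and 14, while the two subgroups in fact change by about one unit per year in opposite directions.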

Statistics Originated as Political Arithmetic

By the 18th century, the term statistics designated the systematic collection of demographic and economic data by states for planning and governing. As a result, statistics was once known as political arithmetic. Since the early 19th century, statistics has gradually been introduced into research. Today, statistics is used in almost all aspects of human life, particularly in research. The development of statistics as a branch of science is attributed to two important developments: probability theory and computer science.
Today statistics has become an essential tool for many researchers to investigate sophisticated research questions, to uncover myths, and to generate new knowledge. However, many people around us simply turn away when they hear the word “statistics”. I will use this blog as a small stage to share with anyone, from those who are simply curious about it to those who use it in their daily work. My blog will be particularly relevant to those working in public health, psychology, and behavioral research. Statistics is simple. It is around us in our daily life. It is not as scary as it sounds!