cougarguard.com — unofficial BYU Cougars / LDS sports, football, basketball forum and message board


Solon 08-18-2007 05:31 PM

Stylometric Analysis of Scripture
 
Can any of you stats folks figure this out?

http://links.jstor.org/sici?sici=096...3E2.0.CO%3B2-Z

I could only handle the introduction and conclusions.

If you would like a pdf of the entire article and don't have access to JSTOR, send me a Private Message with an e-mail address. Just make sure you observe "fair use" guidelines.

pelagius 08-18-2007 06:43 PM

Quote:

Originally Posted by Solon (Post 113833)
Can any of you stats folks figure this out?

http://links.jstor.org/sici?sici=096...3E2.0.CO%3B2-Z

I could only handle the introduction and conclusions.

If you would like a pdf of the entire article and don't have access to JSTOR, send me a Private Message with an e-mail address. Just make sure you observe "fair use" guidelines.

Solon, do you have specific questions? I don't do stylometrics in my professional work but I am generally familiar with it. I read this paper once a long time ago and I don't keep up with the developments in the literature. I have little doubt that BYU or FARMS responded to this study, but I am unaware of the response.

If you are looking for a kind of summary of what he finds, then look at Figure 1. Can you see how samples from the same author don't cluster together (for example, look at the Mormon samples)? They often cluster closer to other authors (except for the Joseph Smith writing samples). I think that may be the most important point he makes. There appears to be a fair amount of variation within author (as identified by the Book of Mormon). The within-author variation looks at least as big as the between-author variation. (I think that study uses measures of vocabulary richness, as compared to the early BYU stuff that looked at the frequency of non-contextual words.) In summary, the original stuff said that Mormon and Nephi don't write the same, and basically this study says that Mormon doesn't write the same as Mormon, and the difference is as big as the Mormon/Nephi difference.

Implications

I think Mormons should be wary of relying on or turning to stylometric results for support of multiple authorship. At best the empirical evidence in favor of the result isn't robust, and probably should be described as mixed.

Second, I can't for the life of me figure out why, even if it is an ancient document, one would expect there to be evidence of multiple authorship given what we know of the translation process. Also, I think you can construct reasonable hypotheses where it is a 19th-century document and still shows multiple authorship. I just don't see how a sharp hypothesis with regard to multiple authorship can be generated (in either direction).

Solon 08-18-2007 06:50 PM

Quote:

Originally Posted by pelagius (Post 113839)
Solon, do you have specific questions? I don't do stylometrics in my professional work but I am generally familiar with it. I read this paper once a long time ago and I don't keep up with the developments in the literature. I am sure that BYU/FARMS responded to this study, but I am unaware of the response.

If you are looking for a kind of summary of what he finds, then look at Figure 1. Can you see how samples from the same author don't cluster together (for example, look at the Mormon samples)? They often cluster closer to other authors (except for the Joseph Smith writing samples). I think that may be the most important point he makes. There appears to be a fair amount of variation within author (as identified by the Book of Mormon). The within-author variation looks at least as big as the between-author variation. (I think that study uses measures of vocabulary richness, as compared to the early BYU stuff that looked at the frequency of non-contextual words.)

Implications

I think Mormons should be wary of relying on or turning to stylometric results for support of multiple authorship. At best the empirical evidence in favor of the result isn't robust, and probably should be described as mixed.

Second, I can't for the life of me figure out why, even if it is an ancient document, one would expect there to be evidence of multiple authorship given what we know of the translation process. Also, I think you can construct reasonable hypotheses where it is a fraud and multiple authorship.

I don't have any specific questions - just stumbled across it looking for something else and was wondering if anyone knew about the stats involved. I'm reluctant to give much credence to measuring something like style, but then again, what do I know?

I've come across stylometry in Classics with people trying to prove/disprove Aeschylean authorship of Prometheus Bound, but that was small potatoes compared to the algorithms in this article.

I once heard Rick Majerus say something along the lines of, "Statistics are like bathing suits: they reveal a lot but conceal the most important parts."

Is anyone familiar with this line of research?

Archaea 08-18-2007 06:58 PM

That study is just a mid study and Pelagius has stated the general conclusions about stylometrics, namely they are interesting but are so far removed from proving much that emphasis on them has been mostly abandoned.

Another coined phrase for them is "wordprint", but stylometrics is the academic term. In the end, the style of this study and others could be considered, "Much Ado About Nothing."

ChinoCoug 08-18-2007 07:12 PM

I should be able to access this article from work on Monday. I'll look into it.

pelagius 08-18-2007 07:31 PM

Quote:

Originally Posted by Solon (Post 113840)
I once heard Rick Majerus say something along the lines of, "Statistics are like bathing suits: they reveal a lot but conceal the most important parts."

Is anyone familiar with this line of research?

I guess I am still unsure of what you are looking for, Solon (I promise I am not trying to give you a hard time about this. I hope it doesn't come across that way. I am honestly never quite sure what level of detail people want when they ask statistical questions). Did you want me to explain the methodology used in the paper? If you want, then I can do that (I would be surprised if you really want me to). For example, one of his core measures is the Sichel distribution, which measures the probability that a word (he only uses nouns) appears X times in an N-word sample (N=1000 in the paper). The distribution is a two-parameter distribution: alpha and theta (they jointly describe the shape of the distribution the way mean and standard deviation do for the normal distribution). The author doesn't specify, but it makes most sense to estimate the distribution using maximum likelihood. You can then compare the different writing samples by testing whether the alphas and thetas are different across the writing samples.
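To make the "a word appears X times in an N-word sample" idea concrete, here is a minimal Python sketch of the empirical quantity that a distribution like this would be fit to: the share of distinct words that occur exactly once, twice, and so on in a sample. It does not fit the Sichel model itself (that needs Bessel functions and a likelihood optimizer), it does not filter for nouns, and the function names and the tiny sample are made up for illustration.

```python
from collections import Counter

def frequency_spectrum(tokens):
    """Map r -> number of distinct words that occur exactly r times."""
    word_counts = Counter(tokens)          # word -> occurrences in the sample
    return Counter(word_counts.values())   # occurrences -> how many words have it

def spectrum_proportions(tokens):
    """Share of distinct words occurring exactly r times, for each observed r."""
    spectrum = frequency_spectrum(tokens)
    n_types = sum(spectrum.values())       # number of distinct words
    return {r: spectrum[r] / n_types for r in sorted(spectrum)}

# Hypothetical toy sample; the paper works with 1000-word samples of nouns.
sample = "king sword king land sword king people".split()
print(spectrum_proportions(sample))
```

A two-parameter model would then be chosen so that its predicted proportions match these observed ones as closely as possible, and the fitted parameters become the numbers compared across writing samples.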

He then combines this measure with 4 other similar measures and then explores commonality using two different approaches: cluster analysis and principal component analysis.
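The clustering half of that step can be sketched roughly as follows. This is only an illustration under my own assumptions: each writing sample is reduced to a short vector of style measures, the measures are standardized, and samples are merged bottom-up with single linkage. The paper's actual linkage rule, measures, and data are not reproduced here; the numbers and names are invented, and the principal component analysis is left out.

```python
import math
from itertools import combinations

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the closest pair of members.
            d = min(euclid(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical samples: (label, [measure 1, measure 2]); values are invented.
samples = [("Author A, text 1", [0.0, 0.0]), ("Author A, text 2", [0.1, 0.0]),
           ("Author B, text 1", [5.0, 5.0]), ("Author B, text 2", [5.1, 5.0])]
groups = single_linkage(standardize([v for _, v in samples]), k=2)
for g in groups:
    print([samples[i][0] for i in g])
```

In this made-up case the two authors separate cleanly. The interesting question for a real study is whether samples attributed to the same author end up in the same group.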

Are you asking for a comment on whether this approach is subjective?

There is some truth to that charge. The stylometrician has a fair number of degrees of freedom in terms of the design of the test.

Solon 08-19-2007 02:15 PM

Quote:

Originally Posted by pelagius (Post 113845)
I guess I am still unsure of what you are looking for, Solon (I promise I am not trying to give you a hard time about this. I hope it doesn't come across that way. I am honestly never quite sure what level of detail people want when they ask statistical questions). Did you want me to explain the methodology used in the paper? If you want, then I can do that (I would be surprised if you really want me to). For example, one of his core measures is the Sichel distribution, which measures the probability that a word (he only uses nouns) appears X times in an N-word sample (N=1000 in the paper). The distribution is a two-parameter distribution: alpha and theta (they jointly describe the shape of the distribution the way mean and standard deviation do for the normal distribution). The author doesn't specify, but it makes most sense to estimate the distribution using maximum likelihood. You can then compare the different writing samples by testing whether the alphas and thetas are different across the writing samples.

He then combines this measure with 4 other similar measures and then explores commonality using two different approaches: cluster analysis and principal component analysis.

Are you asking for a comment on whether this approach is subjective?

There is some truth to that charge. The stylometrician has a fair number of degrees of freedom in terms of the design of the test.

No, you're not giving me a hard time, and thanks for all the details you provide. I have nothing other than passing curiosity, and wondered if anyone was familiar with this type of endeavor and what he/she thought of it. Has stylometry been accepted in other fields, or is this just a statistical game? I'm not looking for hardcore answers, just curious about the size of the iceberg beneath this tip.

Archaea 08-19-2007 03:47 PM

Quote:

Originally Posted by Solon (Post 113893)
No, you're not giving me a hard time, and thanks for all the details you provide. I have nothing other than passing curiosity, and wondered if anyone was familiar with this type of endeavor and what he/she thought of it. Has stylometry been accepted in other fields, or is this just a statistical game? I'm not looking for hardcore answers, just curious about the size of the iceberg beneath this tip.

I believe it's been attempted in literature but not with much acclaim or success.

pelagius 08-19-2007 04:51 PM

Quote:

Originally Posted by Solon (Post 113893)
No, you're not giving me a hard time, and thanks for all the details you provide. I have nothing other than passing curiosity, and wondered if anyone was familiar with this type of endeavor and what he/she thought of it. Has stylometry been accepted in other fields, or is this just a statistical game? I'm not looking for hardcore answers, just curious about the size of the iceberg beneath this tip.

I think Arch has it about right. I have seen it used in various disciplines. For example, you see it pop up in the 70s in Biblical Studies in terms of trying to establish authorship of things like Isaiah. I think the study you link to underscores some of the problems with the measures that are used. Namely, how stable are these measures within author? If within-author variation is large, then you can't make meaningful inferences in terms of identifying authors.
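One rough way to put a number on "within-author versus between-author variation" for a single style measure is the ratio used in a one-way analysis of variance. The sketch below is my own illustration, not something taken from the paper: the function name and the values are hypothetical, and a real study would compare several measures at once.

```python
from statistics import mean

def variation_ratio(samples_by_author):
    """Between-author mean square divided by within-author mean square
    (the one-way ANOVA F statistic) for a single style measure."""
    groups = list(samples_by_author.values())
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Spread of the author averages around the overall average.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Spread of each author's samples around that author's own average.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    k, n = len(groups), len(all_values)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented values of one measure for several samples per author.
stable = {"Author A": [1.0, 1.1, 0.9], "Author B": [5.0, 5.1, 4.9]}
noisy = {"Author A": [1.0, 2.0, 3.0], "Author B": [2.0, 3.0, 4.0]}
print(variation_ratio(stable))  # large: authors differ far more than samples do
print(variation_ratio(noisy))   # near 1: within-author spread swamps the gap
```

A ratio close to 1 is the situation described above: an author's own samples differ from each other about as much as different authors differ, so the measure says little about who wrote what.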

I don't know the literature well enough to give a sense of the advantages or disadvantages of various measures in general. The article does talk about these issues a little bit. I will say that I don't like the original stuff by BYU that relied on non-contextual word use patterns ("and it came to pass") because it seems likely that non-contextual word use is affected by translator choice or preference.

However, I think stylometrics could be useful but I think one needs to have a pretty sharp hypothesis about how the different proposed authors wrote. I think in such a case the results could be quite compelling.

