Thursday, May 2, 2024

How Language Evolves Over Time: Especially in the Marketplace

Behind all of the buzz about NLP is language. We wouldn’t have text-based communications, intermediated by machines, without two people using a common method of transferring ideas to one another. How did all of this arise? In How Language Works in a Nutshell, we examined the surface issues of social contracts, marketplace dynamics, and tribal interactions. In this post, I’ll take a deeper look at shifts in language, the micro and macro forces at play, and how to adjust and account for those elements. Some refer to this as Evolutionary Linguistics, which relies heavily on biology and comes across as rather Darwinian. While this approach draws on a 19th-century understanding of language, it is biased. For example, the authors of this entry state that there is no archaeological trace of early human language. This is false. Much of the research can be found in Sociology and Anthropology studies. In addition, there is a new and emerging trend in Linguistics, where Archaeo-linguistics seeks to combine archaeology and linguistics into a blended greenfield approach.

An interesting book from 2007 by anthropologist David W. Anthony, The Horse, the Wheel, and Language, presents primarily archaeological findings about the Kurgan hypothesis, and takes the position that language arises and evolves in parallel with technical innovations, namely the inventions surrounding the domestication of horses on the Pontic-Caspian steppes north of the Black and Caspian Seas. (This area roughly corresponds to modern-day Ukraine and southern Russia.) Horse domestication, combined with the invention of the wheel, created a mobilized society in the Bronze Age. This theory of evolutionary linguistics takes on the origins of Proto-Indo-European (PIE) by means of archaeology and evolutionary biology, and the book spends the greater portion of its pages examining middens and pottery shards.

Imagine if you will a bronze-age innovator who decides to stop eating horses and instead domesticates one of them. This enterprising individual inserts a stick into the mouth of the horse, and puts two ropes on the ends, to make the horse follow his guidance. That eventually leads to the invention of a durable bit. Soon you have a set of traces connecting a pallet of your worldly possessions to the sides of the horse, so it pulls your goods to and from the winter camps. You no longer need to pull the sledge yourself. Now someone else comes along and invents the wheel, turning that flat pallet dragging along the ground into a cart that moves faster. Imagine the advantages you have over your neighbors. The horse is doing the work of men.

There are several inventions here to take note of. One technology builds on another. First, the idea of a pallet to stack food and possessions on, instead of carrying it on your back or in your arms. Then the horse or the wheel to make the transportation of items faster, easier. Now instead of a cart, a chariot for war. And so it goes, as history attests.

But more importantly to our story of language evolution, our clever inventor decides to trade copies of his bit, wheel, and other improvements to his fellow tribesmen. We now have business that exchanges goods for technology innovations. Perhaps a few sacks of grain for this new thing they decide to call a bit. Or two cattle for a set of those so-called wheels. As new inventions emerge, words are created to label that thing over there versus this thing here. Pointing just doesn’t suffice.

Now imagine, if you will, that a neighboring tribe sees the increased mobility, the speed, and the advantages of fighting from horseback instead of on foot. One can easily see that there are two choices. The tribes will align as allies and exchange technology and goods as friends. Alternately, one of the tribes decides that might makes right and goes to war against the other, and whoever wins incorporates the losing tribe into their society as slaves. The rise of wealth, the need to have a common medium of communication, and the desire to safely buy and sell possessions and crops all lead to the rise of marketplaces: common meeting grounds where people must talk to each other in order to achieve the desired outcomes. Soon, instead of a barter system, a token is found to equate value to goods. And this is the rise of money. Whatever a society values the most becomes an easy medium of value exchange, whether it is gold, beads, shells, or simply a piece of paper that promises there’s gold behind it somewhere in a bank.

The gods must have their tithes and the king his taxes: Not only to keep this social construct of the marketplace supported and protected, but also to maintain their primacy of power over the people who gather to exchange goods and ideas. It all needs a warrior class to guard against invading neighbors. Authority is always based on power and money. Following the money, the rise of a military state almost seems inevitable. Protection rackets are not just for the mafia.

One can easily discern the causal links between technology, commerce, and language development. Example: Google is a new noun and verb based on technology shifts.

Shifts in Language

Language, while a sociocultural construct, is not a constant. Definitions change, words drop out of popularity, and, as we have seen, both are subject to the forces of history. You only need to look at English to know that a speaker of Old English would have no clue what today’s Queen’s English is conveying. Researchers refer to the concept of Language Shift as a large-scale phenomenon, where a population changes from using one language to another. But what are the forces that lead up to such a radical shift?

Realizing that the British Isles have been invaded and conquered many times, by sundry Nordic groups from the far north and by neighboring France (creating Anglo-Norman in the 11th century), it is self-evident that Old English, primarily a Germanic language, would be endangered and die out. Indeed OE, or Anglo-Saxon, was itself the language of an invading culture, brought over in the mid-5th century. It replaced the native Celtic languages. The dynamics of language communities demand a certain amount of maintenance and care if a mother tongue is to survive historic circumstances. Survival of language is why dictionaries exist: to codify spelling, definition, etymology, and variants of words. Example: the Académie Française in the 17th century forcing language standards on publications and teaching institutions, and attempting to outlaw local dialects.

The progression of any type of speech within a new context is characterized by migration, infiltration, or diffusion. When a whole speech community moves to a new location, that group of people tends to cling to its language, halting change for a time. Think of Québécois French, where a colony tried to keep its connection to the old world by forcing the next generation to maintain 17th-century colloquialisms in the transmission of language from the older generation. Then, after that original set of colonists had died off, the language began to change again, borrowing from the surrounding native tribes and inventing new words for the discoveries they made in the conquest of the continent. A variant, or creole, is created for that community, causing a branching of the mother tongue in a new direction. Another New World example is of course American vs. British English. Or Brazilian Portuguese vs. European Portuguese. Mexican vs. Castilian vs. Andalusian and so forth. Spanish has numerous dialects due to Spain’s colonization of many parts of the world in the 16th century and onward.

War (infiltration) is another factor: forcing a conquered people to adopt the language and culture of the victors, a sort of cultural assimilation technique. Here a great example is the Russification efforts of Soviet-era policies, where native language and songs were outlawed, people from Russia were forcibly relocated (or encouraged to settle) to populate the territories, and schools were banned from teaching literature and history that might glorify the original regime. This happened in Estonia, Latvia, Lithuania and other post-WWII mid-European states like Ukraine. In reality, the policy of forced Russification started under Tsar Alexander II in the 1860s and even earlier in medieval times. It was most successful in Belarus.

Diffusion is the cultural spread of a language. Here, a more modern example is English spreading through pop culture such as movies, books, and the internet. Another example is the popularity of anime and manga helping to promote the learning of Japanese.

Micro and Macro Forces at Play

The sociological forces discussed above constitute the obvious macro influencers for language shift. What are the micro forces? Literacy is surely one of them. Others include borrowing terminology to expand lexically and grammatically, and the individual’s choices that lead to localized slang. It is through an individual’s speech behavior that language is either maintained or lost in the family context, and hence in the broader society.

Slang

Trade slang is a particularly interesting case to examine. When Dutch traders arrived in Indonesia in the late 16th century, they surely did not speak the local language. Stepping off the ships, to the locals, they must have appeared as aliens, unintelligible, and Oh So White. ‘Do we kill them? Do we approach with caution? Do we try to make first contact?’ So many conflicting thoughts must have gone through the minds of each side. ‘What have they got that we want?’ This scene plays out repeatedly throughout history.

The need to have common terms arises. The forces of the global marketplace win over the sword and/or spear. Of course, by the end of the relationship, the sword wins after all. The need by the Dutch to keep the British at bay, let alone the Spanish, would dictate having forts and closed ports to protect their monopoly. Soon it means taking advantage of and controlling the local population. A Dutch monopoly on export is paramount. But back to pidgin, the simplified contact language that such encounters produce, which usually evolves around the domains of trade and labor.

A jargon, or extremely limited set of vocabulary terms, enables a basic form of communication between speakers of two mutually unintelligible languages. It is often accompanied by hand signals and gestures. Sometimes an imperfect grasp, but still some knowledge, of the other’s native vernacular is required. There is a double illusion created when, for example, the French think they’re speaking an Indian language, and the natives believe they are speaking good French. The conversations result in slang developing.

A clear example is Russenorsk, which arose in Northern Norway and was used by Russian merchants, Norwegian fishermen, and the like. The first historic instance is from a 1785 lawsuit, and the last example shows it being stamped out in WWI. It was a seasonal trade language for the summer months, and never established itself as a creole with native speakers. Another example is the Lingua Franca (Sabir) of the Middle Ages, which grew up post-Crusades and dominated commerce in the Mediterranean, Black, and Irish seas.

Trade slang exists today, most notably on the trading floors of major banks, where a specific vocabulary and shorthand grammar is used in combination with hand gestures.

Phonology

Strong arguments are made on either side of the aisle about phonology and the forced pronunciation rules governing ‘proper speech.’ How words are pronounced influences spelling via errors in orthography. Consider a few examples that are now accepted regional dialectal forms. ‘Y’all’ instead of ‘you’ for the second person plural. It started out as saying “you all” to indicate a group of people, as distinct from the singular “you.” Then at some point, the error becomes the new standard. ‘Can’t’ instead of ‘cannot’. ‘Thru’ instead of ‘through’. And a classic favorite, ‘Halloween’ instead of ‘All Hallows’ Eve’. Now let’s look at a function shift from adjective to adverb. The standard adverb is “well,” as in “I am doing well.” In the past twenty years or so it has become fashionable to say “I’m doing good.” Or just “Good” as a response to “How are you?” Many will look at you strangely if you respond “Well” instead of “Good.” A proper English teacher of prior generations would not just cringe, but flunk any student who speaks thusly. (And sound pompous for doing so.)

Adjusting and Accounting for These Elements

As always, why should we care about these issues of language and grammar in the context of NLP? Firstly, language changes, and so models must change to reflect the current state of the culture. After all, a model is just a reflection of the data it ingests. And each domain within an area of knowledge will have shifting patterns of language. There is “Banking English,” “Healthcare English,” “Legal English,” and “Academic English” to contend with, let alone “East Coast English,” “West Coast English,” “Street English,” “Australian English,” and so forth. Each one requires an understanding or at least an awareness of the culture that created it.

All of this variation leads to the “natural” part of NLP. The discipline is not necessarily trying to reach a formal understanding of the rules, but rather a practical understanding of the usage. The challenge is not just to count nouns and the frequency of words in a text. It’s to understand the interrelated parts of speech that cause meaning to arise from an interaction between two individuals. That is a much more complicated challenge than TF-IDF (term frequency–inverse document frequency) or other statistical approaches. To truly perform NLP at a level that leads to meaning and intent, a data scientist must understand how language works. If practitioners truly love languages and want to understand them, they must study pure linguistics as well as computational linguistics, the structure of speech as well as the measurement and tallying of speech.
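
To make the contrast concrete, here is a minimal sketch of what a pure counting approach like TF-IDF actually does. It is written in plain Python, and everything in it is an assumption for illustration: the function name tf_idf, the whitespace tokenizer, the unsmoothed log weighting (real libraries usually apply smoothing), and the three toy sentences.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document (illustrative, unsmoothed)."""
    # Naive tokenizer: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # term frequency * inverse document frequency
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus, invented for this sketch.
docs = [
    "i am doing well",
    "i am doing good",
    "i will google it",
]
scores = tf_idf(docs)

# "well" and "good" each occur once, in one four-word sentence,
# so the arithmetic gives them exactly the same weight.
print(scores[0]["well"], scores[1]["good"])
```

The two scores printed at the end are identical. The arithmetic has no notion that one word is the standard adverb and the other a function-shifted adjective, or that “google” is a noun that became a verb; it only sees how often and where tokens occur. That is the gap between tallying speech and understanding it.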

S Bolding—Copyright © 2022 · Boldingbroke.com
