Pitarie: Picture Book Search with Interdisciplinary Approach
NTT Communication Science Laboratories has developed a search system called Pitarie that can find just the right picture book to match a child’s interests and developmental stage. Pitarie has been developed with an interdisciplinary approach; we incorporate knowledge of developmental psychology as well as the latest advances in the research fields of similarity search and natural language processing. The unique functions of Pitarie realized by the fusion of human science and information science are presented here along with the elemental technologies.
Keywords: picture book, search system, early childhood learning
1. Introduction
Children’s language development is enhanced when their parents read picture books to them. Many believe that reading stories with moral value to children, whether their endings are happy or sad, fosters the emotional development of children. Parents can expect that their children’s language development and emotional education will be enhanced if they can find picture books that match their children’s interests and developmental stages.
2. Finding picture books on the Internet
Using an Internet search engine makes it possible to find picture books by searching under authors or titles. Book reviews on the Internet provide information about how people felt about particular books. One can also look up recent picture book sales rankings on sites specializing in picture books by specifying factors such as age and gender. In addition, these sites recommend picture books to customers who purchase books regularly based on collaborative filtering; that is, the websites find groups of customers with similar purchase records and notify each customer of the books they have not yet purchased but that have been purchased by many other members of the group. Since people with similar purchase records are likely to have many of the same interests, the recommended books are likely to match the regularly purchasing customers’ interests.
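The collaborative filtering idea described above can be illustrated with a minimal sketch. The purchase records, the customer labels, and the overlap-weighted voting rule are all assumptions made for illustration; actual bookselling sites use their own, more elaborate methods.

```python
from collections import Counter

def recommend(purchases, customer, top_n=3, min_overlap=1):
    """Recommend books bought by customers whose purchase records overlap."""
    own = purchases[customer]
    votes = Counter()
    for other, books in purchases.items():
        if other == customer:
            continue
        overlap = len(own & books)
        if overlap < min_overlap:
            continue  # no shared purchases, so not part of the similar group
        # Each not-yet-purchased book gets a vote weighted by the overlap.
        for book in books - own:
            votes[book] += overlap
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, _ in ranked[:top_n]]

# Hypothetical purchase records (customer -> set of purchased titles).
purchases = {
    "A": {"Guri and Gura", "The Sky Blue Seed"},
    "B": {"Guri and Gura", "The Sky Blue Seed", "Umi"},
    "C": {"Guri and Gura", "Umi", "Megane Usagi"},
    "D": {"Jidosha"},
}
print(recommend(purchases, "A"))
```

Note that customer D, who has no purchases in common with anyone, receives no recommendations. This is one reason the approach works mainly for customers who purchase regularly.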
3. Ways of choosing picture books
Finding picture books on the Internet is common nowadays. However, a surprising result was obtained in a questionnaire we conducted at a large-scale picture book event called the Picture Book Museum held annually at the Fukuoka Asian Art Museum. We asked 770 parent-child pairs how they searched for or chose picture books, and only 188 pairs (24% of the total) answered that they had used the various methods available on the Internet. The most common answer, cited by 540 pairs (70% of the total), was that they had visited bookstores or libraries. That is, even though a great deal of information can be obtained from the Internet, many parents and children still choose the old-fashioned way of going to places where they can look at actual picture books.
4. Aim of Pitarie in searching picture books
We found that currently, it is not very common to find picture books by using the existing methods on the Internet. At the same time, it can also be hard to find picture books in bookstores or libraries that have a limited number of books in stock. We developed Pitarie with the aim of overcoming these problems and finding picture books to suit children’s interests and developmental stages. The system diagram of Pitarie is shown in Fig. 1.
5. Japanese picture book sentences—difficult for computers
In existing sites designed to help users find picture books, people typically search for books whose contents are similar to those of a book they have read by using categories assigned to each book on the basis of its contents. These categories are assigned manually, which raises two problems: the work is time consuming, and the categories are limited to a pre-set list. In contrast, Pitarie automatically analyzes picture book sentences using natural language processing techniques. Note, however, that it is difficult to apply ordinary natural language processing techniques to picture book sentences. Interestingly, even though picture book sentences are easier for people to read than those written for adults, they are harder for computers to handle.
This can be explained as follows. The first step in natural language processing of Japanese is morphological analysis, which involves dividing sentences into morphemes (words). Unlike written English, which is represented only by alphabetic characters (letters), written Japanese consists of various types of characters: hiragana, katakana, and kanji, that is, Chinese characters. Hiragana and katakana characters are phonograms, characters that represent sounds, and kanji characters are ideograms, or characters that represent meaning. Since there are far fewer variations of hiragana and katakana characters than there are kanji, and katakana is mainly used to represent foreign words, children learn hiragana first. As shown in Fig. 2, sentences written for adults usually combine a mixture of hiragana and kanji, and these characters serve as hints for morphological analysis. Kanji is especially helpful because it restricts the number of candidate morphemes. However, since picture book sentences are written almost entirely in hiragana, the characters cannot serve as hints, and morphological analysis becomes quite difficult.
Therefore, we have developed a morphological analyzer with high accuracy even for sentences written almost entirely in hiragana characters. The analyzer automatically constructs a dictionary and learning data according to the characteristics of picture books. Pitarie uses the results of this morphological analyzer to estimate the readability of each book and to search books based on their contents.
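To see why hiragana-only text is harder to segment, consider the following toy sketch. It is not the analyzer described above; it simply enumerates every way of splitting a string into words from a small dictionary, which is an assumption made for illustration. The hiragana string can be read as either "wait by car" or "wait until (someone) comes," whereas the version written with kanji allows only one split.

```python
def segmentations(text, dictionary):
    """Return every way to split text into words found in the dictionary."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in dictionary:
            for rest in segmentations(text[end:], dictionary):
                results.append([head] + rest)
    return results

# Hypothetical toy dictionary containing both hiragana and kanji spellings.
dictionary = {"くるま", "くる", "まで", "で", "まつ", "車", "待つ", "来る"}

hiragana_only = segmentations("くるまでまつ", dictionary)
with_kanji = segmentations("車で待つ", dictionary)
print(len(hiragana_only), hiragana_only)
print(len(with_kanji), with_kanji)
```

A practical analyzer has to choose among such candidates, for example by scoring them with a statistical model, and the number of candidates grows quickly for longer hiragana sentences.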
6. Finding picture books of interest to the reader
We focused on two items as features of picture books: first, vocabulary used in the book and its appearance frequency, and second, bibliographic information such as the author’s name. We considered that these were important in searching by focusing on the book’s content. For example, books in which vehicles play an important part tend to have a lot of frequently appearing words related to vehicles, and certain authors tend to show a marked preference for stories with happy endings.
Pitarie performs similarity search using the above-mentioned features. Conventional picture book retrieval systems find books in which the information exactly coincides with a given small amount of information such as keywords. In contrast, similarity search attempts to satisfy as many conditions as possible for a large amount of input information. Precise searches are better when one knows precisely what one is looking for, for example, “The picture book titled ‘Gon, the Little Fox’” or “Books written by Nankichi Niimi.” However, Pitarie works better when one does not know precisely what one is looking for. Pitarie makes it possible to search in ways that conventional systems find hard to handle, using inputs such as “I want to find a picture book that has a story line similar to that of my favorite picture book” and “I can’t remember the title, but I’m looking for a picture book in which a family went on an outing to the sea and someone almost drowned.”
The search results Pitarie produced for the input Guri and Gura’s Seaside Adventure (Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 1977) are shown in Fig. 3. In this case, a large number of text features were used as search items: 195 distinct words appearing in Guri and Gura’s Seaside Adventure, their frequency, and five types of bibliographic information including the authors’ names. The output results, obtained from the 2012 picture books in the Pitarie database, showed the books that had the most in common with the 200 text features. They included similar books written by the same authors with many words in common, for example, sea, swim, and swimming tube. A lot of information was entered, so the output results were based on a large number of features. As a result, a wide variety of picture books were found, each of which was similar to the input book in many ways.
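A minimal sketch of similarity search over such features is given below. The article does not state which similarity measure Pitarie uses, so cosine similarity is an assumption here, and the book names, word lists, and author labels are hypothetical.

```python
import math
from collections import Counter

def features(words, biblio):
    """Combine word frequencies and bibliographic fields into one sparse vector."""
    vec = Counter(f"word:{w}" for w in words)
    for key, value in biblio.items():
        vec[f"{key}:{value}"] += 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, database, top_n=2):
    """Return the titles whose features are most similar to the query."""
    scored = [(cosine(query, vec), title) for title, vec in database.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [title for _, title in scored[:top_n]]

# Hypothetical database of three books.
database = {
    "sea_story": features(["sea", "swim", "swim", "wave"], {"author": "X"}),
    "train_story": features(["train", "station", "train"], {"author": "Y"}),
    "beach_story": features(["sea", "sand", "swim"], {"author": "Z"}),
}
query = features(["sea", "swim", "tube"], {"author": "X"})
print(search(query, database))
```

Unlike exact keyword matching, no book has to contain every query feature. Books are simply ranked by how much they have in common with the input.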
Pitarie also has a picture book map function (Fig. 4) that enables users to intuitively understand the complex search results. Pitarie achieves fast search with its graph-based similarity search algorithm, which utilizes as an index a graph (network) that is constructed by connecting similar picture books. The picture book map function displays the subgraph of the graph-based index that is related to the search results. In the example shown in Fig. 4, the picture books are broadly divided into two groups that each consist of picture books sharing the same characteristics.
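The general idea of a graph-based index can be sketched as follows. This is a simplified illustration under stated assumptions, not Pitarie's actual algorithm: each item is linked to its k nearest neighbors, a greedy walk moves to whichever neighbor is closer to the query, and the map view is taken to be the part of the graph that connects the hits. The points, function names, and parameters are hypothetical.

```python
import math

def build_knn_graph(points, k, dist):
    """Connect every item to its k most similar items."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        graph[i] = others[:k]
    return graph

def greedy_search(graph, points, query, start, dist):
    """Walk the graph, moving to a closer neighbor until none exists."""
    current = start
    best = dist(query, points[current])
    while True:
        d, nxt = min((dist(query, points[n]), n) for n in graph[current])
        if d >= best:
            return current  # no neighbor is closer to the query
        current, best = nxt, d

def result_subgraph(graph, hits):
    """Keep only the edges between search hits (the 'map' of the results)."""
    hits = set(hits)
    return {n: [m for m in graph[n] if m in hits] for n in sorted(hits)}

# Hypothetical items placed on a line so the walk is easy to follow.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
graph = build_knn_graph(points, k=2, dist=math.dist)
nearest = greedy_search(graph, points, (4.2, 0), start=0, dist=math.dist)
print(nearest, result_subgraph(graph, [3, 4, 5]))
```

The walk only evaluates the neighbors of the nodes it visits rather than every item, which is where the speed of graph-based search comes from. A plain greedy walk can stop at a local optimum, so practical indexes add refinements that this sketch omits.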
7. Finding picture books from the illustrations
The graph-based similarity algorithm is versatile since it is applicable to any data set where some kind of distance is defined between the search target objects. Performing searches on the basis of similarities between the illustrations in books allows people to find picture books whose illustrations are similar to those in their favorite picture books. The results Pitarie produced in a search for picture books whose cover illustrations were similar to that of There Are No Ghosts (Keiko Sena, Poplar Publishing, 2009) are shown in Fig. 5.
Many kinds of features can be extracted from the illustrations. One example is color schemes. There are roughly two kinds of color schemes in picture books: a simple, solid color scheme and a gradational color scheme. We used an image feature to handle these schemes. The image feature we focused on was the distribution of colors in pictures (color histograms). The color histograms of solid-color pictures have sharp, pulse-like shapes, as shown in Fig. 6. In contrast, the color histograms of color pictures with considerable gradation take the form of gently curving slopes.
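The difference between the two histogram shapes can be illustrated with a small sketch. It uses a single intensity channel and a made-up "peakiness" score (the share of pixels falling in the two fullest bins); both simplifications, as well as the synthetic pixel data, are assumptions for illustration rather than the feature actually used in Pitarie.

```python
def histogram(pixels, bins=16, max_value=256):
    """Normalized histogram of pixel values in the range 0..max_value-1."""
    counts = [0] * bins
    for value in pixels:
        counts[value * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts]

def peakiness(hist, top=2):
    """Share of pixels in the `top` fullest bins (1.0 = a few sharp pulses)."""
    return sum(sorted(hist, reverse=True)[:top])

# Synthetic data: two flat colors versus a smooth gradation.
solid = [30] * 500 + [200] * 500
gradation = list(range(256)) * 4

print(peakiness(histogram(solid)))      # concentrated in two bins
print(peakiness(histogram(gradation)))  # spread evenly over all bins
```

A high score corresponds to the sharp, pulse-like histogram of a solid color scheme, and a low score to the gently sloping histogram of a gradational one.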
Another characteristic we adopted was the degree of detail in the illustrations. A technique called edge detection can be used to extract lines from pictures (Fig. 7). Complex pictures with a great amount of detail have a proportionately large number of lines, while the number is much smaller in simple pictures illustrated as cartoons. By observing how the lines are distributed in the picture area, we can determine whether the illustration is drawn in a general, overall manner, or one in which the focus is on a certain character or other object. In this way, using the multiple features of illustrations in combination with each other makes it possible to search for particular picture books or for other books whose illustrations show the same features or characteristics.
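As a rough sketch of how the amount of line detail might be measured, the following code marks a pixel as an edge when the brightness difference to its right or lower neighbor exceeds a threshold, and then computes the fraction of edge pixels. The simple difference operator, the threshold, and the synthetic images are assumptions for illustration; practical systems typically use established operators such as Sobel or Canny.

```python
def edge_map(image, threshold=50):
    """Mark pixels where brightness changes sharply to the right or below."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height - 1):
        for x in range(width - 1):
            gx = image[y][x + 1] - image[y][x]
            gy = image[y + 1][x] - image[y][x]
            if abs(gx) + abs(gy) > threshold:
                edges[y][x] = 1
    return edges

def edge_density(edges):
    """Fraction of pixels that lie on an edge."""
    total = sum(len(row) for row in edges)
    return sum(map(sum, edges)) / total

# Synthetic 8x8 grayscale images: one boundary versus many thin stripes.
simple = [[0] * 4 + [255] * 4 for _ in range(8)]
detailed = [[255 * (x % 2) for x in range(8)] for _ in range(8)]

print(edge_density(edge_map(simple)), edge_density(edge_map(detailed)))
```

Computing the same density separately for regions of the picture would indicate whether the lines are spread over the whole area or concentrated around one object.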
8. Finding picture books to match developmental stage
Some publishers include information about the target age the book is intended for in their picture books. This information provided by professionals is useful for choosing picture books for children at certain ages. However, an investigation of over 2000 books (primarily best sellers) revealed that less than half of picture books contained this information. Furthermore, much of the information is somewhat ambiguous or overly broad in scope, with statements such as “Recommended for infants” or “Recommended for children 1–3 years of age.” On top of this, choosing books to match the developmental stage solely on the basis of the recommended target age information may not be a particularly effective approach. This is because children learn new words very quickly, and so those approaching their 3rd birthday will generally have a much larger vocabulary than those who have just celebrated their 2nd, although both can be said to be two years old.
Therefore, we have developed a method for estimating the readability (target age) of picture books. It combines knowledge of developmental psychology and techniques used in natural language processing. In studying child language development from the viewpoint of developmental psychology, we are focusing in particular on clarifying the mechanism through which children acquire their vocabulary. In this we received the cooperation of over 3000 parent-child pairs, enabling us to compile data on what words the children learned and spoke at what age. The individual differences we found in vocabulary acquisition were not inconsiderable, but performing logistic regression with the obtained data enabled us to accurately determine the time frame in which words were learned and spoken for about half of the children surveyed.
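The use of logistic regression here can be sketched as follows, under assumptions of our own. For a single word, each observation records a child's age in months and whether the child speaks the word, and the fitted curve gives the age at which half of the children are expected to speak it. The data, the gradient-descent fitting routine, and all names are hypothetical and are not the survey data or the procedure actually used.

```python
import math

def fit_logistic(ages, spoken, lr=0.1, steps=5000):
    """Fit P(spoken | age) = 1 / (1 + exp(-(intercept + slope * age)))."""
    n = len(ages)
    mean_age = sum(ages) / n
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for age, y in zip(ages, spoken):
            x = age - mean_age  # center ages for stable gradient steps
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - y
            g1 += (p - y) * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    # Convert back to an intercept expressed in uncentered age units.
    return b0 - b1 * mean_age, b1

# Hypothetical observations for one word: age in months, 1 = child speaks it.
ages = [14, 16, 18, 20, 20, 22, 22, 24, 26, 28]
spoken = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

intercept, slope = fit_logistic(ages, spoken)
age_at_half = -intercept / slope  # age at which the probability reaches 50%
print(round(age_at_half, 1))
```

Repeating such a fit for every word would yield a table of estimated acquisition ages that could then be looked up for the words appearing in a given picture book.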
In developing the method, we used the data obtained on vocabulary acquisition time as well as the sentence characteristics (such as length) publishers used to determine the age groups for which picture books were targeted. People using the method will be able to estimate the targeted age group even if the publisher has not provided it. They will also be able to rank picture books having the same target age information in order of reading ease, or find picture books similar to others they have read and enjoyed. By incorporating this method, Pitarie makes it possible for parents to find picture books to match their children’s developmental stage with greater accuracy than could be obtained before.
9. Future development
In this article, we described the picture book search system Pitarie that we developed to find picture books that best match children’s interests and developmental stages. Pitarie utilizes the latest methods in information science and incorporates knowledge of human science in searching for picture books. Its advanced functionality enables it to perform functions that conventional picture book retrieval systems cannot easily perform. These functions are as follows: searching for picture books that have the same topic or have a similar style of illustration as one’s favorite picture book; searching for picture books on the basis of one’s vague memory by typing in the rough story; and estimating the text readability of books whose target age groups are not specified by publishers.
In the future, we plan to conduct practical experiments with Pitarie in libraries. The experiments will enable us to grasp any existing problems via the information collected from many pairs of parents and children. They will also make it possible to further clarify the aims we are striving to achieve, so that we can develop a new system version with even better capabilities. We will also attempt to further improve the graph-based index similarity search function, the natural language processing function, and other functions that make up the Pitarie system.
Picture books shown in Fig. 3
Top row (from left):
Guri and Gura’s Seaside Adventure, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 1977 (Japanese edition).
Guri and Gura’s Magical Friend, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 1992 (Japanese edition).
Guri and Gura, Rieko Nakagawa (author) and Yuriko Omura (illustrator), Fukuinkan Shoten Publishers, 1967 (Japanese edition).
Guri and Gura’s Special Gift, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 2003 (Japanese edition).
Guri and Gura’s Picnic Adventure, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 1983 (Japanese edition).
Guri and Gura’s Surprise Visitor, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 1967 (Japanese edition).
Guri and Gura’s Spring Cleaning, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 2002 (Japanese edition).
Middle row (from left):
Konnichiwa Mata Otegami Desu, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 2014 (in Japanese).
Megane Usagi no Umibozu ga Deru!!, Keiko Sena (author and illustrator), Poplar Publishing, 2005 (in Japanese).
Nazonazo Ehon 2 no Maki, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 1988 (in Japanese).
Nazonazo Ehon 1 no Maki, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 1988 (in Japanese).
Nazonazo Ehon 3 no Maki, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 1988 (in Japanese).
Guri and Gura’s AIUEO, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 2002 (Japanese edition).
Guri to Gura no Shiritoriuta, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 2009 (in Japanese).
Bottom row (from left):
Guri to Gura no Omajinai, Rieko Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 2009 (in Japanese).
Umi, Hirotaka Nakagawa (author) and Koshiro Hata (illustrator), Jiyukokuminsha, 2011 (in Japanese).
Thank You, Friend!, Rintaro Uchida (author) and Nana Furiya (illustrator), KAISEI-SHA, 2003 (in Japanese).
10-piki no Kaeru Umi e Iku, Hisako Madokoro (author) and Michiko Nakagawa (illustrator), PHP Institute, 2004 (in Japanese).
Chili to Chilili - Umi no Ohanashi, Kaya Doi (author and illustrator), Alice Kan, 2004 (in Japanese).
Dr. Mouse’s Mission, Masafumi Nakagawa (author) and Yuriko Yamawaki (illustrator), Fukuinkan Shoten Publishers, 1977 (Japanese edition).
The Sky Blue Seed, Rieko Nakagawa (author) and Yuriko Omura (illustrator), Fukuinkan Shoten Publishers, 1967 (Japanese edition).
Picture books shown in Fig. 5
There Are No Ghosts, Keiko Sena (author and illustrator), Poplar Publishing, 2009 (in Japanese).
Megane Usagi, Keiko Sena (author and illustrator), Poplar Publishing, 1975 (in Japanese).
Bunbun Kiiro, Akio Kashiwara (author and illustrator), Gakken Plus, 2010 (in Japanese).
Kuishinbo Usagi, Keiko Sena (author and illustrator), Poplar Publishing, 2004 (in Japanese).
Nenaiko Dareda, Keiko Sena (author and illustrator), Fukuinkan Shoten Publishers, 1978 (in Japanese).
Yasai no Onaka, Katsu Kiuchi (author and illustrator), Fukuinkan Shoten Publishers, 1997 (in Japanese).
Megane Usagi no Umibozu ga Deru!!, Keiko Sena (author and illustrator), Poplar Publishing, 2005 (in Japanese).
Aka Aka Kuro Kuro, Akio Kashiwara (author and illustrator), Gakken Plus, 2010 (in Japanese).
Kudamono Nanda, Katsu Kiuchi (author and illustrator), Fukuinkan Shoten Publishers, 2007 (in Japanese).
Kongaragacchi - Docchi ni Susumu? no Hon, Euphrates (author and illustrator), Shogakukan, 2009 (in Japanese).
Picture books shown in Fig. 6
Koguma-chan, Arigato, Ken Wakayama (author and illustrator), Koguma Publishing, 1972 (in Japanese).
Jidosha, Ken Wakayama (author and illustrator), Koguma Publishing, 1994 (in Japanese).
Itsumademo, Anna Pignataro (author and illustrator), Machi Tawara (translator), Shufunotomo, 2007 (Japanese edition).