
CN115080847A - Address library supplementing method and system, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN115080847A
Authority
CN
China
Prior art keywords
data
address
supplementary
map
address data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210654565.6A
Other languages
Chinese (zh)
Inventor
包祖贻
李辰
章波
张月
曹俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202210654565.6A
Publication of CN115080847A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification provide an address library supplementing method and system, an electronic device, and a readable storage medium. The method includes: acquiring crowdsourced address collection data; acquiring address library data and map point-of-interest (POI) data; using the address library data and the map POI data together to identify address data in the crowdsourced address collection data that does not exist in the address library, thereby obtaining supplementary address data; and outputting the supplementary address data to the address library. This solution can greatly reduce the cost of expanding the address library and speed up updates while maintaining a high accuracy rate.

Description

Address library supplementing method and system, electronic device, and readable storage medium

Technical Field

The embodiments of this specification relate to the technical field of data processing, and in particular to an address library supplementing method and system, an electronic device, and a readable storage medium.

Background

An address library is a knowledge base that stores real-world address information. Address services such as address search, address completion, and address error correction depend heavily on it. The accuracy, completeness, and coverage of the data in the address library strongly affect the performance of these services.

Traditional address library expansion based on manual supplementation requires a large amount of collection equipment and manual field surveys to add new addresses and verify existing ones. This consumes considerable manpower and time, updates are slow, and the cost is high, all of which further limits the expansion of the address library.

The content of the Background section is merely technology known to the discloser and does not necessarily represent prior art in the field.

Summary of the Invention

In view of this, the embodiments of this specification provide an address library supplementing method and system, an electronic device, and a readable storage medium that can greatly reduce the cost of expanding the address library and speed up updates while achieving a high accuracy rate.

First, an embodiment of this specification provides an address library supplementing method, including:

acquiring crowdsourced address collection data;

acquiring address library data and map POI data;

using the address library data and the map POI data together to identify address data in the crowdsourced address collection data that does not exist in the address library, so as to obtain supplementary address data; and

outputting the supplementary address data to the address library.

Optionally, before identifying the address data in the crowdsourced address collection data that does not exist in the address library, the method further includes:

preprocessing the crowdsourced address collection data to obtain initial address data with a unified format and complete information, for use in obtaining the supplementary address data.

Optionally, preprocessing the crowdsourced address collection data to obtain initial address data with a unified format and complete information includes:

segmenting the address text in the crowdsourced address collection data into blocks, identifying address attribute types, and supplementing administrative division information to obtain the initial address data, where the initial address data includes coordinate data and corresponding address text data, and the address text data includes address attribute types and administrative division information.

Optionally, using the address library data and the map POI data together to identify address data in the crowdsourced address collection data that does not exist in the address library, so as to obtain supplementary address data, includes:

performing coordinate matching between the initial address data and the map POI data, and retaining, as candidate address data, the initial address data that corresponds to POI blocks whose coordinates match and that have a preset address attribute type; and

using the address library data and the map POI data together to remove duplicate data and noise data from the candidate address data, so as to obtain the supplementary address data.

Optionally, performing coordinate matching between the initial address data and the map POI data and retaining the initial address data of coordinate-matched POI blocks having a preset address attribute type as candidate address data includes:

querying, according to the coordinate data in the initial address data, the map POI data for a POI block whose coordinates match; and

performing coordinate matching between the coordinate data and the POI sub-blocks of the building attribute type contained in that POI block, and retaining the coordinate-matched initial address data as the candidate address data.

Optionally, using the address library data and the map POI data together to remove duplicate data and noise data from the candidate address data, so as to obtain the supplementary address data, includes at least one of the following:

matching the candidate address data against the address library data and the POI information in the map POI data, classifying each candidate as a duplicate address, a newly added building, or a newly added attached (mounted) address, and removing the initial address data whose addresses are duplicates;

coarsely clustering the candidate address data according to the POI information in the map POI data and removing the noise data from the candidate address data, so as to obtain the supplementary address data.

Optionally, before outputting the supplementary address data to the address library, the method further includes:

spatially clustering the supplementary address data to obtain cluster center coordinates and corresponding address description text, which serve as the supplementary address data output to the address library.

Optionally, spatially clustering the supplementary address data to obtain cluster center coordinates and corresponding address description text includes:

clustering the data that describes the same location using a noise-resistant clustering algorithm to obtain the cluster center coordinates; and

clustering and completing the address text data in the supplementary address data that describes the same location to obtain the corresponding address description text.
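The spatial clustering step can be illustrated with a small density-based sketch. The text only calls for a "noise-resistant clustering algorithm" and does not name one; DBSCAN is used here purely as an assumed stand-in, and the function names, the `eps` and `min_pts` parameters, and the use of plain Euclidean distance on raw coordinates are all hypothetical choices made for illustration.

```python
from math import hypot


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    Assumption: DBSCAN stands in for the unnamed noise-resistant algorithm.
    """
    labels = [None] * len(points)

    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


def cluster_centers(points, labels):
    """Average the coordinates of each cluster, ignoring noise points."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        if lab == -1:
            continue
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
centers = cluster_centers(points, labels)
```

In this toy input the three nearby points form one cluster and the distant point is treated as noise. A real system would likely measure distance in meters on projected or geodesic coordinates rather than in raw degrees.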

The embodiments of this specification further provide an address library supplementing system, including:

a first data acquisition unit, adapted to acquire crowdsourced address collection data;

a second data acquisition unit, adapted to acquire map POI data;

a third data acquisition unit, adapted to acquire address library data;

a supplementary data generation unit, adapted to use the map POI data to identify address data in the crowdsourced address collection data that does not exist in the address library, as supplementary address data; and

an output unit, adapted to output the supplementary address data to the address library.

Optionally, the system further includes a preprocessing unit, adapted to preprocess the crowdsourced address collection data to obtain initial address data with a unified format and complete information, for use in obtaining the supplementary address data.

Optionally, the system further includes a spatial clustering unit, adapted to spatially cluster the supplementary address data to obtain cluster center coordinates and corresponding address description text, which serve as the supplementary address data output to the address library.

The embodiments of this specification further provide an electronic device, including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor, when running the computer program, performs the steps of the method of any of the foregoing embodiments.

The embodiments of this specification further provide a computer-readable storage medium storing a computer program that, when run, performs the steps of the method of any of the foregoing embodiments.

With the address library supplementing method of the embodiments of this specification, the map POI data is used to identify address data in the crowdsourced address collection data that does not exist in the address library, and that data serves as supplementary address data. The map POI data makes it possible to cross-validate the crowdsourced address collection data. Because the crowdsourced data is large in volume and the map POI data is accurate, the address library can be supplemented efficiently and accurately, which greatly reduces the cost of expanding the address library and speeds up its updates while achieving a high accuracy rate.

Further, preprocessing the crowdsourced address collection data to obtain initial address data with a unified format and complete information, for use in obtaining the supplementary address data, further improves the accuracy and completeness of the data added to the address library.

Further, segmenting the address text in the crowdsourced address collection data into blocks, identifying address attribute types, and supplementing administrative division information yields initial address data whose address text is uniform in format and complete, including address attribute types and administrative division information. This makes matching against the map POIs easier and improves the accuracy and completeness of the resulting supplementary address data, and therefore of the data added to the address library.

Further, performing coordinate matching between the initial address data and the map POI data, retaining as candidate address data the initial address data that corresponds to coordinate-matched POI blocks having a preset address attribute type, and then using the address library data and the map POI data together to remove duplicate data and noise data from the candidate address data improves the accuracy of the address library data and saves storage space.

Further, querying the map POI data for a coordinate-matched POI block according to the coordinate data in the initial address data, then matching the coordinate data against the POI sub-blocks of the building attribute type contained in that block and retaining the coordinate-matched initial address data as the candidate address data, further improves the precision of the supplementary address data and saves storage space in the address library.

Further, the candidate address data is matched against the address library data and the POI information in the map POI data, each candidate is classified as a duplicate address, a newly added building, or a newly added attached (mounted) address, and the initial address data whose addresses are duplicates is removed; in addition, the candidate address data is coarsely clustered according to the POI information in the map POI data and the noise data is removed to obtain the supplementary address data. Parsing, sorting, and clustering the data in this way effectively identifies and removes duplicate address data and noise data, which improves the precision and usability of the supplementary address data and saves storage space in the address library.

Further, spatially clustering the supplementary address data to obtain cluster center coordinates and corresponding address description text, and using these as the supplementary address data for the address library, further improves the precision of the supplementary address data.

Description of the Drawings

To explain the technical solutions of the embodiments of this specification more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below illustrate only some embodiments of this specification; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 shows a flowchart of an address library supplementing method in an embodiment of this specification.

FIG. 2 shows a flowchart of another address library supplementing method in an embodiment of this specification.

FIG. 3 shows a schematic diagram of a specific way of obtaining candidate address data by cross-matching against map POI data in a specific scenario of an embodiment of this specification.

FIG. 4 shows a schematic structural diagram of an address library supplementing system in an embodiment of this specification.

FIG. 5 shows a schematic structural diagram of an electronic device in an embodiment of this specification.

Detailed Description

As described in the Background, traditional address library expansion based on manual supplementation requires a large amount of collection equipment and manual field surveys to add new addresses and verify existing ones. This consumes considerable manpower and time, updates are slow, and the cost is high, all of which further limits the expansion of the address library.

To address these problems, the embodiments of this specification provide a corresponding address library supplementing solution. Specifically, starting from the large volume of crowdsourced address collection data and using the map POI data, the solution identifies address data in the crowdsourced data that does not exist in the address library and uses it as supplementary address data. The map POI data makes it possible to cross-validate the crowdsourced address collection data. Because the crowdsourced data is large in volume and the map POI data is accurate, the address library can be supplemented efficiently and accurately, which greatly reduces the cost of expanding the address library and speeds up its updates while achieving a high accuracy rate.

To help those skilled in the art better understand the technical concepts, working principles, and advantages of the embodiments of this specification and implement their technical solutions, the embodiments are described in detail below through specific optional examples, with reference to the drawings and specific application scenarios.

Referring to the flowchart of the address library supplementing method shown in FIG. 1, in some embodiments of this specification the address library can be supplemented through the following steps:

S11: acquire crowdsourced address collection data.

In a specific implementation, map data can be collected by a large number of manually driven or autonomous vehicles traveling on the road, yielding the crowdsourced address collection data.

To further speed up address library updates, each vehicle can upload the crowdsourced address collection data it gathers in real time, upload it as soon as a network connection is available, or upload it at a preset interval that is as short as practical, for example daily or weekly.

To save network transmission resources, in other embodiments sensors on the vehicle can monitor environmental changes in real time and compare them with a map (specifically, a high-precision map). When a road change is detected, the data is uploaded for use in supplementing the address library.

As an optional example, a vehicle may build a map using vision algorithms or lidar. In some embodiments, cameras collect video data that is processed by deep learning algorithms or image recognition. Feeding large amounts of data to the deep learning algorithm gives the data processing device stronger recognition capability, which in turn improves data processing.

S12: acquire address library data and map POI data.

Point of interest (POI) data refers to meaningful points on a map, such as shops, bars, gas stations, hospitals, and stations.

In a specific implementation, the map POI data may include basic information such as the POI name, address, coordinates, and address attribute type, where the attribute type may include, for example, building attribute information. It may also include detailed information such as commercial attribute information, which can cover the vertical industry the POI belongs to, business hours, and other information users may be interested in, such as prices, ratings, reviews, and menus. The specific types of detailed information depend on the POI type. A POI of a commercial attribute type may contain multiple POI blocks with building attributes.

It should be understood that the embodiments of this specification place no limitation on the commercial attribute type of a POI.

S13: use the address library data and the map POI data together to identify address data in the crowdsourced address collection data that does not exist in the address library, as supplementary address data.

In a specific implementation, to improve the precision of the supplementary address data and save address library storage, the crowdsourced address collection data can be coordinate-matched against the map POI data, and the initial address data that corresponds to coordinate-matched POI blocks having a preset address attribute type is retained as candidate address data. The address library data and the map POI data can then be used together to remove duplicate data and noise data from the candidate address data, yielding the supplementary address data.

S14: output the supplementary address data to the address library.

Although crowdsourced address collection data is imprecise and of modest quality, it is large in volume, and the map POI data is accurate. Step S13 therefore uses the map POI data to cross-validate the crowdsourced address collection data. As a result, the above embodiment can supplement the address library efficiently and accurately, greatly reducing the cost of expanding the address library and speeding up its updates while achieving a high accuracy rate.
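The flow of steps S11 to S14 can be sketched roughly as follows. This is only an illustrative outline under simplifying assumptions: the `AddressRecord` type, the `supplement_address_library` function, the `is_inside_known_poi` callback, and the use of exact text equality to detect duplicates are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRecord:
    """Hypothetical crowdsourced record: address text plus coordinates."""
    text: str
    lon: float
    lat: float


def supplement_address_library(crowdsourced, library_texts, is_inside_known_poi):
    """Return crowdsourced records that pass POI cross-validation (S13)
    and whose address text is not already present in the library.

    Assumptions: duplicates are detected by exact text equality, and
    is_inside_known_poi(lon, lat) stands in for map POI coordinate matching.
    """
    seen = set(library_texts)
    supplementary = []
    for record in crowdsourced:
        if not is_inside_known_poi(record.lon, record.lat):
            continue  # fails cross-validation against the map POI data
        if record.text in seen:
            continue  # already in the library or already added in this run
        seen.add(record.text)
        supplementary.append(record)
    return supplementary  # S14 would write these into the address library


records = [
    AddressRecord("A", 1.0, 1.0),
    AddressRecord("B", 1.0, 1.0),
    AddressRecord("B", 1.0, 1.0),
    AddressRecord("C", -1.0, 1.0),
]
result = supplement_address_library(records, {"A"}, lambda lon, lat: lon > 0)
```

Here record A is dropped because it already exists in the library, the second B is dropped as a repeat, and C is dropped because it fails the stand-in POI check.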

In a specific implementation, the above embodiment can be further extended and optimized to improve the quality of the data added to the address library. Specific examples and application scenarios are described in detail below.

Referring to the flowchart of another address library supplementing method shown in FIG. 2, in other embodiments of this specification the address library can be supplemented through the following steps:

S21: acquire crowdsourced address collection data.

For how to acquire the crowdsourced address collection data, see the detailed description of step S11 in the foregoing embodiment.

S22: preprocess the crowdsourced address collection data to obtain initial address data with a unified format and complete information, for use in obtaining the supplementary address data.

In a specific implementation, the crowdsourced address collection data as first acquired may be inconsistent in format or may have missing information. To handle this, the address text in the crowdsourced address collection data can be segmented into blocks, address attribute types can be identified, and administrative division information can be supplemented to obtain the initial address data. The resulting initial address data, uniform in format and complete in information, may include coordinate data and corresponding address text data, where the address text data includes address attribute types and administrative division information.

A specific example follows. One item of crowdsourced address collection data is the text "香槟国际3幢1402有电梯" (Champagne International, Building 3, Room 1402, elevator available) with coordinates 120.110081, 30.308145; it contains coordinate data and the corresponding address text data. In the preprocessing of step S22, the address text is segmented into blocks, the address attribute information is identified, and the administrative division information is supplemented. The resulting initial address data for this entry can be:

prov=浙江省 city=杭州市 district=拱墅区 poi=香槟国际 houseno=3幢 roomno=1402 other=有电梯

(In English: prov=Zhejiang Province, city=Hangzhou, district=Gongshu District, poi=Champagne International, houseno=Building 3, roomno=1402, other=elevator available.)

Here, segmenting the address text identifies the POI as "Champagne International", the building identifier as Building 3, the room number as 1402, and the other information as "elevator available". The supplemented administrative division information is prov, city, and district, meaning the address is located in Gongshu District, Hangzhou, Zhejiang Province.
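To make the record layout concrete, the following sketch parses an initial address record of the key=value form shown in the example into a dictionary. The field keys come from the example itself; the function name `parse_initial_address` and the regular-expression approach are assumptions for illustration, and the sketch does not attempt the segmentation or attribute recognition that produces such a record in the first place.

```python
import re

# Field keys taken from the example record; treated here as a fixed set.
FIELD_KEYS = ("prov", "city", "district", "poi", "houseno", "roomno", "other")
_FIELD_RE = re.compile(r"(%s)=" % "|".join(FIELD_KEYS))


def parse_initial_address(record):
    """Split a 'key=value key=value ...' record into a dict.

    Works whether or not the fields are separated by whitespace, because
    each value runs until the next known 'key=' marker.
    """
    fields = {}
    matches = list(_FIELD_RE.finditer(record))
    for index, match in enumerate(matches):
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(record)
        fields[match.group(1)] = record[match.end():end].strip()
    return fields


record = ("prov=浙江省 city=杭州市 district=拱墅区 poi=香槟国际 "
          "houseno=3幢 roomno=1402 other=有电梯")
parsed = parse_initial_address(record)
```

After parsing, `parsed["poi"]` holds the POI name and `parsed["roomno"]` the room number, which is the shape the later coordinate matching and deduplication steps would consume.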

S23: acquire address library data and map POI data.

For how to acquire the map POI data, see the description of step S12 above; it is not repeated here.

S24,结合所述地址库数据和所述地图兴趣点数据,获取所述初始地址数据中不存在于所述地址库中的地址数据,以得到补充地址数据。S24, combining the address database data and the map POI data, acquire address data in the initial address data that does not exist in the address database to obtain supplementary address data.

在具体实施中,可以将所述初始地址数据与所述地图兴趣点数据进行坐标匹配,保留坐标匹配且具有预设地址属性类型的兴趣点区块对应的初始地址数据,作为候选地址数据;之后,作为可选步骤,可以结合所述地址库数据和所述地图兴趣点数据,剔除所述候选地址数据中的重复数据和噪声数据,以得到所述补充地址数据。In a specific implementation, the initial address data may be coordinate-matched with the map POI data, and the initial address data corresponding to the POI blocks that match the coordinates and have a preset address attribute type may be reserved as candidate address data; then , as an optional step, the address database data and the map POI data can be combined to eliminate duplicate data and noise data in the candidate address data to obtain the supplementary address data.

The specific example from step S22 is again used for illustration. First, for the initial address data prov=浙江省city=杭州市district=拱墅区poi=香槟国际houseno=3幢roomno=1402other=有电梯, the map POI data is queried according to the coordinate data of the entry and coordinate matching is performed. This matches a commercial-attribute AOI area "香槟国际" (Champagne International), namely the map POI area 30 shown in FIG. 3, which contains six building-attribute AOI blocks (the six dashed boxes in FIG. 3). The coordinate data of the initial address data is then matched against the coordinates of the six AOI blocks. Data that does not hit a building (shown as dots in FIG. 3) may be discarded directly as noise data, while data that hits a building (shown as X marks in FIG. 3) is retained as candidate address data.
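The coordinate matching against building blocks can be sketched as a point-in-polygon test. The source does not say how the hit test is performed, so the ray-casting method below, the record layout (`lng`, `lat`), and the polygon coordinates are all assumptions chosen for illustration.

```python
def point_in_polygon(lng, lat, polygon):
    """Ray-casting test: True if (lng, lat) lies inside the polygon.

    `polygon` is a list of (lng, lat) vertices in order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


def filter_candidates(points, building_blocks):
    """Keep points that hit a building block; points that hit none are dropped as noise."""
    kept = []
    for point in points:
        for name, polygon in building_blocks.items():
            if point_in_polygon(point["lng"], point["lat"], polygon):
                kept.append({**point, "building": name})
                break
    return kept


# Hypothetical building outline around the example coordinate.
blocks = {
    "building_3": [(120.1100, 30.3080), (120.1102, 30.3080),
                   (120.1102, 30.3082), (120.1100, 30.3082)],
}
points = [
    {"lng": 120.110081, "lat": 30.308145},  # falls inside the outline
    {"lng": 120.110500, "lat": 30.308100},  # falls outside every outline
]
candidates = filter_candidates(points, blocks)
```

A production system would more likely use a spatial index or a geometry library for this step; the plain loop is only meant to show the retain-or-discard decision.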

Next, to further improve the accuracy and usability of the supplementary address data and to save storage space for the address library data, duplicate data and noise data in the candidate address data may be removed. Some optional approaches are shown below. Option 1: the candidate address data is matched against the address library data and the POI information in the map POI data, and each candidate is classified as a duplicate address, a newly added building, or a newly added mounted address; the initial address data whose addresses are duplicates may then be removed.

Continuing the example above, for ease of description the address prov=浙江省city=杭州市district=拱墅区poi=香槟国际houseno=3幢roomno=1402 is denoted address A, and the AOI area "香槟国际" (Champagne International), obtained by combining the address library data and the map POI data (that is, by querying the map POI data), is denoted address B. Address B is matched against address A using features such as the length of their common substring, the ratio of the common substring length to the length of address A, the ratio of the common substring length to the length of address B, the type of the string fragments in which address A and address B differ, and whether other data with the same name exists near address A and near address B. Specifically, a linear classifier may be used to cross-match the map POI data and the initial address data, which yields that the address type of this initial address data is a newly added building.
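The string-based features named above can be sketched as follows. The sketch covers only the common-substring features; the fragment-type and nearby-same-name features need data that the source does not show. The function names, the feature names, and the weights passed to `linear_score` are hypothetical, and the source does not state how the linear classifier is trained or parameterized.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def match_features(a: str, b: str) -> dict:
    """Common-substring length and its share of each address string."""
    common = longest_common_substring(a, b)
    return {
        "common_len": len(common),
        "ratio_in_a": len(common) / len(a) if a else 0.0,
        "ratio_in_b": len(common) / len(b) if b else 0.0,
    }


def linear_score(features: dict, weights: dict, bias: float = 0.0) -> float:
    """Score of a linear model; the weights are placeholders, not values from the source."""
    return bias + sum(weights[name] * features[name] for name in weights)


features = match_features("香槟国际3幢", "香槟国际")
```

Here address B ("香槟国际") is fully contained in address A while A carries an extra building fragment ("3幢"), which is the kind of pattern the text associates with a newly added building.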

Option 2: according to the POI information in the map POI data, coarse clustering is performed on the candidate address data and the noise data in the candidate address data is removed, to obtain the supplementary address data.

Continuing the example, if the newly added building "3幢" (Building 3) has multiple items of initial address data (collected points), most of them may fall on one building and a few on other buildings. Coarse clustering can then keep only the data on the building with the largest number of collected points and remove the other data as noise data.
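The majority rule in this example can be sketched as below, assuming each candidate record already carries the name of the building block it hit (a hypothetical `building` key). The source describes the outcome of the coarse clustering but not its implementation, so this is one possible reading.

```python
from collections import Counter


def keep_majority_building(points):
    """Keep only the points on the building with the most collected points."""
    if not points:
        return []
    counts = Counter(point["building"] for point in points)
    top_building, _ = counts.most_common(1)[0]
    return [point for point in points if point["building"] == top_building]


# Three points labelled "3幢": two fall on one building, one on another.
points = [
    {"id": 1, "building": "block_a"},
    {"id": 2, "building": "block_a"},
    {"id": 3, "building": "block_b"},
]
kept = keep_majority_building(points)
```

When two buildings tie, `Counter.most_common` returns the one encountered first, so a real pipeline would need an explicit tie-breaking rule.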

In a specific implementation, Option 1 and Option 2 above may be combined.

In a specific implementation, the data processed in step S24 may be used directly as the supplementary address data, in which case step S26 is executed next and the data is output to the address library. Alternatively, the data processed in step S24 may be treated only as coarsely selected supplementary address data, in which case step S25 is executed next.

S25: perform spatial clustering on the coarsely selected supplementary address data to obtain cluster center coordinates and the corresponding address description text, as the supplementary address data output to the address library.

In a specific implementation, on the one hand, a noise-resistant clustering algorithm may be used to cluster the data describing the same place to obtain the cluster center coordinates; on the other hand, the address text data describing the same place in the supplementary address data is clustered and completed to obtain the corresponding address description text.

For example, a noise-resistant clustering algorithm such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN) may be used to cluster a batch of data describing the same place to obtain the cluster center coordinates. DBSCAN is a density-based spatial clustering algorithm: it groups regions of sufficient density into clusters, can find clusters of arbitrary shape in a spatial database containing noise, and defines a cluster as a maximal set of density-connected points. It can be understood that other noise-resistant clustering algorithms may also be used, such as other density-based spatial clustering algorithms or distance-based spatial clustering algorithms; the embodiments of this specification do not limit the specific type of spatial clustering algorithm used.
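As an illustration of the clustering step, the following is a compact, unoptimized DBSCAN written with the standard library only. Two choices here are assumptions rather than anything stated in the source: the distance is plain Euclidean distance on the coordinate pairs (real longitude/latitude data would normally be projected or use a geodesic distance), and the cluster center is taken as the mean of the cluster's points. The `eps` and `min_pts` values in the usage lines are arbitrary.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # Brute-force neighborhood query; includes the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def cluster_centers(points, labels):
    """Mean coordinate of each cluster, ignoring noise points."""
    sums = {}
    for (x, y), label in zip(points, labels):
        if label == -1:
            continue
        sx, sy, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, count + 1)
    return {label: (sx / c, sy / c) for label, (sx, sy, c) in sums.items()}


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
labels = dbscan(points, eps=1.5, min_pts=3)
centers = cluster_centers(points, labels)
```

The isolated point is labelled as noise and does not shift the center, which is the property that makes a density-based method suitable for collected points that occasionally land far from the true location.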

At the same time, the text may be clustered and completed to obtain the address description text. For example, given the two addresses 浙江省->杭州市->拱墅区->[道路]->保利香槟国际->3幢 (Zhejiang Province -> Hangzhou City -> Gongshu District -> [road] -> Poly Champagne International -> Building 3) and 浙江省->杭州市->拱墅区->人文路->保利香槟国际->3幢->润园大药房 (Zhejiang Province -> Hangzhou City -> Gongshu District -> Renwen Road -> Poly Champagne International -> Building 3 -> Runyuan Pharmacy), the first address can be completed to give 浙江省->杭州市->拱墅区->人文路->保利香槟国际->3幢 (Zhejiang Province -> Hangzhou City -> Gongshu District -> Renwen Road -> Poly Champagne International -> Building 3).
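The completion in this example can be sketched as filling empty levels of one address from another address that agrees with it on every level both provide. The level names, their order, and the rule of not copying levels deeper than the partial address already reaches are assumptions made for this sketch; the source gives only the input and output of the example.

```python
# Hypothetical level hierarchy, ordered from coarsest to finest.
LEVELS = ("prov", "city", "district", "road", "poi", "houseno", "shop")


def complete_address(partial: dict, reference: dict) -> dict:
    """Fill missing levels of `partial` from `reference` if the two do not conflict."""
    for level in LEVELS:
        a, b = partial.get(level), reference.get(level)
        if a and b and a != b:
            return dict(partial)  # conflicting values: not the same place
    present = [i for i, level in enumerate(LEVELS) if partial.get(level)]
    if not present:
        return dict(partial)
    deepest = max(present)
    completed = {}
    # Do not copy levels finer than the partial address itself describes.
    for level in LEVELS[: deepest + 1]:
        value = partial.get(level) or reference.get(level)
        if value:
            completed[level] = value
    return completed


first = {"prov": "浙江省", "city": "杭州市", "district": "拱墅区",
         "road": None, "poi": "保利香槟国际", "houseno": "3幢"}
second = {"prov": "浙江省", "city": "杭州市", "district": "拱墅区",
          "road": "人文路", "poi": "保利香槟国际", "houseno": "3幢",
          "shop": "润园大药房"}
completed = complete_address(first, second)
```

The road level is filled in from the second address, while the shop level is left out because the first address describes the building, not the shop.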

S26: output the supplementary address data to the address library.

Finally, the cluster center coordinates and address description text generated in step S25 may be added to the library as supplementary address data, or the data obtained in step S24 may be added to the library directly as supplementary address data.

With the above embodiment, step S22 preprocesses the crowdsourced address collection data to obtain initial address data with a uniform format and complete information for use in obtaining the supplementary address data, which further improves the accuracy and completeness of the address library data obtained by supplementation.

More specifically, the initial address data is obtained by segmenting the address text in the crowdsourced address collection data into blocks, identifying address attribute types, and supplementing administrative division information. The address text data in the initial address data therefore has a uniform and complete format that includes information such as the address attribute type and the administrative division. This facilitates matching against the map POIs and improves the accuracy and completeness of the resulting supplementary address data, and in turn of the supplemented address library data.

In addition, in step S24, the initial address data is coordinate-matched against the map POI data, the initial address data whose coordinates match a POI block having a preset address attribute type is retained as candidate address data, and duplicate data and noise data in the candidate address data are then removed by combining the address library data and the map POI data to obtain the supplementary address data. This improves the accuracy of the address library data and saves storage space for it.

Specifically, the POI block whose coordinates match is queried in the map POI data according to the coordinate data in the initial address data; the POI sub-blocks having the building attribute type that are contained in that POI block are then coordinate-matched against the coordinate data, and the initial address data whose coordinates match is retained as the candidate address data. This further improves the precision of the supplementary address data and saves storage space for the address library data.

Furthermore, the candidate address data is matched against the address library data and the POI information in the map POI data, each candidate is classified as a duplicate address, a newly added building, or a newly added mounted address, and the initial address data whose addresses are duplicates is removed; in addition, coarse clustering is performed on the candidate address data according to the POI information in the map POIs, and the noise data in the candidate address data is removed to obtain the supplementary address data. Parsing, sorting, and clustering the data in this way effectively identifies and removes duplicate address data and noise data, which improves the accuracy and usability of the supplementary address data and saves storage space for the address library data.

In addition, step S25 performs spatial clustering on the supplementary address data to obtain cluster center coordinates and the corresponding address description text as the supplementary address data for the address library, which further improves the precision of that data.

The address library supplementing method of the embodiments of the present invention has been described in detail above with reference to specific application examples. The embodiments of the present invention also provide a corresponding address library supplementing system. To help those skilled in the art better understand and implement it, the system is described below with reference to the accompanying drawings and specific application examples.

First, referring to the schematic structural diagram of the address library supplementing system shown in FIG. 4, in some embodiments of the present invention the address library supplementing system 40 may include a first data acquisition unit 41, a second data acquisition unit 42, a third data acquisition unit 43, a supplementary data generation unit 44, and an output unit 45, wherein:

the first data acquisition unit 41 is adapted to acquire crowdsourced address collection data;

the second data acquisition unit 42 is adapted to acquire map POI data;

the third data acquisition unit 43 is adapted to acquire address library data;

the supplementary data generation unit 44 is adapted to combine the address library data and the map POI data to acquire the address data in the crowdsourced address collection data that does not exist in the address library, as supplementary address data;

the output unit 45 is adapted to output the supplementary address data to the address library.

With the above address library supplementing system 40, the crowdsourced address collection data can be cross-validated against the map POI data. Because the crowdsourced address collection data is large in volume and the map POI data is accurate, the address library can be supplemented efficiently and accurately. The whole process can be completed automatically, without manual on-site inspection and verification, which greatly reduces the cost of expanding the address library, speeds up address library updates, and achieves a high accuracy rate.

In a specific implementation, still referring to FIG. 4, the address library supplementing system may further include a preprocessing unit 46, adapted to preprocess the crowdsourced address collection data to obtain initial address data with a uniform format and complete information, for use in obtaining the supplementary address data.

In a specific implementation, still referring to FIG. 4, the address library supplementing system may further include a spatial clustering unit 47, adapted to perform spatial clustering on the supplementary address data to obtain cluster center coordinates and the corresponding address description text, as the supplementary address data output to the address library.
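One way to picture how units 41 to 47 fit together is as pluggable callables wired into a single pipeline object. The class name, the attribute names, the `run` method, and the record layout below are hypothetical; the source describes the units only functionally and does not prescribe any software structure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

Record = dict  # hypothetical record type for this sketch


@dataclass
class AddressLibrarySupplementSystem:
    get_crowdsourced: Callable[[], Iterable[Record]]   # first data acquisition unit 41
    get_map_poi: Callable[[], Iterable[Record]]        # second data acquisition unit 42
    get_library: Callable[[], Iterable[Record]]        # third data acquisition unit 43
    generate: Callable[[List[Record], List[Record], List[Record]], List[Record]]  # unit 44
    output: Callable[[List[Record]], None]             # output unit 45
    preprocess: Optional[Callable[[List[Record]], List[Record]]] = None  # optional unit 46
    cluster: Optional[Callable[[List[Record]], List[Record]]] = None     # optional unit 47

    def run(self) -> List[Record]:
        records = list(self.get_crowdsourced())
        if self.preprocess is not None:
            records = self.preprocess(records)
        supplementary = self.generate(
            records, list(self.get_library()), list(self.get_map_poi())
        )
        if self.cluster is not None:
            supplementary = self.cluster(supplementary)
        self.output(supplementary)
        return supplementary


# Toy wiring: "not in the library" is reduced to a plain membership test.
written = []
system = AddressLibrarySupplementSystem(
    get_crowdsourced=lambda: [{"text": "a"}, {"text": "b"}],
    get_map_poi=lambda: [],
    get_library=lambda: [{"text": "a"}],
    generate=lambda records, library, poi: [r for r in records if r not in library],
    output=written.extend,
)
result = system.run()
```

Keeping the preprocessing and clustering units optional mirrors the text, where units 46 and 47 are described as additions to the basic system.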

For the specific implementation, working principles, and advantages of the functional units of the address library supplementing system, see the detailed description in the foregoing method embodiments; they are not described again here.

Referring to the schematic structural diagram of the electronic device shown in FIG. 5, the embodiments of this specification further provide an electronic device 50 including a memory 51 and a processor 52. The memory 51 stores a computer program executable on the processor 52, and the processor 52, when running the computer program, can perform the steps of the address library supplementing method of any of the foregoing embodiments. For the specific steps, see the detailed description of the foregoing method embodiments.

The electronic device 50 may be a computer device capable of local data processing, such as a personal computer or a tablet computer, or it may be a computer cluster or a cloud server. The memory 51 may be a distributed storage device which, in addition to the computer program, may store one or more of the crowdsourced address collection data, the address library data, and the map POI data.

The processor 52 may be a single-core or multi-core processor, and may be a general-purpose processor or a dedicated processor capable of processing large volumes of data. The specific structure and implementation of the processor are not limited here.

In addition, in a specific implementation, still referring to FIG. 5, the electronic device 50 may further include a display 53 adapted to display the map corresponding to the supplementary address data. The display may also show intermediate results of the process, for example the results of cross-matching against the address library data and the map POI data.

In a specific implementation, still referring to FIG. 5, the electronic device may further include an input interface 54 for interacting with the user, so that the user can select the crowdsourced address collection data, the map POI data, and so on, filter intermediate data processing results, and make any necessary or personalized settings.

In other embodiments, the crowdsourced address collection data, the map POI data, and the like may be acquired through a communication interface 55.

In a specific implementation, the memory 51, the processor 52, the display 53, the input interface 54, and the communication interface 55 may communicate with one another through a bus 56.

The embodiments of this specification further provide a computer-readable storage medium storing a computer program which, when run, performs the steps of the method of any of the foregoing embodiments. For the specific steps, see the foregoing embodiments; they are not repeated here.

In a specific implementation, the computer-readable storage medium may be any suitable readable storage medium, such as an optical disc, a mechanical hard disk, or a solid-state drive.

Although the embodiments of this specification are disclosed above, the present invention is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention, and the scope of protection of the present invention shall therefore be that defined by the claims.

Claims (13)

1. An address library supplementing method, comprising:
acquiring crowdsourced address collection data;
acquiring address library data and map point-of-interest (POI) data;
combining the address library data and the map POI data to acquire address data, in the crowdsourced address collection data, that does not exist in the address library, so as to obtain supplementary address data; and
outputting the supplementary address data to the address library.
2. The method of claim 1, further comprising, before acquiring the address data in the crowdsourced address collection data that does not exist in the address library:
preprocessing the crowdsourced address collection data to obtain initial address data with a uniform format and complete information, for use in obtaining the supplementary address data.
3. The method of claim 2, wherein preprocessing the crowdsourced address collection data to obtain initial address data with a uniform format and complete information comprises:
segmenting an address text in the crowdsourced address collection data into blocks, identifying an address attribute type, and supplementing administrative division information to obtain the initial address data, wherein the initial address data comprises coordinate data and corresponding address text data, and the address text data comprises the address attribute type and the administrative division information.
4. The method of claim 3, wherein combining the address library data and the map POI data to acquire the address data in the crowdsourced address collection data that does not exist in the address library, so as to obtain supplementary address data, comprises:
performing coordinate matching between the initial address data and the map POI data, and retaining, as candidate address data, the initial address data whose coordinates match a POI block having a preset address attribute type; and
removing duplicate data and noise data from the candidate address data by combining the address library data and the map POI data, to obtain the supplementary address data.
5. The method of claim 4, wherein performing coordinate matching between the initial address data and the map POI data, and retaining, as candidate address data, the initial address data whose coordinates match a POI block having a preset address attribute type, comprises:
querying, according to the coordinate data in the initial address data, a POI block in the map POI data whose coordinates match; and
performing coordinate matching between the coordinate data and the POI sub-blocks having a building attribute type that are contained in the POI block, and retaining the initial address data whose coordinates match as the candidate address data.
6. The method of claim 4, wherein removing duplicate data and noise data from the candidate address data by combining the address library data and the map POI data, to obtain the supplementary address data, comprises at least one of:
matching the candidate address data against the address library data and the POI information in the map POI data, determining whether each candidate is classified as a duplicate address, as having an address attribute type of newly added building, or as a newly added mounted address, and removing the initial address data whose addresses are duplicates; and
performing coarse clustering on the candidate address data according to the POI information in the map POI data, and removing noise data from the candidate address data, to obtain the supplementary address data.
7. The method of any one of claims 1 to 6, further comprising, before outputting the supplementary address data to the address library:
performing spatial clustering on the supplementary address data to obtain cluster center coordinates and a corresponding address description text, as the supplementary address data output to the address library.
8. The method of claim 7, wherein performing spatial clustering on the supplementary address data to obtain cluster center coordinates and a corresponding address description text comprises:
clustering data describing the same place by using a noise-resistant clustering algorithm to obtain the cluster center coordinates; and
clustering and completing the address text data describing the same place in the supplementary address data to obtain the corresponding address description text.
9. An address library supplementing system, comprising:
a first data acquisition unit adapted to acquire crowdsourced address collection data;
a second data acquisition unit adapted to acquire map POI data;
a supplementary data generation unit adapted to acquire, by combining the map POI data, address data in the crowdsourced address collection data that does not exist in the address library, as supplementary address data; and
an output unit adapted to output the supplementary address data to the address library.
10. The system of claim 9, further comprising: a preprocessing unit adapted to preprocess the crowdsourced address collection data to obtain initial address data with a uniform format and complete information, for use in obtaining the supplementary address data.
11. The system of claim 9 or 10, further comprising: a spatial clustering unit adapted to perform spatial clustering on the supplementary address data to obtain cluster center coordinates and a corresponding address description text, as the supplementary address data output to the address library.
12. An electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, performs the steps of the method of any one of claims 1 to 8.
13. A computer-readable storage medium storing a computer program, wherein the computer program, when executed, performs the steps of the method of any one of claims 1 to 8.
CN202210654565.6A 2022-06-10 2022-06-10 Address library supplementing method and system, electronic equipment and readable storage medium Pending CN115080847A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210654565.6A CN115080847A (en) 2022-06-10 2022-06-10 Address library supplementing method and system, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210654565.6A CN115080847A (en) 2022-06-10 2022-06-10 Address library supplementing method and system, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN115080847A true CN115080847A (en) 2022-09-20

Family

ID=83250339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210654565.6A Pending CN115080847A (en) 2022-06-10 2022-06-10 Address library supplementing method and system, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN115080847A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8880535B1 (en) * 2011-11-29 2014-11-04 Google Inc. System and method for selecting user generated content related to a point of interest
CN104572954A (en) * 2014-12-29 2015-04-29 北京奇虎科技有限公司 System and method for verifying map interest point information by mail delivery
CN107656913A (en) * 2017-09-30 2018-02-02 百度在线网络技术(北京)有限公司 Map point of interest address extraction method, apparatus, server and storage medium
CN114036414A (en) * 2021-11-10 2022-02-11 北京百度网讯科技有限公司 Method, apparatus, electronic device, medium and program product for processing point of interest

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUIQI HU ET AL.: "Crowdsourced POI labelling Location-aware result inference and Task Assignment", 《2016 IEEE 32ND INTERNATIONAL CONFERENCE ON DATA ENGINEERING》, 23 June 2016 (2016-06-23), pages 61 - 72 *
唐炉亮: "大数据环境下道路场景高时空分辨率众包感知方法", 《测绘学报》, vol. 51, no. 6, 25 March 2022 (2022-03-25), pages 1070 - 1090 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220920